-
A Survey on Inference Engines for Large Language Models: Perspectives on Optimization and Efficiency
Paper • 2505.01658 • Published • 39 -
FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness
Paper • 2205.14135 • Published • 15 -
Reward Reasoning Model
Paper • 2505.14674 • Published • 38
Collections
Discover the best community collections!
Collections including paper arxiv:2505.01658
-
CoRAG: Collaborative Retrieval-Augmented Generation
Paper • 2504.01883 • Published • 9 -
VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models with Reinforcement Learning
Paper • 2504.08837 • Published • 43 -
Mavors: Multi-granularity Video Representation for Multimodal Large Language Model
Paper • 2504.10068 • Published • 30 -
xVerify: Efficient Answer Verifier for Reasoning Model Evaluations
Paper • 2504.10481 • Published • 85
-
LinFusion: 1 GPU, 1 Minute, 16K Image
Paper • 2409.02097 • Published • 34 -
Phidias: A Generative Model for Creating 3D Content from Text, Image, and 3D Conditions with Reference-Augmented Diffusion
Paper • 2409.11406 • Published • 27 -
Diffusion Models Are Real-Time Game Engines
Paper • 2408.14837 • Published • 126 -
Segment Anything with Multiple Modalities
Paper • 2408.09085 • Published • 22
-
FLAME: Factuality-Aware Alignment for Large Language Models
Paper • 2405.01525 • Published • 28 -
DeepSeek-Prover: Advancing Theorem Proving in LLMs through Large-Scale Synthetic Data
Paper • 2405.14333 • Published • 43 -
Transformers Can Do Arithmetic with the Right Embeddings
Paper • 2405.17399 • Published • 54 -
EasyAnimate: A High-Performance Long Video Generation Method based on Transformer Architecture
Paper • 2405.18991 • Published • 12
-
LoRA+: Efficient Low Rank Adaptation of Large Models
Paper • 2402.12354 • Published • 7 -
The FinBen: An Holistic Financial Benchmark for Large Language Models
Paper • 2402.12659 • Published • 23 -
TofuEval: Evaluating Hallucinations of LLMs on Topic-Focused Dialogue Summarization
Paper • 2402.13249 • Published • 14 -
TrustLLM: Trustworthiness in Large Language Models
Paper • 2401.05561 • Published • 69
-
Low-Rank Adapters Meet Neural Architecture Search for LLM Compression
Paper • 2501.16372 • Published • 12 -
TAID: Temporally Adaptive Interpolated Distillation for Efficient Knowledge Transfer in Language Models
Paper • 2501.16937 • Published • 7 -
Matryoshka Quantization
Paper • 2502.06786 • Published • 32 -
Identifying Sensitive Weights via Post-quantization Integral
Paper • 2503.01901 • Published • 8
-
RoFormer: Enhanced Transformer with Rotary Position Embedding
Paper • 2104.09864 • Published • 16 -
Attention Is All You Need
Paper • 1706.03762 • Published • 105 -
Direct Nash Optimization: Teaching Language Models to Self-Improve with General Preferences
Paper • 2404.03715 • Published • 62 -
Zero-Shot Tokenizer Transfer
Paper • 2405.07883 • Published • 5
-
A Survey on Inference Engines for Large Language Models: Perspectives on Optimization and Efficiency
Paper • 2505.01658 • Published • 39 -
FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness
Paper • 2205.14135 • Published • 15 -
Reward Reasoning Model
Paper • 2505.14674 • Published • 38
-
CoRAG: Collaborative Retrieval-Augmented Generation
Paper • 2504.01883 • Published • 9 -
VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models with Reinforcement Learning
Paper • 2504.08837 • Published • 43 -
Mavors: Multi-granularity Video Representation for Multimodal Large Language Model
Paper • 2504.10068 • Published • 30 -
xVerify: Efficient Answer Verifier for Reasoning Model Evaluations
Paper • 2504.10481 • Published • 85
-
Low-Rank Adapters Meet Neural Architecture Search for LLM Compression
Paper • 2501.16372 • Published • 12 -
TAID: Temporally Adaptive Interpolated Distillation for Efficient Knowledge Transfer in Language Models
Paper • 2501.16937 • Published • 7 -
Matryoshka Quantization
Paper • 2502.06786 • Published • 32 -
Identifying Sensitive Weights via Post-quantization Integral
Paper • 2503.01901 • Published • 8
-
LinFusion: 1 GPU, 1 Minute, 16K Image
Paper • 2409.02097 • Published • 34 -
Phidias: A Generative Model for Creating 3D Content from Text, Image, and 3D Conditions with Reference-Augmented Diffusion
Paper • 2409.11406 • Published • 27 -
Diffusion Models Are Real-Time Game Engines
Paper • 2408.14837 • Published • 126 -
Segment Anything with Multiple Modalities
Paper • 2408.09085 • Published • 22
-
FLAME: Factuality-Aware Alignment for Large Language Models
Paper • 2405.01525 • Published • 28 -
DeepSeek-Prover: Advancing Theorem Proving in LLMs through Large-Scale Synthetic Data
Paper • 2405.14333 • Published • 43 -
Transformers Can Do Arithmetic with the Right Embeddings
Paper • 2405.17399 • Published • 54 -
EasyAnimate: A High-Performance Long Video Generation Method based on Transformer Architecture
Paper • 2405.18991 • Published • 12
-
RoFormer: Enhanced Transformer with Rotary Position Embedding
Paper • 2104.09864 • Published • 16 -
Attention Is All You Need
Paper • 1706.03762 • Published • 105 -
Direct Nash Optimization: Teaching Language Models to Self-Improve with General Preferences
Paper • 2404.03715 • Published • 62 -
Zero-Shot Tokenizer Transfer
Paper • 2405.07883 • Published • 5
-
LoRA+: Efficient Low Rank Adaptation of Large Models
Paper • 2402.12354 • Published • 7 -
The FinBen: An Holistic Financial Benchmark for Large Language Models
Paper • 2402.12659 • Published • 23 -
TofuEval: Evaluating Hallucinations of LLMs on Topic-Focused Dialogue Summarization
Paper • 2402.13249 • Published • 14 -
TrustLLM: Trustworthiness in Large Language Models
Paper • 2401.05561 • Published • 69