Collections including paper arxiv:2505.12781

- LayerCake: Token-Aware Contrastive Decoding within Large Language Model Layers
  Paper • 2507.04404 • Published • 21
- 70% Size, 100% Accuracy: Lossless LLM Compression for Efficient GPU Inference via Dynamic-Length Float
  Paper • 2504.11651 • Published • 31
- A Token is Worth over 1,000 Tokens: Efficient Knowledge Distillation through Low-Rank Clone
  Paper • 2505.12781 • Published • 2
- A Survey of Context Engineering for Large Language Models
  Paper • 2507.13334 • Published • 259

- Grokking in the Wild: Data Augmentation for Real-World Multi-Hop Reasoning with Transformers
  Paper • 2504.20752 • Published • 92
- Phi-4-Mini-Reasoning: Exploring the Limits of Small Reasoning Language Models in Math
  Paper • 2504.21233 • Published • 49
- AF Adapter: Continual Pretraining for Building Chinese Biomedical Language Model
  Paper • 2211.11363 • Published • 1
- MoRA: High-Rank Updating for Parameter-Efficient Fine-Tuning
  Paper • 2405.12130 • Published • 50

- EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters
  Paper • 2402.04252 • Published • 29
- Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models
  Paper • 2402.03749 • Published • 14
- ScreenAI: A Vision-Language Model for UI and Infographics Understanding
  Paper • 2402.04615 • Published • 44
- EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss
  Paper • 2402.05008 • Published • 23

- Low-Rank Adapters Meet Neural Architecture Search for LLM Compression
  Paper • 2501.16372 • Published • 12
- TAID: Temporally Adaptive Interpolated Distillation for Efficient Knowledge Transfer in Language Models
  Paper • 2501.16937 • Published • 7
- Matryoshka Quantization
  Paper • 2502.06786 • Published • 32
- Identifying Sensitive Weights via Post-quantization Integral
  Paper • 2503.01901 • Published • 8