Diffusion In Diffusion: Reclaiming Global Coherence in Semi-Autoregressive Diffusion Paper • 2601.13599 • Published 9 days ago • 4
VersatileFFN: Achieving Parameter Efficiency in LLMs via Adaptive Wide-and-Deep Reuse Paper • 2512.14531 • Published Dec 16, 2025 • 14
Kangaroo: Lossless Self-Speculative Decoding via Double Early Exiting Paper • 2404.18911 • Published Apr 29, 2024 • 30
Revealing the Power of Post-Training for Small Language Models via Knowledge Distillation Paper • 2509.26497 • Published Sep 30, 2025
ROOT: Robust Orthogonalized Optimizer for Neural Network Training Paper • 2511.20626 • Published Nov 25, 2025 • 43
Benchmarking Optimizers for Large Language Model Pretraining Paper • 2509.01440 • Published Sep 1, 2025 • 25
FreedomIntelligence/openPangu-Embedded-1B Text Generation • 1B • Updated Aug 21, 2025 • 66 • 6
FreedomIntelligence/openPangu-Embedded-7B Text Generation • 8B • Updated Aug 21, 2025 • 120 • 6
DenseMamba: State Space Models with Dense Hidden Connection for Efficient Large Language Models Paper • 2403.00818 • Published Feb 26, 2024 • 19