GLM-5V-Turbo: Toward a Native Foundation Model for Multimodal Agents Paper • 2604.26752 • Published 25 days ago • 108
The Ultra-Scale Playbook 🌌 The ultimate guide to training LLM on large GPU Clusters • Running • 3.86k
MatFormer: Nested Transformer for Elastic Inference Paper • 2310.07707 • Published Oct 11, 2023 • 5
To read... eventually Collection A collection of papers that I have read or plan to read, all in one place. Includes a wide range of topics. • 169 items • Updated Jun 30, 2025 • 6
The FineWeb Datasets: Decanting the Web for the Finest Text Data at Scale Paper • 2406.17557 • Published Jun 25, 2024 • 103
Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone Paper • 2404.14219 • Published Apr 22, 2024 • 261
Qwen-Scope: Turning Sparse Features into Development Tools for Large Language Models Paper • 2605.11887 • Published 12 days ago • 9
Understanding R1-Zero-Like Training: A Critical Perspective Paper • 2503.20783 • Published Mar 26, 2025 • 60
keithtyser/Gemopus-4-26B-A4B-it-local-abliterated-sota-internal-r7-selected-t34-transfer Image-Text-to-Text • 26B • Updated 21 days ago • 22 • 1