The Ultra-Scale Playbook 🌌 • 3.67k • The ultimate guide to training LLMs on large GPU clusters
DivMerge: A divergence-based model merging method for multi-tasking • Paper • 2509.02108 • Published Sep 2, 2025 • 26