🧬 Carbon Collection Carbon 500M, 3B, 8B genomic models and GGUF variants for llama.cpp • 7 items • Updated 10 days ago • 43
Rethinking Generalization in Reasoning SFT: A Conditional Analysis on Optimization, Data, and Model Capability Paper • 2604.06628 • Published Apr 8 • 327
Rethink_SFT_generalization Collection Repo for the paper "Rethinking Generalization in Reasoning SFT: A Conditional Analysis on Optimization, Data, and Model Capability" • 40 items • Updated Apr 11 • 21
The ultimate guide to RL environments: building and scaling them in the LLM era 📝 Building and scaling RL environments for LLM training • Running • 183
unsloth-grpo-tests Collection Test runs for Unsloth GRPO training (math use case) • 6 items • Updated Apr 13