UFT: Unifying Supervised and Reinforcement Fine-Tuning
- UFT: Unifying Supervised and Reinforcement Fine-Tuning
  Paper • 2505.16984 • Published • 3
- liumy2010/Llama-3.2-1B-countdown-R3
  Text Generation • 1B • Updated • 13
- liumy2010/Llama-3.2-1B-countdown-RFT
  Text Generation • 1B • Updated • 9
- liumy2010/Llama-3.2-1B-countdown-SFT
  Text Generation • 1B • Updated • 9