docs: update reward signal documentation with structured tables and weight breakdowns c829526 Navigam committed on Apr 26
feat: add SFT loss comparison visualization and training notebook for Qwen models 229a9b5 Navigam committed on Apr 26
feat: add T4-optimized SFT and RLVR training scripts, evaluation utilities, and updated documentation 58916ea Navigam committed on Apr 26
docs: update README documentation, add architecture diagram, and include a new training notebook. 251b189 Navigam committed on Apr 26
feat: enhance training environment and documentation for CORP-ENV models abaaa50 Navigam committed on Apr 26