fix: OOM (NUM_GENERATIONS 32→16, max_completion_length 300→200, expandable_segments) c0d3d54 verified Pathikreet committed on Apr 26
fix: add 3 missing hard tasks to _TASK_DIFFICULTY (322 prompts) a6c22c8 verified Pathikreet committed on Apr 26
Fix: kl_coeff -> beta (correct TRL GRPOConfig param name) a47b370 verified Pathikreet committed on Apr 26
Auto-detect username from token for adapter + run folder upload 0f17c96 verified Pathikreet committed on Apr 26
Fix colab: format reward +-0.15, temp=0.7, kl_coeff=0.1 dc41c9d verified Pathikreet committed on Apr 26
Fix: start_training yield count (9), plt.close memory leak 83e0e06 verified Pathikreet committed on Apr 26
Fix restart loop: variant=secondary, theme back in Blocks() 49d99dc verified Pathikreet committed on Apr 26
Stop button: write flag file, wait 120s for clean save, fallback terminate 3eb85d0 verified Pathikreet committed on Apr 26
Graceful stop: save weights on /app/stop_requested flag 65ac9f8 verified Pathikreet committed on Apr 26
Add Stop button + save plots on stop; fix _refresh unpack bug; fix theme deprecation b36df55 verified Pathikreet committed on Apr 26
Fix hard_currency_conversion task ID in TRAIN_TASKS and EVAL_TASKS e27253e verified Pathikreet committed on Apr 26
Baseline: Qwen2.5-7B-Instruct untrained 2026-04-26_0443 c10ea23 verified Pathikreet committed on Apr 26
Bump seeds: medium×8, hard/long×20 (322 prompts total) abf8676 verified Pathikreet committed on Apr 26
Add format/difficulty/ep-length live panels + full metrics in JSON 1744e09 verified Pathikreet committed on Apr 26
Run 3: temp=0.7, kl=0.1, format±0.15, no curriculum, 20 tasks, G=32 0bfa536 verified Pathikreet committed on Apr 26
Update UI defaults: epochs 3→6 (max 10), generations 8→16 (max 32) d6d586a verified Pathikreet committed on Apr 25
Baseline: Qwen2.5-7B-Instruct untrained 2026-04-25_1626 e7727fc verified Pathikreet committed on Apr 25
Update root eval_baseline.py to 17 tasks + long-horizon + health retry 951f2d1 verified Pathikreet committed on Apr 25
Restore original requirements: gradio + torch ML stack 0cf30cc verified Pathikreet committed on Apr 25
Fix requirements: restore gradio + UI deps for training space ea3f11d verified Pathikreet committed on Apr 25