Update Colab notebook: 1.5B model, scaled rewards, tuned hyperparameters ee8c2d4 nihalaninihal committed 12 days ago
Align with Advanced Llama 3.2 GRPO LoRA reference notebook pattern c7d253a nihalaninihal Claude Opus 4.6 committed 12 days ago
Fix format_comparison_metrics_html to accept run_comparison() dict directly d52b449 nihalaninihal Claude Opus 4.6 committed 12 days ago
Align train.py and Colab notebook with official Unsloth+OpenEnv GRPO patterns e09a415 nihalaninihal Claude Opus 4.6 committed 12 days ago
Update metrics format with drift/oversight tracking, add Colab training notebook 5e0f2b1 nihalaninihal committed 12 days ago