v6: GRPO with --resume-adapter to start from SFT checkpoint 59ed7f3 verified piyush-mk committed on Apr 25
v5c: add --min-investigation-steps filter for submit-only with longer contexts 2857a20 verified piyush-mk committed on Apr 25
v5: remove 4-bit quant, variable-length traces, submit oversampling, GRPO exploration fixes 1d75102 verified piyush-mk committed on Apr 25
Fix: decode with skip_special_tokens=False to preserve <think> tags for stripping 231d76f verified piyush-mk committed on Apr 25
Fix Qwen3 thinking mode + increase max_new_tokens: training/launch_hf_job.py befff2d verified piyush-mk committed on Apr 25
Fix Qwen3 thinking mode + increase max_new_tokens: training/eval_adapter.py 664fa8e verified piyush-mk committed on Apr 25
Fix Qwen3 thinking mode + increase max_new_tokens: training/train_sft.py 7c48dcf verified piyush-mk committed on Apr 25
Fix Qwen3 thinking mode + increase max_new_tokens: training/train_grpo.py 9d40fed verified piyush-mk committed on Apr 25
Fix Qwen3 thinking mode + increase max_new_tokens: training/rollout.py f788adf verified piyush-mk committed on Apr 25
Fix Qwen3 thinking mode + increase max_new_tokens: inference.py 4afdc43 verified piyush-mk committed on Apr 25
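
Note on 231d76f: according to that commit message, decoding keeps the <think> tags in the output so they can be stripped afterwards. The repository's own stripping code is not shown in this log. The sketch below is only an illustration of what such a step might look like, using plain Python on an already-decoded string; the name strip_think and the handling of an unclosed tag are assumptions.

```python
import re

# Hypothetical helper: the repository's actual stripping code is not shown in this log.
_THINK_BLOCK = re.compile(r"<think>.*?</think>", re.DOTALL)


def strip_think(text: str) -> str:
    """Remove <think>...</think> blocks, plus any unclosed trailing <think> block."""
    text = _THINK_BLOCK.sub("", text)
    # Assumption: if generation was cut off mid-thought, drop everything
    # from the dangling opening tag onward.
    dangling = text.find("<think>")
    if dangling != -1:
        text = text[:dangling]
    return text.strip()
```

This only works if the tags survive decoding, which is presumably why the commit pairs the stripping step with skip_special_tokens=False.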