Commit History
GRPO Phase 13: custom rollout_func for markdown JSON tool calls f6d4692
SFT eval on 22-task held-out split - fill in leaderboard 2e1dd84
Move SUPPORTS_CONCURRENT_SESSIONS from module-level to class attribute 90a25f6
Fix client._parse_result to unwrap {observation,reward,done} payload 2d7510b
Enable concurrent sessions in env for GRPO training 99c16d0
Attribute hand-curated Round-1 tasks to Finch + list ALL 119 tasks in openenv.yaml 8d80d79
Update openenv.yaml - full 119-task inventory + 32 enumerated entries 4d2df85
Phase 11.7: interactive Prev/Next/Play replay (was static wall of HTML) d2310e1
Add raw_logs.txt + HF Job + adapter links to dashboard and README f2e02e4
Phase 11.6: Kimi-K2.5 best-run replays in dashboard 3e65e46
Fix blank Space iframe - base_path needs trailing slash for Gradio mount 60877e2
Fix blank Gradio iframe - set root_path='/dashboard' on mount 05b7358
Phase 11.5: Gradio dashboard at /dashboard (now the Space's base_path) ae0420a
eval_lora: fix truncation drop-direction bug + add subprocess preflight b1c7959
Phase 11: eval_lora.py - in-process SFT eval (no API, no WebSocket) 15e45dc
SFT run #2: 8K-context Qwen2.5-Coder-3B (qwen3b-office-sft-kimi-long) 301eb21
Track PNGs via LFS so HF Space accepts the SFT plot 364791a
Phase 10.1: SFT log analyzer + Qwen2.5-Coder-3B training artifacts 85f4b5e
Add SFT corpus + Kimi-K2.5 teacher run + Kimi eval run c7178dc
train_sft: drop fp16, prefer bf16 (MPS-compatible without grad scaler) 35fa944
Phase 10: SFT training script (Qwen2.5-Coder-3B + LoRA via TRL) c803cd5
Phase 9.1: --skip-completed flag for cheap re-runs 1ce8fac
Phase 9: hard early-submit gate at env layer (kills the exploit class) 9033aad
Phase 8: SFT corpus builder with 6-filter pipeline d78f879
Parse Kimi K2/K2.5 native tool-call format in inference.py 3db1e6a
Phase 7: close the 'submit source unchanged' exploit Kimi-K2.5 found 4688533
Round 2 README, Qwen2.5-Coder-3B baseline, missing data_pipeline pullers 4d300ac
Add graders e13057d
Add extended arena stuff a57d682
Update readme 30d11c3
Graduated code step rewards based on execution success and code substance b448320
Enable web interface on HF Space cc6ad5c
add finch attribution a211674
Financial Task Environment - code execution with real xlsx cd4b800
Initial commit adbdd47 unverified
Bhavish Pahwa committed on