Commit History

Add blog and final data
18d1157

bpHigh committed on

GRPO Phase 13: custom rollout_func for markdown JSON tool calls
f6d4692

bpHigh Claude Opus 4.7 (1M context) committed on

SFT eval on 22-task held-out split — fill in leaderboard
2e1dd84

bpHigh Claude Opus 4.7 (1M context) committed on

Move SUPPORTS_CONCURRENT_SESSIONS from module-level to class attribute
90a25f6

bpHigh Claude Opus 4.7 (1M context) committed on

Fix client._parse_result to unwrap {observation,reward,done} payload
2d7510b

bpHigh Claude Opus 4.7 (1M context) committed on

Enable concurrent sessions in env for GRPO training
99c16d0

bpHigh Claude Opus 4.7 (1M context) committed on

Attribute hand-curated Round-1 tasks to Finch + list ALL 119 tasks in openenv.yaml
8d80d79

bpHigh Claude Opus 4.7 (1M context) committed on

Update openenv.yaml — full 119-task inventory + 32 enumerated entries
4d2df85

bpHigh Claude Opus 4.7 (1M context) committed on

Phase 11.7: interactive Prev/Next/Play replay (was static wall of HTML)
d2310e1

bpHigh Claude Opus 4.7 (1M context) committed on

Add raw_logs.txt + HF Job + adapter links to dashboard and README
f2e02e4

bpHigh Claude Opus 4.7 (1M context) committed on

Phase 11.6: Kimi-K2.5 best-run replays in dashboard
3e65e46

bpHigh Claude Opus 4.7 (1M context) committed on

Fix blank Space iframe — base_path needs trailing slash for Gradio mount
60877e2

bpHigh Claude Opus 4.7 (1M context) committed on

Fix blank Gradio iframe — set root_path='/dashboard' on mount
05b7358

bpHigh Claude Opus 4.7 (1M context) committed on

Phase 11.5: Gradio dashboard at /dashboard (now the Space's base_path)
ae0420a

bpHigh Claude Opus 4.7 (1M context) committed on

eval_lora: fix truncation drop-direction bug + add subprocess preflight
b1c7959

bpHigh Claude Opus 4.7 (1M context) committed on

Phase 11: eval_lora.py — in-process SFT eval (no API, no WebSocket)
15e45dc

bpHigh Claude Opus 4.7 (1M context) committed on

SFT run #2: 8K-context Qwen2.5-Coder-3B (qwen3b-office-sft-kimi-long)
301eb21

bpHigh Claude Opus 4.7 (1M context) committed on

Track PNGs via LFS so HF Space accepts the SFT plot
364791a

bpHigh committed on

Phase 10.1: SFT log analyzer + Qwen2.5-Coder-3B training artifacts
85f4b5e

bpHigh Claude Opus 4.7 (1M context) committed on

Add SFT corpus + Kimi-K2.5 teacher run + Kimi eval run
c7178dc

bpHigh Claude Opus 4.7 (1M context) committed on

train_sft: drop fp16, prefer bf16 (MPS-compatible without grad scaler)
35fa944

bpHigh Claude Opus 4.7 (1M context) committed on

Phase 10: SFT training script (Qwen2.5-Coder-3B + LoRA via TRL)
c803cd5

bpHigh Claude Opus 4.7 (1M context) committed on

Phase 9.1: --skip-completed flag for cheap re-runs
1ce8fac

bpHigh Claude Opus 4.7 (1M context) committed on

Phase 9: hard early-submit gate at env layer (kills the exploit class)
9033aad

bpHigh Claude Opus 4.7 (1M context) committed on

Phase 8: SFT corpus builder with 6-filter pipeline
d78f879

bpHigh Claude Opus 4.7 (1M context) committed on

Parse Kimi K2/K2.5 native tool-call format in inference.py
3db1e6a

bpHigh Claude Opus 4.7 (1M context) committed on

Phase 7: close the 'submit source unchanged' exploit Kimi-K2.5 found
4688533

bpHigh Claude Opus 4.7 (1M context) committed on

Round 2 README, Qwen2.5-Coder-3B baseline, missing data_pipeline pullers
4d300ac

bpHigh Claude Opus 4.7 (1M context) committed on

Add graders
e13057d

bpHigh committed on

Add extended arena stuff
a57d682

bpHigh committed on

Update readme
30d11c3

bpHigh committed on

Graduated code step rewards based on execution success and code substance
b448320

bpHigh committed on

Enable web interface on HF Space
cc6ad5c

bpHigh committed on

add finch attribution
a211674

bpHigh committed on

Financial Task Environment — code execution with real xlsx
cd4b800

bpHigh committed on

Initial commit
adbdd47

Bhavish Pahwa committed on