Bundle Nemotron Ultra should-refuse rows f9fb00d verified VibeCodingScientist committed on 6 days ago
Add Nemotron 3 Ultra 550B (post-v1.1, rotated v1.3 council) 4e06b0a verified VibeCodingScientist committed on 6 days ago
Add MiniMax M3 (post-v1.1, rotated v1.3 council) 9b03ccf verified VibeCodingScientist committed on 9 days ago
Add Claude Opus 4.8 (post-v1.1, rotated v1.3 council) to leaderboard + longitudinal 68b5682 verified VibeCodingScientist committed on 14 days ago
Leaderboard: add Youden's J column (default sort), per-tier directional sort + glyphs 718a26e verified VibeCodingScientist committed on 18 days ago
Bundle should-refuse sweep data for Calibration tab b3d466f verified VibeCodingScientist committed on 21 days ago
Add Calibration tab: PC-tier scatter + TPR bars from should-refuse sweep 240e3ec verified VibeCodingScientist committed on 21 days ago
Redesign UI: theme-aware leaderboard, thesis-forward hero, cleaner cells 5eaec60 verified VibeCodingScientist committed on 21 days ago
Redesign leaderboard: two-row header, heatmap tints, progress bars, rank column, intro blurb 49bc134 verified VibeCodingScientist committed on 21 days ago
Fix sdk_version to 5.50.0 (exact release required by HF) 09df299 verified VibeCodingScientist committed on 21 days ago
Deploy RefusalBench leaderboard (v1.1-frozen, arXiv:2605.21545) 3b68594 verified VibeCodingScientist committed on 21 days ago
Deploy RefusalBench leaderboard (v1.1-frozen, arXiv:2605.21545) ab29e65 verified VibeCodingScientist committed on 21 days ago