# Results Index

This page is the quick index to generated evaluation outputs.

## Community challenge eval

- Report (markdown): `docs/hf_hub_community_challenge_report.md`
- Report (json): `docs/hf_hub_community_challenge_report.json`
- Inputs: `scripts/hf_hub_community_challenges.txt`
- Generator: `scripts/score_hf_hub_community_challenges.py`

## Community coverage eval

- Report (markdown): `docs/hf_hub_community_coverage_report.md`
- Report (json): `docs/hf_hub_community_coverage_report.json`
- Inputs: `scripts/hf_hub_community_coverage_prompts.json`
- Generator: `scripts/score_hf_hub_community_coverage.py`

## Prompt/card A/B eval (community)

- Summary:
  - `docs/hf_hub_prompt_ab/prompt_ab_summary.md`
  - `docs/hf_hub_prompt_ab/prompt_ab_summary.json`
  - `docs/hf_hub_prompt_ab/prompt_ab_summary.csv`
- Visuals (if matplotlib is available):
  - `docs/hf_hub_prompt_ab/prompt_ab_composite_<model>.png`
  - `docs/hf_hub_prompt_ab/prompt_ab_scatter_tokens_vs_challenge.png`
- Generator:
  - `scripts/eval_hf_hub_prompt_ab.py`

## Tool routing eval

- Batch summary:
  - `docs/tool_routing_eval/tool_routing_batch_summary.md`
  - `docs/tool_routing_eval/tool_routing_batch_summary.json`
  - `docs/tool_routing_eval/tool_routing_batch_summary.csv`
- Per-model reports: `docs/tool_routing_eval/tool_routing_*.md` (+ `.json`)
- Inputs:
  - `scripts/tool_routing_challenges.txt`
  - `scripts/tool_routing_expected.json`
- Generators:
  - `scripts/score_tool_routing_confusion.py`
  - `scripts/run_tool_routing_batch.py`

## Tool description A/B eval

- Summary:
  - `docs/tool_description_eval/tool_description_ab_summary.md`
  - `docs/tool_description_eval/tool_description_ab_summary.json`
  - `docs/tool_description_eval/tool_description_ab_summary.csv`
- Detailed/pairwise:
  - `docs/tool_description_eval/tool_description_ab_detailed.json`
  - `docs/tool_description_eval/tool_description_ab_pairwise.json`
  - `docs/tool_description_eval/tool_description_ab_pairwise.csv`
  - `docs/tool_description_eval/tool_description_ab_ranking.json`
- Visuals:
  - `docs/tool_description_eval/heat_first_call_ok.png`
  - `docs/tool_description_eval/heat_avg_score.png`
  - `docs/tool_description_eval/heat_avg_calls.png`
  - `docs/tool_description_eval/scatter_calls_vs_first_ok.png`
  - `docs/tool_description_eval/tool_description_interpretation.md`
- Inputs:
  - `scripts/hf_hub_community_challenges.txt`
  - `scripts/tool_description_variants.json`
- Generators:
  - `scripts/eval_tool_description_ab.py`
  - `scripts/plot_tool_description_eval.py`

---

## One-command regeneration

```bash
scripts/run_all_evals.sh
```
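
After regeneration, it can be useful to confirm that the indexed outputs actually exist. The following is a minimal sketch, not part of the repository: `check_outputs` is a hypothetical helper, the paths passed to it are copied from the sections above, and it assumes it is run from the repository root.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the repository): print every path that is
# missing and return non-zero if at least one is absent.
check_outputs() {
  local missing=0 path
  for path in "$@"; do
    if [ ! -e "$path" ]; then
      echo "missing: $path"
      missing=1
    fi
  done
  return "$missing"
}

# Example: one summary or report file from each section of this index.
check_outputs \
  docs/hf_hub_community_challenge_report.md \
  docs/hf_hub_community_coverage_report.md \
  docs/hf_hub_prompt_ab/prompt_ab_summary.md \
  docs/tool_routing_eval/tool_routing_batch_summary.md \
  docs/tool_description_eval/tool_description_ab_summary.md \
  || echo "Some outputs are missing; consider rerunning scripts/run_all_evals.sh"
```

The list is deliberately short; extend it with the other paths in this index if a stricter check is wanted.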

Optional environment overrides:

```bash
MODELS=gpt-oss,gpt-5-mini ROUTER_AGENT=hf_hub_community scripts/run_all_evals.sh
```
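
The overrides above are ordinary environment variables set for a single invocation. As a sketch of how a bash script can consume such overrides, the snippet below uses `${VAR:-default}` expansion and splits the comma-separated `MODELS` value into an array. The default values and the `model_list` variable are placeholders assumed for illustration; the actual defaults and parsing inside `scripts/run_all_evals.sh` are not described on this page and may differ.

```bash
#!/usr/bin/env bash
# Illustrative sketch only; these defaults are placeholders, not the real ones.
MODELS="${MODELS:-model-a,model-b}"
ROUTER_AGENT="${ROUTER_AGENT:-example_agent}"

# Split the comma-separated model list into a bash array.
IFS=',' read -r -a model_list <<< "$MODELS"

for model in "${model_list[@]}"; do
  echo "would evaluate model=$model with router agent=$ROUTER_AGENT"
done
```

With this pattern, a variable left unset falls back to its default, so either override can be supplied without the other.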