---
title: TemporalBench Leaderboard
emoji: 🥇
colorFrom: green
colorTo: indigo
sdk: gradio
app_file: app.py
pinned: true
license: apache-2.0
short_description: Read-only TemporalBench leaderboard for offline results.
sdk_version: 5.49.1
tags:
  - leaderboard
---

# TemporalBench Leaderboard

This Space is a read-only visualization and validation layer for **offline** TemporalBench results.
It does not execute agents, call LLM APIs, or accept API keys.

## Configuration

- Set the local results file path via `TEMPORALBENCH_RESULTS_PATH`.
  Default is `data/results.json`.
- Submissions are stored in `data/submissions/` for manual review (override with `TEMPORALBENCH_SUBMISSIONS_PATH`).
- Update descriptive text in `src/about.py`.
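
The path settings above could be resolved roughly as follows. This is a minimal sketch, not the Space's actual code: the helper name `resolve_paths` and the two constants are hypothetical, and only the environment variable names and their defaults come from this README.

```python
import os
from pathlib import Path

# Defaults taken from this README; the helper itself is a hypothetical illustration.
DEFAULT_RESULTS_PATH = "data/results.json"
DEFAULT_SUBMISSIONS_PATH = "data/submissions/"


def resolve_paths(env=None):
    """Return (results_path, submissions_dir), honoring environment overrides."""
    env = os.environ if env is None else env
    # An unset or empty variable falls back to the documented default.
    results = Path(env.get("TEMPORALBENCH_RESULTS_PATH") or DEFAULT_RESULTS_PATH)
    submissions = Path(env.get("TEMPORALBENCH_SUBMISSIONS_PATH") or DEFAULT_SUBMISSIONS_PATH)
    return results, submissions
```

Passing a plain dict instead of `os.environ` keeps the helper easy to exercise without touching the real environment.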
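
## Results File Format

Results must be a JSON list or a CSV table in which each record describes one agent configuration.

Fields per record (the identity fields and `T1_acc` through `T4_acc` are required; the remaining metrics are optional, as described in the notes):
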
```json
{
  "model_name": "string",
  "agent_name": "string",
  "agent_type": "string",
  "base_model": "string",
  "T1_acc": 0.0,
  "T2_acc": 0.0,
  "T3_acc": 0.0,
  "T4_acc": 0.0,
  "T2_sMAPE": 0.0,
  "T2_MAE": 0.0,
  "T4_sMAPE": 0.0,
  "T4_MAE": 0.0,
  "FreshRetailNet_T2_sMAPE": 0.0,
  "FreshRetailNet_T2_MAE": 0.0,
  "MIMIC_T2_OW_sMAPE": 0.0,
  "MIMIC_T2_OW_RMSSE": 0.0
}
```

Notes:

- `T2_sMAPE`, `T2_MAE`, `T4_sMAPE`, `T4_MAE` are optional (forecasting metrics).
- Dataset-level columns are optional and displayed if present.
- For MIMIC forecasting, only `OW_sMAPE` and `OW_RMSSE` are expected.
- Any additional numeric columns are treated as optional domain metrics and will be shown.
- Records must have a consistent schema and numeric metric values.
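
To make the rules above concrete, here is a hedged sketch of loading and checking a results file. It is not the code in `src/leaderboard/load_results.py`: the function names `load_records` and `validate_records` are hypothetical, the required-field lists are inferred from the example record and notes, and treating every non-identity column as numeric is an assumption.

```python
import csv
import json
from pathlib import Path

# Field names come from the example record in this README; which ones are
# required is an assumption inferred from the notes.
IDENTITY_FIELDS = ("model_name", "agent_name", "agent_type", "base_model")
REQUIRED_METRICS = ("T1_acc", "T2_acc", "T3_acc", "T4_acc")


def load_records(path):
    """Read a JSON list or a CSV table into a list of dicts."""
    path = Path(path)
    if path.suffix.lower() == ".csv":
        with path.open(newline="", encoding="utf-8") as handle:
            return [dict(row) for row in csv.DictReader(handle)]
    data = json.loads(path.read_text(encoding="utf-8"))
    if not isinstance(data, list):
        raise ValueError("results JSON must be a list of records")
    return data


def validate_records(records):
    """Return a list of human-readable problems; an empty list means valid."""
    if not records:
        return ["no records found"]
    problems = []
    expected_keys = set(records[0])
    for index, record in enumerate(records):
        # Every record must expose the same columns as the first one.
        if set(record) != expected_keys:
            problems.append(f"record {index}: schema differs from record 0")
        for field in IDENTITY_FIELDS + REQUIRED_METRICS:
            if field not in record:
                problems.append(f"record {index}: missing {field}")
        # Assumption: every non-identity column is a metric and must be numeric.
        for key, value in record.items():
            if key in IDENTITY_FIELDS:
                continue
            try:
                float(value)  # CSV values arrive as strings
            except (TypeError, ValueError):
                problems.append(f"record {index}: {key} is not numeric")
    return problems
```

Returning a list of problems rather than raising on the first one lets a reviewer see every issue in a submission at once, which suits a manual-review workflow.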

## Project Structure

- `app.py`: Gradio UI + leaderboard rendering
- `src/leaderboard/load_results.py`: Load + validate results
- `src/leaderboard/schema.py`: Identity/metric field definitions
- `src/about.py`: Text and descriptions
- `src/display/css_html_js.py`: Custom styling