Improve model card metadata and description

#1
by nielsr HF Staff - opened
Files changed (1)
  1. README.md +58 -52
README.md CHANGED
@@ -1,4 +1,9 @@
1
  ---
 
 
 
 
 
2
  tags:
3
  - time-series-forecasting
4
  - foundation-models
@@ -9,64 +14,65 @@ tags:
9
  - observability
10
  - safetensors
11
  - pytorch_model_hub_mixin
12
- license: apache-2.0
13
- pipeline_tag: time-series-forecasting
14
  thumbnail: https://web-assets.dd-static.net/42588/1778691695-toto-2-hero.png
15
  model-index:
16
  - name: Toto-2.0-313m
17
  results:
18
- - task:
19
- type: time-series-forecasting
20
- dataset:
21
- name: BOOM
22
- type: BOOM
23
- metrics:
24
- - name: CRPS
25
- type: CRPS
26
- value: 0.351
27
- - name: MASE
28
- type: MASE
29
- value: 0.585
30
- source:
31
- name: BOOM 💥 Observability Time-Series Forecasting Leaderboard
32
- url: https://huggingface.co/spaces/Datadog/BOOM
33
- - task:
34
- type: time-series-forecasting
35
- dataset:
36
- name: GIFT-Eval
37
- type: GIFT-Eval
38
- metrics:
39
- - name: CRPS
40
- type: CRPS
41
- value: 0.481
42
- - name: MASE
43
- type: MASE
44
- value: 0.703
45
- source:
46
- name: GIFT-Eval Time Series Forecasting Leaderboard
47
- url: https://huggingface.co/spaces/Salesforce/GIFT-Eval
48
- - task:
49
- type: time-series-forecasting
50
- dataset:
51
- name: TIME
52
- type: TIME
53
- metrics:
54
- - name: CRPS
55
- type: CRPS
56
- value: 0.535
57
- - name: MASE
58
- type: MASE
59
- value: 0.642
60
- source:
61
- name: TIME Benchmark Leaderboard
62
- url: https://huggingface.co/spaces/Real-TSF/TIME-leaderboard
63
  ---
64
 
65
  # Toto-2.0-313m
66
 
67
- Toto (Time Series Optimized Transformer for [Observability](https://www.datadoghq.com/knowledge-center/observability/)) is a family of time series foundation models for multivariate forecasting developed by [Datadog](https://www.datadoghq.com/). Toto 2.0 is the current generation, featuring u-μP-scaled transformers ranging from 4m to 2.5B parameters, all trained from a single recipe. Forecast quality improves reliably with parameter count across the family.
 
 
68
 
69
- The family sets a new state of the art on three forecasting benchmarks: [BOOM](https://huggingface.co/spaces/Datadog/BOOM), our observability benchmark; [GIFT-Eval](https://huggingface.co/spaces/Salesforce/GIFT-Eval), the standard general-purpose benchmark; and the recent contamination-resistant [TIME](https://arxiv.org/abs/2602.12147) benchmark.
70
 
71
  ## 📊 Performance
72
 
@@ -114,7 +120,7 @@ For more examples, see the [Quick Start notebook](https://github.com/DataDog/tot
114
 
115
  ## 💾 Available Checkpoints
116
 
117
- All five Toto 2.0 sizes share the same training recipe; pick a size based on your accuracy/latency budget. Latency is forward-pass time for a 1,024-step single-pass forecast at batch size 8 on a single A100.
118
 
119
  | Model | Params | Weights (fp32) | Latency | Recommended for |
120
  |:---:|:---:|:---:|---|---|
@@ -136,7 +142,7 @@ All five Toto 2.0 sizes share the same training recipe; pick a size based on you
136
 
137
  <figure>
138
  <img src="assets/architecture.png" alt="Overview of the Toto 2.0 architecture.">
139
- <figcaption>A decoder-only patched transformer whose attention layers alternate between time-axis (causal) and variate-axis (full) views of the input. Toto 2.0 adds <b>contiguous patch masking (CPM)</b> for single-pass parallel decoding, a <b>quantile output head</b> trained with pinball loss, a robust arcsinh input scaler, residual MLP patch projections, and is trained with NorMuon. See the <a href="#-additional-resources">technical report</a> for details.</figcaption>
140
  </figure>
141
 
142
  ## 🔗 Additional Resources
@@ -160,4 +166,4 @@ All five Toto 2.0 sizes share the same training recipe; pick a size based on you
160
  primaryClass={cs.LG},
161
  url={https://arxiv.org/abs/2605.20119},
162
  }
163
- ```
 
1
  ---
2
+ license: apache-2.0
3
+ pipeline_tag: time-series-forecasting
4
+ library_name: pytorch
5
+ datasets:
6
+ - Datadog/BOOM
7
  tags:
8
  - time-series-forecasting
9
  - foundation-models
 
14
  - observability
15
  - safetensors
16
  - pytorch_model_hub_mixin
17
+ - u-mup
 
18
  thumbnail: https://web-assets.dd-static.net/42588/1778691695-toto-2-hero.png
19
  model-index:
20
  - name: Toto-2.0-313m
21
  results:
22
+ - task:
23
+ type: time-series-forecasting
24
+ dataset:
25
+ name: BOOM
26
+ type: BOOM
27
+ metrics:
28
+ - type: CRPS
29
+ value: 0.351
30
+ name: CRPS
31
+ - type: MASE
32
+ value: 0.585
33
+ name: MASE
34
+ source:
35
+ url: https://huggingface.co/spaces/Datadog/BOOM
36
+ name: BOOM 💥 Observability Time-Series Forecasting Leaderboard
37
+ - task:
38
+ type: time-series-forecasting
39
+ dataset:
40
+ name: GIFT-Eval
41
+ type: GIFT-Eval
42
+ metrics:
43
+ - type: CRPS
44
+ value: 0.481
45
+ name: CRPS
46
+ - type: MASE
47
+ value: 0.703
48
+ name: MASE
49
+ source:
50
+ url: https://huggingface.co/spaces/Salesforce/GIFT-Eval
51
+ name: GIFT-Eval Time Series Forecasting Leaderboard
52
+ - task:
53
+ type: time-series-forecasting
54
+ dataset:
55
+ name: TIME
56
+ type: TIME
57
+ metrics:
58
+ - type: CRPS
59
+ value: 0.535
60
+ name: CRPS
61
+ - type: MASE
62
+ value: 0.642
63
+ name: MASE
64
+ source:
65
+ url: https://huggingface.co/spaces/Real-TSF/TIME-leaderboard
66
+ name: TIME Benchmark Leaderboard
67
  ---
68
 
69
  # Toto-2.0-313m
70
 
71
+ Toto (Time Series Optimized Transformer for [Observability](https://www.datadoghq.com/knowledge-center/observability/)) is a family of time series foundation models for multivariate forecasting developed by [Datadog](https://www.datadoghq.com/).
72
+
73
+ This model, **Toto-2.0-313m**, was presented in the paper [Toto 2.0: Time Series Forecasting Enters the Scaling Era](https://huggingface.co/papers/2605.20119) by Emaad Khwaja, Chris Lettieri, Gerald Woo, Eden Belouadah, Marc Cenac, Guillaume Jarry, Enguerrand Paquin, Xunyi Zhao, Viktoriya Zhukov, Othmane Abou-Amal, Chenghao Liu, Ameet Talwalkar, and David Asker.
74
 
75
+ Toto 2.0 is a generation of u-μP-scaled transformers ranging from 4m to 2.5B parameters, all trained from a single recipe. Forecast quality improves reliably with parameter count across the family. The family sets a new state of the art on three forecasting benchmarks: [BOOM](https://huggingface.co/spaces/Datadog/BOOM), our observability benchmark; [GIFT-Eval](https://huggingface.co/spaces/Salesforce/GIFT-Eval), the standard general-purpose benchmark; and the recent contamination-resistant [TIME](https://arxiv.org/abs/2602.12147) benchmark.
76
 
77
  ## 📊 Performance
78
 
 
120
 
121
  ## 💾 Available Checkpoints
122
 
123
+ All five Toto 2.0 sizes share the same training recipe. Latency is forward-pass time for a 1,024-step single-pass forecast at batch size 8 on a single A100.
124
 
125
  | Model | Params | Weights (fp32) | Latency | Recommended for |
126
  |:---:|:---:|:---:|---|---|
 
142
 
143
  <figure>
144
  <img src="assets/architecture.png" alt="Overview of the Toto 2.0 architecture.">
145
+ <figcaption>A decoder-only patched transformer whose attention layers alternate between time-axis (causal) and variate-axis (full) views of the input. Toto 2.0 adds <b>contiguous patch masking (CPM)</b> for single-pass parallel decoding, a <b>quantile output head</b> trained with pinball loss, a robust arcsinh input scaler, residual MLP patch projections, and is trained with NorMuon. See the <a href="https://arxiv.org/abs/2605.20119">technical report</a> for details.</figcaption>
146
  </figure>
147
 
148
  ## 🔗 Additional Resources
 
166
  primaryClass={cs.LG},
167
  url={https://arxiv.org/abs/2605.20119},
168
  }
169
+ ```
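
Note on the architecture caption in this diff: it mentions a quantile output head trained with pinball loss. For readers unfamiliar with that loss, the following is a minimal pure-Python sketch of the standard pinball (quantile) loss. It is not taken from the Toto codebase; the function names `pinball_loss` and `mean_pinball_loss` and the list-of-lists layout for the predictions are assumptions made for illustration only.

```python
from typing import Sequence


def pinball_loss(y_true: float, y_pred: float, q: float) -> float:
    """Pinball loss for one target and one predicted q-quantile.

    Hypothetical helper for illustration, not part of the Toto API.
    """
    if not 0.0 < q < 1.0:
        raise ValueError("q must lie strictly between 0 and 1")
    diff = y_true - y_pred
    # Under-prediction (diff > 0) costs q per unit of error;
    # over-prediction (diff < 0) costs (1 - q) per unit of error.
    return max(q * diff, (q - 1.0) * diff)


def mean_pinball_loss(
    y_true: Sequence[float],
    quantile_preds: Sequence[Sequence[float]],
    quantiles: Sequence[float],
) -> float:
    """Average pinball loss over all quantile levels and time steps.

    Assumed layout: quantile_preds[i][t] is the prediction for
    quantile level quantiles[i] at time step t.
    """
    if len(quantile_preds) != len(quantiles):
        raise ValueError("need one prediction series per quantile level")
    total = 0.0
    count = 0
    for q, preds in zip(quantiles, quantile_preds):
        if len(preds) != len(y_true):
            raise ValueError("each prediction series must match y_true in length")
        for y, p in zip(y_true, preds):
            total += pinball_loss(y, p, q)
            count += 1
    if count == 0:
        raise ValueError("no predictions to score")
    return total / count
```

For a high quantile such as `q = 0.9`, predicting below the target is penalized nine times more heavily than predicting above it by the same amount, which is what pushes the head's output toward the upper tail. At `q = 0.5` the loss reduces to half the absolute error.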