Update README.md

README.md CHANGED
@@ -17,10 +17,14 @@ This model demonstrates the efficiency of **State Space Models (SSM)** on morpho
## Benchmark Results 🏆

-| Model Architecture | Throughput (tok/s) | Latency (ms) | Peak VRAM (MB) |
-| :--- | :--- | :--- | :--- |
-| Transformer (GPT-2) | 67.53 | 14.81 | ~1786 |
-| **Mamba (Ours)** | **131.09** | **7.63** | ~2469* |
+| Model Architecture | Throughput (tok/s) | Latency (ms/token) | Peak VRAM (MB) | Final Loss (500 Steps) |
+| :--- | :--- | :--- | :--- | :--- |
+| Transformer (GPT-2) | 67.53 | 14.81 | ~1786 | 6.81 |
+| **Mamba (Ours)** | **131.09** | **7.63** | ~2469* | 20.58 |
+
+Model,Throughput (tok/s),Latency (ms/token),Final Loss (500 Steps)
+Transformer (Baseline),67.53,14.81,6.81
+Mamba (SSM),131.09,7.63,20.58

> *Note: VRAM usage for Mamba includes CUDA Graph overhead for maximum throughput.*
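A note on the units: the latency column is per token, and it is exactly the reciprocal of throughput (1000 / 67.53 ≈ 14.81 ms, 1000 / 131.09 ≈ 7.63 ms). A minimal sketch of how such numbers can be measured, assuming a CUDA model whose forward pass returns raw logits; the helper below is hypothetical, not the repo's actual benchmark script:

```python
import time
import torch

@torch.no_grad()
def benchmark_decode(model, input_ids, n_tokens=256, warmup=3):
    """Time greedy decoding of n_tokens; report tok/s, ms/token, peak MB.

    Hypothetical sketch: assumes `model` lives on CUDA and its forward
    pass returns logits of shape (batch, seq_len, vocab_size).
    """
    model.eval()
    for _ in range(warmup):                     # warm up kernels and allocator
        model(input_ids)
    torch.cuda.reset_peak_memory_stats()
    torch.cuda.synchronize()                    # drain queued GPU work first
    start = time.perf_counter()

    ids = input_ids
    for _ in range(n_tokens):                   # naive step-by-step decode
        logits = model(ids)                     # (B, T, V)
        next_id = logits[:, -1:, :].argmax(-1)  # greedy pick, shape (B, 1)
        ids = torch.cat([ids, next_id], dim=-1)

    torch.cuda.synchronize()                    # wait for the last step
    elapsed = time.perf_counter() - start
    tok_per_s = n_tokens / elapsed
    ms_per_tok = 1000.0 / tok_per_s             # same relation as the table
    peak_mb = torch.cuda.max_memory_allocated() / 2**20
    return tok_per_s, ms_per_tok, peak_mb
```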
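On the footnote about CUDA Graphs: capture records a fixed kernel sequence that replays against persistent static input/output buffers and a private memory pool that stays reserved between replays, and that pool is the usual source of the extra peak VRAM reported for Mamba. A minimal capture/replay sketch using PyTorch's public CUDA Graphs API, with a hypothetical stand-in module in place of the real model:

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for one decode step; the real workload is the Mamba model.
model = nn.Sequential(nn.Linear(256, 1024), nn.ReLU(), nn.Linear(1024, 256)).cuda()
static_x = torch.randn(1, 256, device="cuda")   # persistent input buffer

# Warm-up must run on a side stream before capture (PyTorch requirement).
s = torch.cuda.Stream()
s.wait_stream(torch.cuda.current_stream())
with torch.cuda.stream(s):
    static_y = model(static_x)
torch.cuda.current_stream().wait_stream(s)

g = torch.cuda.CUDAGraph()
with torch.cuda.graph(g):                       # record the kernel sequence
    static_y = model(static_x)                  # outputs land in static_y

# Replay: refill the static input buffer, then launch the whole
# captured sequence with a single call.
static_x.copy_(torch.randn(1, 256, device="cuda"))
g.replay()
print(static_y.shape)                           # torch.Size([1, 256])
```

Replay eliminates per-kernel launch overhead, which is where the throughput gain from CUDA Graphs comes from; the persistent buffers and graph pool are the memory cost the footnote refers to.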