Update README.md
README.md
CHANGED
@@ -102,6 +102,8 @@ We measured the average inference speed (tokens/s) of generating 1024 new tokens
 |BF16 | 33.40 | 31.91 | 21.33|
 |INT4 | - | 31.95 | - |
 
+The profiling runs on a single A800-SXM4-80G GPU with PyTorch 2.4.0 and CUDA 12.1.
+
 
 ## 🚀 How to use the model
 