Distil-Whisper: Optimized for Qualcomm Devices
Distil-Whisper Small English is a distilled version of Whisper Small, optimized for fast and efficient automatic speech recognition.
This model is based on the implementation of Distil-Whisper found here. This repository contains pre-exported model files optimized for Qualcomm® devices. You can use the Qualcomm® AI Hub Models library to export with custom configurations. More details on model performance across various devices can be found here.
Qualcomm AI Hub Models uses Qualcomm AI Hub Workbench to compile, profile, and evaluate this model. Sign up to run these models on a hosted Qualcomm® device.
Getting Started
There are two ways to deploy this model on your device:
Option 1: Download Pre-Exported Models
Below are pre-exported model assets ready for deployment.
| Runtime | Precision | Chipset | SDK Versions | Download |
|---|---|---|---|---|
| ONNX | float | Universal | QAIRT 2.42, ONNX Runtime 1.24.3 | Download |
| QNN_DLC | float | Universal | QAIRT 2.43 | Download |
| TFLITE | float | Universal | QAIRT 2.43, TFLite 2.19.1 | Download |
For more device-specific assets and performance metrics, visit Distil-Whisper on Qualcomm® AI Hub.
Option 2: Export with Custom Configurations
Use the Qualcomm® AI Hub Models Python library to compile and export the model with your own:
- Custom weights (e.g., fine-tuned checkpoints)
- Custom input shapes
- Target device and runtime configurations
This option is ideal if you need to customize the model beyond the default configuration provided here.
See our repository for Distil-Whisper on GitHub for usage instructions.
Model Details
Model Type: Speech recognition
Model Stats:
- Model checkpoint: distil-whisper/distil-small.en
- Input resolution: 80x3000 (30 seconds audio)
- Max decoded sequence length: 200 tokens
- Number of parameters (encoder): 166M
- Model size (encoder) (float): 332 MB
- Number of parameters (decoder): 211M
- Model size (decoder) (float): 450 MB
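The 80x3000 input resolution follows from how Whisper-family models turn audio into a log-mel spectrogram. The sketch below reproduces that arithmetic. The 16 kHz sample rate and 160-sample hop length are the standard Whisper front-end values and are assumptions here, since this card does not state them. `encoder_input_shape` and `pad_or_trim` are hypothetical helper names used only for illustration.

```python
# Whisper-style front-end constants. These are assumptions: the card only
# states the resulting 80x3000 shape and the 30-second window.
SAMPLE_RATE_HZ = 16_000   # audio is resampled to 16 kHz
HOP_LENGTH = 160          # samples between consecutive spectrogram frames
N_MELS = 80               # mel frequency bins per frame
CHUNK_SECONDS = 30        # fixed audio window per encoder call


def encoder_input_shape(seconds=CHUNK_SECONDS):
    """Return (mel_bins, frames) for an audio window of the given length."""
    n_samples = seconds * SAMPLE_RATE_HZ
    n_frames = n_samples // HOP_LENGTH
    return (N_MELS, n_frames)


def pad_or_trim(samples, seconds=CHUNK_SECONDS):
    """Zero-pad or truncate raw samples to exactly one fixed-length window."""
    target = seconds * SAMPLE_RATE_HZ
    trimmed = list(samples[:target])
    return trimmed + [0.0] * (target - len(trimmed))


print(encoder_input_shape())          # shape fed to the encoder
print(len(pad_or_trim([0.1] * 1000))) # short clips are padded to 30 s
```

Because the window is fixed, shorter clips are padded and longer recordings have to be split into 30-second chunks before they reach the encoder.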
Performance Summary
| Model | Runtime | Precision | Chipset | Inference Time | Peak Memory Range | Primary Compute Unit |
|---|---|---|---|---|---|---|
| decoder | ONNX | float | Snapdragon® 8 Elite Gen 5 Mobile | 5.544 ms | 51 - 375 MB | NPU |
| decoder | ONNX | float | Snapdragon® X2 Elite | 5.102 ms | 178 - 178 MB | NPU |
| decoder | ONNX | float | Snapdragon® X Elite | 11.168 ms | 178 - 178 MB | NPU |
| decoder | ONNX | float | Snapdragon® 8 Gen 3 Mobile | 8.615 ms | 0 - 364 MB | NPU |
| decoder | ONNX | float | Qualcomm® QCS8550 (Proxy) | 11.602 ms | 0 - 195 MB | NPU |
| decoder | ONNX | float | Qualcomm® QCS9075 | 13.163 ms | 40 - 82 MB | NPU |
| decoder | ONNX | float | Snapdragon® 8 Elite For Galaxy Mobile | 7.186 ms | 16 - 476 MB | NPU |
| decoder | QNN_DLC | float | Snapdragon® 8 Elite Gen 5 Mobile | 5.544 ms | 17 - 326 MB | NPU |
| decoder | QNN_DLC | float | Snapdragon® X2 Elite | 5.615 ms | 40 - 40 MB | NPU |
| decoder | QNN_DLC | float | Snapdragon® X Elite | 10.866 ms | 40 - 40 MB | NPU |
| decoder | QNN_DLC | float | Snapdragon® 8 Gen 3 Mobile | 8.59 ms | 17 - 330 MB | NPU |
| decoder | QNN_DLC | float | Qualcomm® QCS8275 (Proxy) | 18.893 ms | 30 - 246 MB | NPU |
| decoder | QNN_DLC | float | Qualcomm® QCS8550 (Proxy) | 11.254 ms | 40 - 42 MB | NPU |
| decoder | QNN_DLC | float | Qualcomm® SA8775P | 12.908 ms | 19 - 236 MB | NPU |
| decoder | QNN_DLC | float | Qualcomm® QCS9075 | 13.082 ms | 40 - 86 MB | NPU |
| decoder | QNN_DLC | float | Qualcomm® QCS8450 (Proxy) | 18.076 ms | 36 - 345 MB | NPU |
| decoder | QNN_DLC | float | Qualcomm® SA7255P | 18.893 ms | 30 - 246 MB | NPU |
| decoder | QNN_DLC | float | Qualcomm® SA8295P | 14.217 ms | 20 - 271 MB | NPU |
| decoder | QNN_DLC | float | Snapdragon® 8 Elite For Galaxy Mobile | 7.215 ms | 5 - 447 MB | NPU |
| decoder | TFLITE | float | Snapdragon® 8 Elite Gen 5 Mobile | 5.627 ms | 4 - 480 MB | NPU |
| decoder | TFLITE | float | Snapdragon® 8 Gen 3 Mobile | 8.471 ms | 4 - 556 MB | NPU |
| decoder | TFLITE | float | Qualcomm® QCS8275 (Proxy) | 18.822 ms | 5 - 450 MB | NPU |
| decoder | TFLITE | float | Qualcomm® QCS8550 (Proxy) | 11.489 ms | 5 - 7 MB | NPU |
| decoder | TFLITE | float | Qualcomm® SA8775P | 12.895 ms | 5 - 450 MB | NPU |
| decoder | TFLITE | float | Qualcomm® QCS9075 | 13.074 ms | 2 - 266 MB | NPU |
| decoder | TFLITE | float | Qualcomm® QCS8450 (Proxy) | 18.592 ms | 5 - 524 MB | NPU |
| decoder | TFLITE | float | Qualcomm® SA7255P | 18.822 ms | 5 - 450 MB | NPU |
| decoder | TFLITE | float | Qualcomm® SA8295P | 14.156 ms | 5 - 443 MB | NPU |
| decoder | TFLITE | float | Snapdragon® 8 Elite For Galaxy Mobile | 6.994 ms | 3 - 504 MB | NPU |
| encoder | ONNX | float | Snapdragon® 8 Elite Gen 5 Mobile | 50.657 ms | 77 - 836 MB | NPU |
| encoder | ONNX | float | Snapdragon® X2 Elite | 50.82 ms | 183 - 183 MB | NPU |
| encoder | ONNX | float | Snapdragon® X Elite | 124.045 ms | 182 - 182 MB | NPU |
| encoder | ONNX | float | Snapdragon® 8 Gen 3 Mobile | 82.002 ms | 86 - 1245 MB | NPU |
| encoder | ONNX | float | Qualcomm® QCS8550 (Proxy) | 118.492 ms | 0 - 197 MB | NPU |
| encoder | ONNX | float | Qualcomm® QCS9075 | 150.326 ms | 79 - 83 MB | NPU |
| encoder | ONNX | float | Snapdragon® 8 Elite For Galaxy Mobile | 60.397 ms | 81 - 774 MB | NPU |
| encoder | QNN_DLC | float | Snapdragon® 8 Elite Gen 5 Mobile | 49.034 ms | 0 - 742 MB | NPU |
| encoder | QNN_DLC | float | Snapdragon® X2 Elite | 48.393 ms | 1 - 1 MB | NPU |
| encoder | QNN_DLC | float | Snapdragon® X Elite | 121.663 ms | 1 - 1 MB | NPU |
| encoder | QNN_DLC | float | Snapdragon® 8 Gen 3 Mobile | 82.41 ms | 0 - 1046 MB | NPU |
| encoder | QNN_DLC | float | Qualcomm® QCS8275 (Proxy) | 397.445 ms | 1 - 731 MB | NPU |
| encoder | QNN_DLC | float | Qualcomm® QCS8550 (Proxy) | 118.837 ms | 1 - 3 MB | NPU |
| encoder | QNN_DLC | float | Qualcomm® SA8775P | 139.846 ms | 1 - 733 MB | NPU |
| encoder | QNN_DLC | float | Qualcomm® QCS9075 | 152.427 ms | 1 - 39 MB | NPU |
| encoder | QNN_DLC | float | Qualcomm® QCS8450 (Proxy) | 276.295 ms | 1 - 917 MB | NPU |
| encoder | QNN_DLC | float | Qualcomm® SA7255P | 397.445 ms | 1 - 731 MB | NPU |
| encoder | QNN_DLC | float | Qualcomm® SA8295P | 207.259 ms | 1 - 668 MB | NPU |
| encoder | QNN_DLC | float | Snapdragon® 8 Elite For Galaxy Mobile | 60.835 ms | 1 - 665 MB | NPU |
| encoder | TFLITE | float | Snapdragon® 8 Elite Gen 5 Mobile | 419.256 ms | 41 - 81 MB | GPU |
| encoder | TFLITE | float | Snapdragon® 8 Gen 3 Mobile | 479.748 ms | 0 - 144 MB | GPU |
| encoder | TFLITE | float | Qualcomm® QCS8275 (Proxy) | 3116.266 ms | 38 - 82 MB | GPU |
| encoder | TFLITE | float | Qualcomm® QCS8550 (Proxy) | 655.878 ms | 0 - 313 MB | GPU |
| encoder | TFLITE | float | Qualcomm® SA8775P | 1326.077 ms | 27 - 71 MB | GPU |
| encoder | TFLITE | float | Qualcomm® QCS9075 | 1270.615 ms | 0 - 40 MB | GPU |
| encoder | TFLITE | float | Qualcomm® QCS8450 (Proxy) | 843.956 ms | 39 - 193 MB | GPU |
| encoder | TFLITE | float | Qualcomm® SA7255P | 3116.266 ms | 38 - 82 MB | GPU |
| encoder | TFLITE | float | Qualcomm® SA8295P | 667.309 ms | 40 - 83 MB | GPU |
| encoder | TFLITE | float | Snapdragon® 8 Elite For Galaxy Mobile | 409.596 ms | 41 - 80 MB | GPU |
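The table reports encoder and decoder latency separately, so a rough end-to-end figure has to be assembled from both. The sketch below does that for the Snapdragon® 8 Gen 3 Mobile rows. It assumes the encoder runs once per 30-second chunk and the decoder runs once per generated token, up to the 200-token limit listed under Model Stats. It ignores audio preprocessing, tokenization, and any host overhead, so treat the result as a back-of-the-envelope upper bound rather than a measured number. `worst_case_ms` is a hypothetical helper name.

```python
# Latencies copied from the Snapdragon® 8 Gen 3 Mobile rows of the table:
# runtime -> (encoder ms, decoder ms)
LATENCY_MS = {
    "ONNX": (82.002, 8.615),
    "QNN_DLC": (82.41, 8.59),
    "TFLITE": (479.748, 8.471),
}

MAX_TOKENS = 200    # max decoded sequence length from Model Stats
CHUNK_MS = 30_000   # each encoder input covers 30 seconds of audio


def worst_case_ms(encoder_ms, decoder_ms, tokens=MAX_TOKENS):
    """One encoder pass plus one decoder pass per generated token.

    The per-token decoder call is an assumption about how the two
    exported components are chained; the card does not spell it out.
    """
    return encoder_ms + tokens * decoder_ms


for runtime, (enc_ms, dec_ms) in LATENCY_MS.items():
    total = worst_case_ms(enc_ms, dec_ms)
    # Real-time factor below 1.0 means faster than the audio duration.
    print(f"{runtime}: {total:.0f} ms, real-time factor {total / CHUNK_MS:.3f}")
```

Under these assumptions the decoder loop dominates the total, which is why the TFLite encoder falling back to the GPU costs less end to end than its encoder row alone suggests.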
License
- The license for the original implementation of Distil-Whisper can be found here.
References
- Distil-Whisper - Robust Knowledge Distillation via Large-Scale Pseudo Labelling
- Source Model Implementation
Community
- Join our AI Hub Slack community to collaborate, post questions and learn more about on-device AI.
- For questions or feedback, please reach out to us.
