Parakeet CTC 110M (INT8)
CTC-based speech recognition model for vocabulary-rescored transcription in Heydict.
Overview
This is the CTC decoder head of NVIDIA's parakeet-tdt_ctc-110m, exported to ONNX by csukuangfj/sherpa-onnx and dynamically quantized to INT8.
It runs as a companion model alongside the primary Parakeet TDT transducer. The CTC model's frame-level logits are rescored against the user's custom vocabulary list (domain terms, company names, technical jargon) to improve recognition accuracy for specialized terms.
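The model card does not say how Heydict performs the rescoring, so the following is only a minimal sketch of one common building block: the CTC forward algorithm, which scores how well a span of frames matches a candidate token sequence (for example, the SentencePiece ids of a custom vocabulary term). The function names, the `blank` argument, and the idea of scoring one term at a time are assumptions made for illustration.

```python
import numpy as np

def log_softmax(logits: np.ndarray) -> np.ndarray:
    """Convert raw logits [T, V] into log-probabilities along the vocab axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))

def ctc_log_score(log_probs: np.ndarray, target: list, blank: int) -> float:
    """Log-probability that frames `log_probs` [T, V] spell `target` under CTC.

    Sums over all alignments (standard CTC forward recursion).
    `blank` is the id of the CTC blank token (an assumption; check tokens.txt).
    """
    num_frames = log_probs.shape[0]
    # Extended sequence with blanks interleaved: [blank, t1, blank, t2, ..., blank]
    ext = [blank]
    for tok in target:
        ext += [tok, blank]
    size = len(ext)

    alpha = np.full(size, -np.inf)
    alpha[0] = log_probs[0, ext[0]]
    if size > 1:
        alpha[1] = log_probs[0, ext[1]]

    for t in range(1, num_frames):
        prev = alpha
        alpha = np.full(size, -np.inf)
        for s in range(size):
            cands = [prev[s]]                      # stay on the same symbol
            if s >= 1:
                cands.append(prev[s - 1])          # advance by one
            # Skip over a blank only between two different non-blank tokens
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                cands.append(prev[s - 2])
            alpha[s] = np.logaddexp.reduce(cands) + log_probs[t, ext[s]]

    # Valid endings: final blank or final token
    tail = [alpha[-1]] + ([alpha[-2]] if size > 1 else [])
    return float(np.logaddexp.reduce(tail))
```

A rescoring pass built on this could compare `ctc_log_score` for a vocabulary term against the score of the hypothesis it would replace and keep whichever is higher; the threshold and any length normalization are left open here.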
Files
| File | Size | Description |
|---|---|---|
| `encoder.int8.onnx` | 126 MB | INT8 dynamically quantized CTC encoder |
| `encoder.fp32.onnx` | 437 MB | Original FP32 encoder (for reference/GPU) |
| `tokens.txt` | 10 KB | SentencePiece vocabulary (sherpa-onnx format) |
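As a rough illustration of reading `tokens.txt`, the sketch below assumes the usual sherpa-onnx layout of one `<token> <id>` pair per line, separated by whitespace. The helper name `parse_tokens` is hypothetical, and the layout should be checked against the actual file.

```python
def parse_tokens(lines):
    """Build an id -> token mapping from lines shaped like '<token> <id>'.

    Assumes the id is the last whitespace-separated field on each line.
    """
    id2token = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # ignore empty lines
        token, idx = line.rsplit(maxsplit=1)
        id2token[int(idx)] = token
    return id2token

# Typical use (file path is an assumption):
# with open("tokens.txt", encoding="utf-8") as f:
#     id2token = parse_tokens(f)
```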
Architecture
- Encoder: FastConformer (17 layers, 256 dim, 4 heads)
- Decoder: CTC (encoder-only, no transducer joiner)
- Vocabulary: 1025 SentencePiece tokens
- Input: 128-dim log-mel spectrogram (NeMo convention)
- Output: Frame-level logits [1, T', 1025]
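To make the output shape concrete, here is a minimal greedy CTC decoding sketch over logits shaped `[1, T', V]`. It is not taken from the model card: the blank id is passed in explicitly because its position is an assumption (NeMo-style CTC exports often place it at the last index, which would be 1024 here), and `id2token` is a hypothetical mapping built from `tokens.txt`.

```python
import numpy as np

def ctc_greedy_decode(logits: np.ndarray, id2token: dict, blank_id: int) -> str:
    """Greedy CTC decode: argmax per frame, collapse repeats, drop blanks."""
    frame_ids = logits[0].argmax(axis=-1)  # shape [T']
    kept = []
    prev = None
    for idx in frame_ids:
        idx = int(idx)
        # Keep a token only when it differs from the previous frame and is not blank
        if idx != prev and idx != blank_id:
            kept.append(idx)
        prev = idx
    text = "".join(id2token[i] for i in kept)
    # SentencePiece marks word boundaries with U+2581
    return text.replace("\u2581", " ").strip()
```

Greedy decoding ignores the custom vocabulary entirely, so it is useful mainly as a sanity check that the logits and token table line up before any rescoring is applied.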
Quantization
Dynamic INT8 quantization via onnxruntime.quantization.quantize_dynamic. Weights are stored as INT8, and activations are quantized on the fly at runtime. The result is about 3.5x smaller than FP32 (126 MB vs. 437 MB) with minimal accuracy loss, which makes it suitable for a companion rescoring model.
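A sketch of what such a quantization step might look like is shown below. The exact options used for this model are not documented, so `weight_type=QuantType.QInt8` and the helper names are assumptions. The second helper simply checks the size ratio against the file sizes listed above.

```python
def quantize_encoder(fp32_path: str, int8_path: str) -> None:
    """Dynamically quantize an ONNX model's weights to INT8."""
    # Imported lazily so the size helper below works without onnxruntime installed
    from onnxruntime.quantization import QuantType, quantize_dynamic

    quantize_dynamic(
        model_input=fp32_path,
        model_output=int8_path,
        weight_type=QuantType.QInt8,  # assumption: signed INT8 weights
    )

def size_ratio(fp32_mb: float, int8_mb: float) -> float:
    """How many times smaller the quantized file is."""
    return fp32_mb / int8_mb

# File sizes from the Files table
print(round(size_ratio(437, 126), 2))  # prints 3.47

# Example call (paths are assumptions):
# quantize_encoder("encoder.fp32.onnx", "encoder.int8.onnx")
```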
License
CC-BY-4.0 (inherited from NVIDIA's original model)