Parakeet CTC 110M (INT8)

CTC-based speech recognition model for vocabulary-rescored transcription in Heydict.

Overview

This is the CTC decoder head of NVIDIA's parakeet-tdt_ctc-110m, exported to ONNX by csukuangfj/sherpa-onnx and dynamically quantized to INT8.

It runs as a companion model alongside the primary Parakeet TDT transducer. The CTC model's frame-level logits are rescored against the user's custom vocabulary list (domain terms, company names, technical jargon) to improve recognition accuracy for specialized terms.
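The card does not spell out how the rescoring is computed, so the following is only a minimal sketch of one plausible approach: scoring a vocabulary term with the standard CTC forward algorithm. It assumes the logits have already been turned into per-frame log-probabilities (log-softmax) and that the term has been tokenized to SentencePiece ids. The names `ctc_log_prob` and `log_add` are hypothetical helpers, not part of Heydict or sherpa-onnx.

```python
import math

NEG_INF = float("-inf")


def log_add(a, b):
    """Numerically stable log(exp(a) + exp(b))."""
    if a == NEG_INF:
        return b
    if b == NEG_INF:
        return a
    hi, lo = (a, b) if a > b else (b, a)
    return hi + math.log1p(math.exp(lo - hi))


def ctc_log_prob(log_probs, target, blank):
    """Log-probability that the given frames emit `target` under CTC.

    log_probs: list of T frames, each a sequence of per-token log-probabilities.
    target:    non-empty list of token ids for one vocabulary term.
    blank:     id of the CTC blank token (an assumption; check tokens.txt).
    """
    # Extended label sequence with blanks interleaved: [b, t1, b, t2, ..., b]
    ext = [blank]
    for tok in target:
        ext += [tok, blank]
    size = len(ext)

    # Initialise with the first frame: start in the leading blank or first token.
    alpha = [NEG_INF] * size
    alpha[0] = log_probs[0][ext[0]]
    alpha[1] = log_probs[0][ext[1]]

    for frame in log_probs[1:]:
        new = [NEG_INF] * size
        for s in range(size):
            acc = alpha[s]                      # stay on the same label
            if s >= 1:
                acc = log_add(acc, alpha[s - 1])  # advance by one label
            # Skip over a blank only between two different non-blank tokens.
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                acc = log_add(acc, alpha[s - 2])
            if acc != NEG_INF:
                new[s] = acc + frame[ext[s]]
        alpha = new

    # Valid paths end on the last token or the trailing blank.
    return log_add(alpha[-1], alpha[-2])
```

In a rescoring setup, a score like this would typically be computed over the frame span of a candidate word and compared against the score of the primary model's hypothesis; how the spans are chosen and how the scores are combined is left open here.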

Files

File	Size	Description
`encoder.int8.onnx`	126 MB	INT8 dynamically quantized CTC encoder
`encoder.fp32.onnx`	437 MB	Original FP32 encoder (for reference or GPU inference)
`tokens.txt`	10 KB	SentencePiece vocabulary (sherpa-onnx format)
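As a rough illustration of how `tokens.txt` can be read, the sketch below assumes the usual sherpa-onnx layout of one `<token> <id>` pair per line, separated by whitespace. The helper name `parse_tokens` is hypothetical, and the layout should be checked against the actual file.

```python
def parse_tokens(text):
    """Map token ids to token strings from sherpa-onnx style tokens.txt content.

    Assumes each non-empty line is "<token> <id>" (an assumption about the format).
    """
    id_to_token = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        # Split from the right so the id is always the last field.
        token, idx = line.rsplit(maxsplit=1)
        id_to_token[int(idx)] = token
    return id_to_token
```

To load the file shipped with this model, something like `parse_tokens(open("tokens.txt", encoding="utf-8").read())` should work.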

Architecture

Encoder: FastConformer (17 layers, 256 dim, 4 heads)
Decoder: CTC (encoder-only, no transducer joiner)
Vocabulary: 1025 SentencePiece tokens
Input: 128-dim log-mel spectrogram (NeMo convention)
Output: Frame-level logits [1, T', 1025]
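To make the output shape concrete, here is a minimal sketch of greedy CTC decoding over the frame-level logits: take the argmax per frame, collapse repeats, and drop blanks. It assumes `frames` is the `[T', 1025]` slice for one utterance (for example `logits[0]`), that `id_to_token` comes from `tokens.txt`, and that the caller knows the blank id. NeMo exports commonly place the blank last, but that is an assumption to verify in `tokens.txt`. The function name `ctc_greedy_decode` is hypothetical.

```python
def ctc_greedy_decode(frames, id_to_token, blank):
    """Greedy CTC decoding: per-frame argmax, collapse repeats, remove blanks.

    frames:      sequence of T' frames, each a sequence of per-token scores.
    id_to_token: dict mapping token id to SentencePiece piece.
    blank:       id of the CTC blank token (assumption; check tokens.txt).
    """
    ids = []
    prev = None
    for frame in frames:
        # Index of the highest-scoring token in this frame.
        best = max(range(len(frame)), key=frame.__getitem__)
        # Emit only when the label changes and is not blank.
        if best != blank and best != prev:
            ids.append(best)
        prev = best
    # SentencePiece marks word starts with U+2581; turn those into spaces.
    text = "".join(id_to_token[i] for i in ids)
    return text.replace("\u2581", " ").strip()
```

Because argmax is unaffected by softmax, raw logits can be passed in directly.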

Quantization

Dynamic INT8 quantization via onnxruntime.quantization.quantize_dynamic. Weights are stored as INT8, and activations are quantized on the fly at inference time. The INT8 file is about 3.5x smaller than the FP32 one (126 MB vs. 437 MB) with minimal accuracy loss, which makes it suitable for a companion rescoring model.
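For reference, a conversion along these lines can be sketched as follows. The exact options used for this model are not stated, so the weight type (`QuantType.QInt8`) and the wrapper function `quantize_encoder` are assumptions; only `quantize_dynamic` itself is named in this card.

```python
from pathlib import Path


def quantize_encoder(fp32_path, int8_path):
    """Dynamically quantize an ONNX model and return the size ratio INT8/FP32."""
    # Imported lazily so this module loads even where onnxruntime is absent.
    from onnxruntime.quantization import QuantType, quantize_dynamic

    quantize_dynamic(
        model_input=str(fp32_path),
        model_output=str(int8_path),
        weight_type=QuantType.QInt8,  # assumption: signed INT8 weights
    )
    return Path(int8_path).stat().st_size / Path(fp32_path).stat().st_size
```

Calling `quantize_encoder("encoder.fp32.onnx", "encoder.int8.onnx")` would write the quantized file and return the size ratio, which for the sizes listed above would be roughly 0.29.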

License

CC-BY-4.0 (inherited from NVIDIA's original model)
