SenseVoiceSmall ONNX (INT8 Quantized)

This repository mirrors iic/SenseVoiceSmall-onnx from ModelScope so that the model can be downloaded directly from Hugging Face.

Model Description

SenseVoiceSmall is a multilingual speech understanding model from Alibaba DAMO Academy (FunAudioLLM), supporting:

  • Automatic Speech Recognition (ASR) for Chinese, English, Cantonese, Japanese, and Korean
  • Speech Emotion Recognition (SER)
  • Audio Event Detection (AED)
  • Inverse Text Normalization (ITN)

This repository contains the INT8 quantized ONNX export (~230 MB), suitable for efficient CPU inference via onnxruntime.

Files

File                Size     Description
model_quant.onnx    ~230 MB  INT8 quantized ONNX model
am.mvn              11 KB    CMVN (mean/variance normalization) statistics
config.yaml         1.8 KB   Model & frontend configuration
tokens.json         344 KB   Token vocabulary
configuration.json  56 B     Framework metadata
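
Before wiring up inference, it can help to confirm that a local copy of the repository is complete. The following is a minimal sketch using only the Python standard library. The file names come from the list above; the helper name missing_model_files and the directory ./SenseVoiceSmall-onnx are hypothetical and only serve the illustration.

```python
from pathlib import Path

# File names exactly as listed in the Files section of this card.
EXPECTED_FILES = (
    "model_quant.onnx",
    "am.mvn",
    "config.yaml",
    "tokens.json",
    "configuration.json",
)


def missing_model_files(model_dir):
    """Return the expected file names that are not present in model_dir."""
    root = Path(model_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    # Hypothetical local path; point this at wherever the files were downloaded.
    missing = missing_model_files("./SenseVoiceSmall-onnx")
    print("missing files:", missing if missing else "none")
```

An empty result means all five files are in place. The check does not validate file contents or sizes.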

Usage

This model is intended for lightweight inference without the full FunASR/PyTorch stack, using onnxruntime (model execution), kaldi-native-fbank (filterbank feature extraction), and sentencepiece (token handling).
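
Running without FunASR means the post-processing step has to be reimplemented by hand. The sketch below shows one plausible version of that step in dependency-free Python, under assumptions this card does not confirm: that the model returns per-frame CTC logits, that the blank token has id 0, that tokens.json is a JSON list indexed by token id, and that tokens use the SentencePiece word-boundary marker "▁". All function names are hypothetical, and the logits in the demo are toy values, not model output.

```python
def argmax_per_frame(logits):
    """Pick the highest-scoring token id for every frame (a list of score rows)."""
    return [max(range(len(row)), key=row.__getitem__) for row in logits]


def ctc_greedy_decode(frame_ids, blank_id=0):
    """Standard CTC greedy decoding: collapse repeated ids, then drop blanks.

    blank_id=0 is an assumption; check the model configuration.
    """
    decoded = []
    previous = None
    for idx in frame_ids:
        if idx != previous and idx != blank_id:
            decoded.append(idx)
        previous = idx
    return decoded


def ids_to_text(ids, tokens):
    """Join token pieces and turn the SentencePiece marker into spaces."""
    pieces = [tokens[i] for i in ids]
    return "".join(pieces).replace("\u2581", " ").strip()


if __name__ == "__main__":
    # Toy vocabulary and logits for illustration only. In practice the
    # vocabulary would be loaded from tokens.json and the logits would come
    # from an onnxruntime session running model_quant.onnx.
    tokens = ["<blank>", "\u2581he", "llo", "\u2581world"]
    logits = [
        [0.1, 0.8, 0.05, 0.05],  # "▁he"
        [0.1, 0.7, 0.1, 0.1],    # "▁he" again (repeat, collapsed)
        [0.9, 0.0, 0.05, 0.05],  # blank
        [0.1, 0.1, 0.7, 0.1],    # "llo"
        [0.1, 0.1, 0.1, 0.7],    # "▁world"
    ]
    ids = ctc_greedy_decode(argmax_per_frame(logits))
    print(ids_to_text(ids, tokens))  # prints "hello world"
```

Greedy decoding is the simplest choice. If the real output also carries language, emotion, or event tags, those would need to be separated from the transcript in an extra step that is not shown here.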

Source & License

Citation

@inproceedings{an2024funaudiollm,
  author={An, Keyu and others},
  title={FunAudioLLM: Voice Understanding and Generation Foundation Models for Natural Interaction Between Humans and LLMs},
  year={2024},
}