
Vietnamese Telephony ASR Portable Bundle

This directory is a self-contained inference bundle for long-form Vietnamese telephony speech recognition. It includes the runtime code, the exported ASR checkpoint, the punctuation/casing restoration checkpoint, and the Ocany domain hotwords file required for decoding.

The bundle is intended for deployment or handoff to another machine with minimal setup.

What Is Included

  • Vietnamese telephony ASR model checkpoint: outputs/results64k_wav2vec2_cleaned_v2_telaug30_longer/final
  • Tourmii punctuation and casing restoration checkpoint: tourmii_pnc_finetune/outputs/tourmii_pnc_domain_v1/final
  • Minimal inference scripts: run_infer.sh, scripts/run_longform_infer.py, scripts/infer_ctc.py
  • Supporting runtime package: src/asr_vi_wav2vec2_telephony/
  • Domain hotwords: configs/ocany_hotwords.txt

Directory Layout

asr_vi_wav2vec2_telephony_portable/
├── configs/
│   └── ocany_hotwords.txt
├── outputs/
│   └── results64k_wav2vec2_cleaned_v2_telaug30_longer/
│       └── final/
├── requirements.infer.txt
├── run_infer.sh
├── scripts/
│   ├── infer_ctc.py
│   └── run_longform_infer.py
├── src/
│   └── asr_vi_wav2vec2_telephony/
└── tourmii_pnc_finetune/
    └── outputs/
        └── tourmii_pnc_domain_v1/
            └── final/

System Requirements

  • Python 3.10 or newer
  • ffmpeg available in the system PATH
  • Linux or another environment that can run bash
  • Optional GPU support: install a torch and torchaudio build compatible with the target machine's CUDA runtime

Setup

Create a fresh virtual environment inside the bundle and install the inference dependencies:

cd /path/to/asr_vi_wav2vec2_telephony_portable
python3 -m venv .venv
source .venv/bin/activate
pip install --upgrade pip
pip install -r requirements.infer.txt

If you plan to run on GPU, make sure the installed torch and torchaudio packages match the target machine's CUDA version. Do not assume a virtual environment created on one machine will work correctly on another.
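For example, PyTorch publishes CUDA-specific wheels on its own package index. The snippet below is a sketch, not part of this bundle's setup script; the cu121 tag is an illustrative assumption and should be replaced with whatever matches the target machine's CUDA runtime:

```shell
# Example only: install a CUDA 12.1 build of torch/torchaudio from the
# official PyTorch wheel index. Replace cu121 with the tag that matches
# the target machine's CUDA runtime (check with nvidia-smi).
python3 -m pip install torch torchaudio --index-url https://download.pytorch.org/whl/cu121
```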

Quick Start

The shortest command is:

cd /path/to/asr_vi_wav2vec2_telephony_portable
bash run_infer.sh /path/to/audio.wav

You can also explicitly choose the Python interpreter:

cd /path/to/asr_vi_wav2vec2_telephony_portable
PYTHON_BIN=/usr/bin/python3 bash run_infer.sh /path/to/audio.wav

Default Runtime Behavior

By default, the bundle will:

  • load the ASR model from outputs/results64k_wav2vec2_cleaned_v2_telaug30_longer/final
  • enable beam-search decoding
  • load hotwords from configs/ocany_hotwords.txt
  • use --hotword-weight 4
  • enable punctuation and casing restoration
  • load the local Tourmii PnC model from tourmii_pnc_finetune/outputs/tourmii_pnc_domain_v1/final
  • write outputs to tmp/<audio_stem>_guard_test

For an input file named call_2332.wav, the output directory will be:

tmp/call_2332_guard_test
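The mapping from input file to output directory can be reproduced in the shell; this is a sketch of the naming rule described above, not code from the bundle:

```shell
# How the default output directory name is derived from the input file.
audio=/path/to/call_2332.wav
stem=$(basename "$audio" .wav)     # strip directory and .wav extension -> call_2332
outdir="tmp/${stem}_guard_test"    # matches the default described above
echo "$outdir"                     # tmp/call_2332_guard_test
```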

The output bundle typically contains:

  • result.json: structured inference output and metadata
  • result.txt: best final text output
  • result.png: waveform visualization with transcript context
  • <audio_stem>.prepared.wav: converted and normalized working audio

Advanced Usage

Use the Python entry point if you need to override defaults:

cd /path/to/asr_vi_wav2vec2_telephony_portable
python3 scripts/run_longform_infer.py \
  --audio /data/call_2332.wav \
  --output-dir ./tmp/custom_out \
  --hotword-weight 2 \
  --pnc-device cpu

Useful options include:

  • --device {auto,cpu,cuda}: ASR inference device
  • --pnc-device {auto,cpu,cuda}: punctuation model device
  • --disable-beam-search: use greedy decoding only
  • --disable-vad: disable VAD-based chunking
  • --hotwords-file /path/to/file.txt: override the hotwords list
  • --model-path /path/to/final: override the ASR checkpoint
  • --pnc-model-id /path/to/final: override the punctuation checkpoint
  • --output-dir /path/to/out: write outputs to a custom directory
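These options compose. The sketch below builds a CPU-only, greedy-decoding invocation from the flags listed above and prints it instead of running it, so it can be reviewed first; the audio path and output directory are illustrative:

```shell
# Dry-run sketch: assemble and print a CPU-only, greedy-decoding command.
cmd=(python3 scripts/run_longform_infer.py
     --audio /data/call_2332.wav
     --device cpu --pnc-device cpu
     --disable-beam-search
     --output-dir ./tmp/cpu_greedy)
echo "${cmd[@]}"   # remove the echo (or run "${cmd[@]}") to execute it
```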

Running on Another Machine

This bundle is portable because the default model paths are resolved relative to the bundle root. If you copy the entire directory, the default commands continue to work without editing any paths.

Recommended handoff workflow:

  1. Copy the whole asr_vi_wav2vec2_telephony_portable/ directory to the target machine.
  2. Do not rely on an existing .venv copied from the source machine.
  3. Install system-level prerequisites on the target machine: Python, ffmpeg, and GPU drivers if needed.
  4. Create a new virtual environment on the target machine.
  5. Install dependencies from requirements.infer.txt.
  6. Run inference with bash run_infer.sh /path/to/audio.wav.

Example:

scp -r asr_vi_wav2vec2_telephony_portable user@target:/opt/

ssh user@target
cd /opt/asr_vi_wav2vec2_telephony_portable
python3 -m venv .venv
source .venv/bin/activate
pip install --upgrade pip
pip install -r requirements.infer.txt
bash run_infer.sh /data/sample_call.wav

Important portability notes:

  • Copy the entire bundle, not just model.safetensors.
  • Hugging Face checkpoints require the full exported directory, including files such as config.json, tokenizer files, and preprocessing metadata.
  • Rebuild the Python environment on the target machine instead of copying one from another host.
  • If the target machine has no GPU, run on CPU by default or pass --device cpu --pnc-device cpu.
  • If the target machine has restricted internet access, prepare the Python wheels in advance; the model files themselves are already included locally in this bundle.
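One way to prepare wheels in advance is pip's standard download/offline-install workflow. This is a sketch under the assumption that both machines share the same OS, architecture, and Python version (wheels are platform-specific):

```shell
# On a machine WITH internet access: fetch all wheels into wheels/.
python3 -m pip download -r requirements.infer.txt -d wheels/

# On the offline target machine, after copying wheels/ next to the bundle:
python3 -m pip install --no-index --find-links wheels/ -r requirements.infer.txt
```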

Minimum Files Required

If someone asks whether a single checkpoint file is enough, the answer is no.

At minimum, you need:

  • the full ASR exported model directory under final/
  • the full PnC exported model directory under final/
  • the inference scripts
  • the src/ runtime package
  • the hotwords file if beam-search decoding should preserve domain terms

If the destination machine already has the exact same inference code and directory conventions, you may copy only the ASR and PnC model directories. In practice, shipping the full bundle is safer and reduces setup errors.
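If you do choose the minimal route, the pieces listed above can be collected into a single archive. The sketch below only prints the tar command so the path list can be checked against your actual layout first:

```shell
# Sketch: archive the minimal pieces listed above (run from the bundle root).
paths=(
  outputs/results64k_wav2vec2_cleaned_v2_telaug30_longer/final
  tourmii_pnc_finetune/outputs/tourmii_pnc_domain_v1/final
  scripts/ src/ configs/ocany_hotwords.txt
  run_infer.sh requirements.infer.txt
)
echo tar czf asr_bundle_min.tar.gz "${paths[@]}"   # drop the echo to create the archive
```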

Troubleshooting

ffmpeg: command not found

  • Install ffmpeg and ensure it is available in the shell PATH.
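A quick way to verify the fix (the apt-get line in the comment is an example for Debian/Ubuntu; use your platform's package manager):

```shell
# Is ffmpeg on PATH? On Debian/Ubuntu: sudo apt-get install ffmpeg
if command -v ffmpeg >/dev/null 2>&1; then
  ffmpeg -version | head -n 1
else
  echo "ffmpeg not found on PATH"
fi
```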

CUDA requested but no GPU is available

  • Run with --device cpu --pnc-device cpu, or install the correct GPU stack.
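To check whether the installed torch build actually sees a GPU before deciding between these two fixes, a small probe like this works even when torch is missing entirely:

```shell
# Probe GPU visibility from the environment that will run inference.
python3 - <<'EOF'
try:
    import torch
    print("cuda available:", torch.cuda.is_available())
except ImportError:
    print("torch is not installed in this environment")
EOF
```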

Import or wheel compatibility issues after copying from another machine

  • Delete the copied virtual environment and create a new one on the target machine.