# Vietnamese Telephony ASR Portable Bundle
This directory is a self-contained inference bundle for long-form Vietnamese telephony speech recognition. It includes the runtime code, the exported ASR checkpoint, the punctuation/casing restoration checkpoint, and the Ocany domain hotwords file required for decoding.
The bundle is intended for deployment or handoff to another machine with minimal setup.
## What Is Included
- Vietnamese telephony ASR model checkpoint: `outputs/results64k_wav2vec2_cleaned_v2_telaug30_longer/final`
- Tourmii punctuation and casing restoration checkpoint: `tourmii_pnc_finetune/outputs/tourmii_pnc_domain_v1/final`
- Minimal inference scripts: `run_infer.sh`, `scripts/run_longform_infer.py`, `scripts/infer_ctc.py`
- Supporting runtime package: `src/asr_vi_wav2vec2_telephony/`
- Domain hotwords: `configs/ocany_hotwords.txt`
## Directory Layout
```
asr_vi_wav2vec2_telephony_portable/
├── configs/
│   └── ocany_hotwords.txt
├── outputs/
│   └── results64k_wav2vec2_cleaned_v2_telaug30_longer/
│       └── final/
├── requirements.infer.txt
├── run_infer.sh
├── scripts/
│   ├── infer_ctc.py
│   └── run_longform_infer.py
├── src/
│   └── asr_vi_wav2vec2_telephony/
└── tourmii_pnc_finetune/
    └── outputs/
        └── tourmii_pnc_domain_v1/
            └── final/
```
## System Requirements
- Python 3.10 or newer
- `ffmpeg` available in the system `PATH`
- Linux or another environment that can run `bash`
- Optional GPU support: install `torch` and `torchaudio` builds compatible with the target machine's CUDA runtime
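A quick preflight sketch (not part of the bundle) can confirm the first two requirements before you start setup:

```bash
# Verify Python >= 3.10 and ffmpeg on PATH before installing anything.
python3 -c 'import sys; assert sys.version_info >= (3, 10), sys.version'
if command -v ffmpeg >/dev/null 2>&1; then
  echo "ffmpeg: $(command -v ffmpeg)"
else
  echo "ffmpeg not found on PATH" >&2
fi
```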
## Setup
Create a fresh virtual environment inside the bundle and install the inference dependencies:
```bash
cd /path/to/asr_vi_wav2vec2_telephony_portable
python3 -m venv .venv
source .venv/bin/activate
pip install --upgrade pip
pip install -r requirements.infer.txt
```
If you plan to run on GPU, make sure the installed torch and torchaudio packages match the target machine's CUDA version. Do not assume a virtual environment created on one machine will work correctly on another.
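For example, you can detect whether an NVIDIA GPU is present and pick a matching `torch` build accordingly. The `cu121` index URL below is illustrative, not a bundle requirement; substitute the CUDA version that `nvidia-smi` reports on your machine:

```bash
if command -v nvidia-smi >/dev/null 2>&1; then
  # The nvidia-smi header line shows the driver's supported CUDA version.
  nvidia-smi
  # Install a matching CUDA build, e.g. for CUDA 12.1:
  pip install torch torchaudio --index-url https://download.pytorch.org/whl/cu121
else
  echo "No NVIDIA GPU detected; keeping the CPU wheels from requirements.infer.txt"
fi
```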
## Quick Start
The shortest command is:
```bash
cd /path/to/asr_vi_wav2vec2_telephony_portable
bash run_infer.sh /path/to/audio.wav
```
You can also explicitly choose the Python interpreter:
```bash
cd /path/to/asr_vi_wav2vec2_telephony_portable
PYTHON_BIN=/usr/bin/python3 bash run_infer.sh /path/to/audio.wav
```
## Default Runtime Behavior
By default, the bundle will:
- load the ASR model from `outputs/results64k_wav2vec2_cleaned_v2_telaug30_longer/final`
- enable beam-search decoding
- load hotwords from `configs/ocany_hotwords.txt`
- use `--hotword-weight 4`
- enable punctuation and casing restoration
- load the local Tourmii PnC model from `tourmii_pnc_finetune/outputs/tourmii_pnc_domain_v1/final`
- write outputs to `tmp/<audio_stem>_guard_test`
For an input file named `call_2332.wav`, the output directory will be `tmp/call_2332_guard_test`.
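The directory name is derived from the audio file's stem, as this small sketch shows (the input path is hypothetical):

```bash
audio=/data/call_2332.wav          # hypothetical input path
stem=$(basename "$audio" .wav)     # -> call_2332
echo "tmp/${stem}_guard_test"      # -> tmp/call_2332_guard_test
```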
The output bundle typically contains:
- `result.json`: structured inference output and metadata
- `result.txt`: best final text output
- `result.png`: waveform visualization with transcript context
- `<audio_stem>.prepared.wav`: converted and normalized working audio
## Advanced Usage
Use the Python entry point if you need to override defaults:
```bash
cd /path/to/asr_vi_wav2vec2_telephony_portable
python3 scripts/run_longform_infer.py \
    --audio /data/call_2332.wav \
    --output-dir ./tmp/custom_out \
    --hotword-weight 2 \
    --pnc-device cpu
```
Useful options include:
- `--device {auto,cpu,cuda}`: ASR inference device
- `--pnc-device {auto,cpu,cuda}`: punctuation model device
- `--disable-beam-search`: use greedy decoding only
- `--disable-vad`: disable VAD-based chunking
- `--hotwords-file /path/to/file.txt`: override the hotwords list
- `--model-path /path/to/final`: override the ASR checkpoint
- `--pnc-model-id /path/to/final`: override the punctuation checkpoint
- `--output-dir /path/to/out`: write outputs to a custom directory
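These flags compose. As one sketch, the loop below transcribes every `.wav` in a directory on CPU, with one output directory per file; the `/data/calls` path is hypothetical, and the flags used are the documented ones above:

```bash
# Batch transcription on CPU: one output directory per input file.
for f in /data/calls/*.wav; do
  stem=$(basename "$f" .wav)
  python3 scripts/run_longform_infer.py \
    --audio "$f" \
    --output-dir "./tmp/${stem}" \
    --device cpu \
    --pnc-device cpu
done
```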
## Running on Another Machine
This bundle is portable because the default model paths are resolved relative to the bundle root. If you copy the entire directory, the default commands continue to work without editing any paths.
Recommended handoff workflow:
1. Copy the whole `asr_vi_wav2vec2_telephony_portable/` directory to the target machine.
2. Do not rely on an existing `.venv` copied from the source machine.
3. Install system-level prerequisites on the target machine: Python, `ffmpeg`, and GPU drivers if needed.
4. Create a new virtual environment on the target machine.
5. Install dependencies from `requirements.infer.txt`.
6. Run inference with `bash run_infer.sh /path/to/audio.wav`.
Example:
```bash
scp -r asr_vi_wav2vec2_telephony_portable user@target:/opt/
ssh user@target
cd /opt/asr_vi_wav2vec2_telephony_portable
python3 -m venv .venv
source .venv/bin/activate
pip install --upgrade pip
pip install -r requirements.infer.txt
bash run_infer.sh /data/sample_call.wav
```
Important portability notes:
- Copy the entire bundle, not just `model.safetensors`. Hugging Face checkpoints require the full exported directory, including files such as `config.json`, tokenizer files, and preprocessing metadata.
- Rebuild the Python environment on the target machine instead of copying one from another host.
- If the target machine has no GPU, run on CPU by default or pass `--device cpu --pnc-device cpu`.
- If the target machine has restricted internet access, prepare the Python wheels in advance; the model files themselves are already included locally in this bundle.
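One way to prepare wheels ahead of time is `pip download` on an internet-connected machine, followed by an offline install on the target; the `wheels/` directory name is just an example:

```bash
# On a machine with internet access:
pip download -r requirements.infer.txt -d wheels/

# Copy wheels/ alongside the bundle, then on the offline target:
pip install --no-index --find-links wheels/ -r requirements.infer.txt
```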
## Minimum Files Required
If someone asks whether a single checkpoint file is enough, the answer is no.
At minimum, you need:
- the full exported ASR model directory under `final/`
- the full exported PnC model directory under `final/`
- the inference scripts
- the `src/` runtime package
- the hotwords file, if beam-search decoding should preserve domain terms
If the destination machine already has the exact same inference code and directory conventions, you may copy only the ASR and PnC model directories. In practice, shipping the full bundle is safer and reduces setup errors.
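Before handoff, a short check run from the bundle root can confirm the required pieces listed above are all present:

```bash
# Report any required bundle component that is missing.
for p in \
  outputs/results64k_wav2vec2_cleaned_v2_telaug30_longer/final \
  tourmii_pnc_finetune/outputs/tourmii_pnc_domain_v1/final \
  scripts/run_longform_infer.py \
  scripts/infer_ctc.py \
  src/asr_vi_wav2vec2_telephony \
  configs/ocany_hotwords.txt; do
  [ -e "$p" ] || echo "MISSING: $p"
done
```

No output means everything is in place.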
## Troubleshooting
### `ffmpeg: command not found`

- Install `ffmpeg` and ensure it is available in the shell `PATH`.
### CUDA requested but no GPU is available

- Run with `--device cpu --pnc-device cpu`, or install the correct GPU stack.
### Import or wheel compatibility issues after copying from another machine

- Delete the copied virtual environment and create a new one on the target machine.