
ASR + Invoice Extraction Server

Standalone packaging of Server_conformer.py, a server that transcribes audio and extracts invoice JSON from the transcript text. This folder also includes a copy of the trained RNNT checkpoint for convenience.

What’s inside

  • Server_conformer.py, Speech2text.py, InformationExtractor.py
  • chunkformer/ (Chunkformer model code)
  • chunkformer-model/ (trained RNNT checkpoint)
  • requirements.txt

Prerequisites

  • Python 3.9+ and a CUDA GPU (needed for practical Qwen invoice extraction; on CPU it is extremely slow)
  • Hugging Face token with access to the models you use (HF_TOKEN)
  • Chunkformer RNNT checkpoint in chunkformer-model/ (already copied into this folder). If you place it elsewhere, update CHUNKFORMER_MODEL_PATH.
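The prerequisites above can be checked before the first launch with a small script. This is a hypothetical helper (preflight is not part of Server_conformer.py); it assumes the default checkpoint path chunkformer-model relative to the working directory, and it only probes for CUDA if PyTorch is already installed.

```python
import importlib.util
import os
import sys
from typing import List, Mapping, Optional


def preflight(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return human-readable problems; an empty list means every check passed.

    Hypothetical helper mirroring the prerequisites list; not part of the server.
    """
    env = os.environ if env is None else env
    problems: List[str] = []

    # Python 3.9+ is required.
    if sys.version_info < (3, 9):
        problems.append("Python 3.9+ is required")

    # A Hugging Face token is needed to download the models you use.
    if not env.get("HF_TOKEN"):
        problems.append("HF_TOKEN is not set")

    # The RNNT checkpoint directory must exist (assumed default: ./chunkformer-model).
    model_path = env.get("CHUNKFORMER_MODEL_PATH", "chunkformer-model")
    if not os.path.isdir(model_path):
        problems.append(f"checkpoint directory not found: {model_path}")

    # Only check CUDA when PyTorch is installed; a missing GPU is reported, not fatal.
    if importlib.util.find_spec("torch") is not None:
        import torch

        if not torch.cuda.is_available():
            problems.append("no CUDA GPU detected; invoice extraction will be very slow")

    return problems


if __name__ == "__main__":
    for problem in preflight():
        print("WARNING:", problem)
```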

Setup

cd Speech2Invoice
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

Configure environment

Create a .env file (or export the equivalent environment variables) containing at least:

PORT=8000
USE_NGROK=false
HF_TOKEN=your_hf_token_here
CHUNKFORMER_MODEL_PATH=chunkformer-model
LOG_LEVEL=DEBUG
DEBUG=true

# Optional ngrok
NGROK_AUTHTOKEN=
NGROK_REGION=ap

# Optional invoice-extraction LLM overrides (the defaults favor speed)
IE_LLM_MODEL_ID=Qwen/Qwen1.5-7B-Chat
IE_MAX_NEW_TOKENS=256
IE_DO_SAMPLE=false
IE_TEMPERATURE=0.0
IE_TOP_P=0.8
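The server reads these settings from the environment. The loader below is a hypothetical sketch of how such variables can be parsed into typed values; its fallback values are simply the sample values listed above, and the actual defaults inside Server_conformer.py may differ. It does not read the .env file itself (a library such as python-dotenv is a common choice for that), and it covers only a subset of the variables.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


def _as_bool(value: str) -> bool:
    # Accept common spellings of "true"; anything else is treated as False.
    return value.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class ServerConfig:
    port: int
    use_ngrok: bool
    hf_token: str
    chunkformer_model_path: str
    ie_llm_model_id: str
    ie_max_new_tokens: int
    ie_do_sample: bool
    ie_temperature: float
    ie_top_p: float


def load_config(env: Optional[Mapping[str, str]] = None) -> ServerConfig:
    """Hypothetical loader; fallbacks copy the sample .env values, not confirmed defaults."""
    env = os.environ if env is None else env
    return ServerConfig(
        port=int(env.get("PORT", "8000")),
        use_ngrok=_as_bool(env.get("USE_NGROK", "false")),
        hf_token=env.get("HF_TOKEN", ""),
        chunkformer_model_path=env.get("CHUNKFORMER_MODEL_PATH", "chunkformer-model"),
        ie_llm_model_id=env.get("IE_LLM_MODEL_ID", "Qwen/Qwen1.5-7B-Chat"),
        ie_max_new_tokens=int(env.get("IE_MAX_NEW_TOKENS", "256")),
        ie_do_sample=_as_bool(env.get("IE_DO_SAMPLE", "false")),
        ie_temperature=float(env.get("IE_TEMPERATURE", "0.0")),
        ie_top_p=float(env.get("IE_TOP_P", "0.8")),
    )
```

In Hugging Face transformers, temperature and top_p only take effect when sampling is enabled, so with IE_DO_SAMPLE=false decoding is greedy and IE_TEMPERATURE and IE_TOP_P are effectively unused (assuming the server passes these values through to generation).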

If you move the model elsewhere, set CHUNKFORMER_MODEL_PATH to that directory.

Run

python3 Server_conformer.py

Endpoints

  • POST /transcribe: multipart/form-data request with an audio file (wav, mp3, m4a, ogg, or webm). Returns JSON with final_result and full_transcription.
  • POST /ticket: JSON body {"full_transcription": "<text>"}. Returns the invoice JSON extracted by the Qwen LLM.
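A minimal client sketch using only the Python standard library, chaining the two endpoints: transcribe an audio file, then send the returned full_transcription to /ticket. Assumptions: the server runs at http://localhost:8000 (the PORT from the sample .env), and the multipart form field for the audio is named file; check Server_conformer.py for the actual field name. The helper names are hypothetical.

```python
import json
import mimetypes
import os
import urllib.request
import uuid


def build_transcribe_request(base_url: str, audio_path: str, field: str = "file") -> urllib.request.Request:
    """POST /transcribe as multipart/form-data. The field name "file" is an assumption."""
    boundary = uuid.uuid4().hex
    filename = os.path.basename(audio_path)
    ctype = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    with open(audio_path, "rb") as fh:
        payload = fh.read()
    # Single-part multipart body: header block, raw audio bytes, closing boundary.
    body = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {ctype}\r\n\r\n"
    ).encode("utf-8") + payload + f"\r\n--{boundary}--\r\n".encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/transcribe",
        data=body,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )


def build_ticket_request(base_url: str, transcription: str) -> urllib.request.Request:
    """POST /ticket with the documented {"full_transcription": ...} JSON body."""
    body = json.dumps({"full_transcription": transcription}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/ticket",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(req: urllib.request.Request, timeout: float = 600.0) -> dict:
    # Generous timeout: the first /ticket call may wait for the LLM download.
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For example, send(build_transcribe_request("http://localhost:8000", "sample.wav")) returns the transcription JSON (sample.wav is a placeholder), and passing its full_transcription value to build_ticket_request and then send yields the invoice JSON. Building requests separately from sending them keeps the payload format easy to inspect without a running server.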

Notes

  • The invoice extractor requires a GPU and downloads the LLM from Hugging Face on first run. To speed it up, point IE_LLM_MODEL_ID at a smaller model.
  • The RNNT checkpoint weights are included in chunkformer-model/. Because these files are large, consider Git LFS if you plan to push this folder to a remote.

Contact

For questions or controlled access requests to Speech2Invoice:
