F5-TTS 4-bit Distill
This package contains the current Agent Kernel Lite Peyton voice F5-TTS export. It keeps the original F5-TTS CFM DiT architecture and ships the large tensors as rowwise signed int4 with fp16 scales.
This is a weights-only custom runtime bundle. It is not a standalone
transformers pipeline. Loading requires an F5-TTS-compatible runtime that
understands f5tts, plus the app vocoder bundle.
Upgrades Over Original F5-TTS
- Original architecture preserved: the checkpoint is still F5-TTS CFM DiT; the work here is quantization/distillation/runtime packaging, not a custom replacement architecture.
- Smaller deployment payload: original F5-TTS Base is roughly 335.8M parameters and about 1.25 GiB as FP32 model-only weights. This release packs the large tensors into 4-bit weights with a total tensor payload of about 163.65 MiB.
- Lower memory bandwidth: large matrix weights are stored as signed int4 with rowwise fp16 scales, while small/excluded tensors remain fp16.
- Fast generation target: the current Agent Kernel Lite preset uses 8 function-evaluation steps, compared with more typical original F5-TTS settings around 24-32 steps. That is 3x fewer DiT evaluations versus a 24-step run and 4x fewer DiT evaluations versus a 32-step run.
- Full-Q4-surface training: this candidate was selected after training and evaluating with the whole deployed F5-TTS tensor surface quantized, matching the iPhone/custom-WASM runtime more closely than the previous partial-Q4 candidates.
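The storage scheme named above (rowwise symmetric signed int4 with one fp16 scale per row) can be sketched in plain Python. This is an illustration only, not the exporter used for this bundle: the function names are hypothetical, the scale rule `max_abs / 7` is one common choice, and the nibble order (low nibble first) is an assumption that may not match `tensors.q4.bin`.

```python
import struct


def to_fp16(value: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", value))[0]


def quantize_row(row: list[float]) -> tuple[list[int], float]:
    """Symmetric signed int4: one fp16 scale per row, codes clamped to [-8, 7]."""
    max_abs = max(abs(v) for v in row)
    scale = to_fp16(max_abs / 7.0)  # assumed scale rule, not read from the bundle
    if scale == 0.0:  # all-zero row, or a scale that underflows in fp16
        scale = 1.0
    codes = [max(-8, min(7, round(v / scale))) for v in row]
    return codes, scale


def pack_nibbles(codes: list[int]) -> bytes:
    """Pack two int4 codes per byte, low nibble first (assumed layout)."""
    padded = codes + [0] * (len(codes) % 2)
    out = bytearray()
    for lo, hi in zip(padded[0::2], padded[1::2]):
        out.append((lo & 0xF) | ((hi & 0xF) << 4))
    return bytes(out)


def unpack_nibbles(data: bytes, count: int) -> list[int]:
    """Inverse of pack_nibbles: sign-extend each 4-bit value."""
    codes = []
    for byte in data:
        for nibble in (byte & 0xF, byte >> 4):
            codes.append(nibble - 16 if nibble >= 8 else nibble)
    return codes[:count]


def dequantize_row(codes: list[int], scale: float) -> list[float]:
    """Reconstruct approximate weights from int4 codes and the row scale."""
    return [c * scale for c in codes]


row = [0.7, -0.3, 0.0, 0.1, -0.7]
codes, scale = quantize_row(row)
packed = pack_nibbles(codes)
restored = dequantize_row(unpack_nibbles(packed, len(row)), scale)
worst_error = max(abs(a - b) for a, b in zip(row, restored))
print(codes, scale, len(packed), worst_error)
```

Each weight costs half a byte plus a share of its row's 2-byte scale, which is where the lower memory bandwidth comes from. For in-range values, the reconstruction error per weight is bounded by half the row scale.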
F5-TTS References
- model hf: https://huggingface.co/SWivid/F5-TTS
- model github: https://github.com/SWivid/F5-TTS
- creator hf: https://huggingface.co/SWivid
- creator github: https://github.com/SWivid
Speed
The release target is faster generation through both fewer sampling steps and a smaller packed weight payload:
- Original-style F5-TTS generation: commonly 24-32 function-evaluation steps.
- This release-candidate preset: 8 function-evaluation steps, CFG 2.0, speed 1.15.
- Step-count speedup: 3x fewer DiT forward passes versus 24-step generation, or 4x fewer DiT forward passes versus 32-step generation.
- Weight payload reduction: about 1.25 GiB FP32 model-only weights down to 163.65 MiB packed Q4/fp16 tensors, roughly 7.8x smaller.
An earlier measurement of synthesis-only wall-clock time on an NVIDIA RTX 3090 with the PyTorch F5-TTS/Vocos path, using the same text and reference audio, CFG-free:
| Setting | Synth time | Relative to 8-step |
|---|---|---|
| 8 steps | 1.57 s | 1.00x |
| 24 steps | 3.84 s | 2.44x slower |
| 32 steps | 5.11 s | 3.25x slower |
Actual wall-clock speed depends on runtime, hardware, audio length, vocoder, and whether Q4 tensors are executed directly or dequantized through a fallback path. The Agent Kernel Lite custom WASM smoke test for this bundle produced finite audio and loaded the packed Q4 tensors through the model-stack runtime.
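The measured times grow more slowly than the step count: 3x and 4x more DiT evaluations cost about 2.44x and 3.25x more wall-clock time. A rough way to read this is a two-point linear fit, time = fixed + per_step * steps, through the 8-step and 32-step rows. The sketch below is a back-of-envelope estimate from the rounded RTX 3090 figures in the table, not a profile of the runtime; the variable names are illustrative.

```python
# Measured synthesis-only times from the table above (steps -> seconds).
measured = {8: 1.57, 24: 3.84, 32: 5.11}

# Marginal cost of one extra function evaluation, from the two end points.
per_step = (measured[32] - measured[8]) / (32 - 8)

# Step-independent remainder implied by the 8-step measurement.
fixed = measured[8] - 8 * per_step


def predict(steps: int) -> float:
    """Estimated synthesis time in seconds under the linear assumption."""
    return fixed + per_step * steps


for steps, seconds in measured.items():
    print(f"{steps:>2} steps: measured {seconds:.2f} s, linear estimate {predict(steps):.2f} s")
```

Under this assumption a step-independent cost remains in every run, so cutting the step count reduces synthesis time by less than the step ratio alone would suggest.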
Model
- Architecture: F5-TTS CFM DiT
- Dimensions: dim=1024, depth=22, heads=16, ff_mult=2
- Text dimension: 512
- Conv layers: 4
- Mel dimension: 100
- Sample rate: 24000
- Quantization: rowwise symmetric signed int4 for large tensors, fp16 for small/dense-excluded tensors
- Q4 parameters: 335,472,640
- Dense fp16 parameters: 1,624,196
- Total parameters: 337,096,836
- Tensor payload: 171,601,360 bytes, about 163.65 MiB
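The parameter counts and payload size above can be cross-checked with a few lines of arithmetic. The split below assumes 4 bits per Q4 parameter and 2 bytes per fp16 parameter; the leftover bytes would be consistent with rowwise fp16 scales plus any padding, but that reading is an inference, not something taken from export_summary.json.

```python
# Figures quoted in this card.
q4_params = 335_472_640
fp16_params = 1_624_196
payload_bytes = 171_601_360
original_params = 335.8e6  # approximate original F5-TTS Base parameter count

total_params = q4_params + fp16_params

q4_bytes = q4_params // 2        # two int4 codes per byte (assumed packing)
fp16_bytes = fp16_params * 2     # two bytes per fp16 value
remainder_bytes = payload_bytes - q4_bytes - fp16_bytes  # presumably scales/padding

payload_mib = payload_bytes / 2**20
fp32_gib = original_params * 4 / 2**30   # FP32 model-only weights
reduction = original_params * 4 / payload_bytes

print(f"total parameters: {total_params:,}")
print(f"Q4 bytes: {q4_bytes:,}  fp16 bytes: {fp16_bytes:,}  remainder: {remainder_bytes:,}")
print(f"payload: {payload_mib:.2f} MiB  FP32 original: {fp32_gib:.2f} GiB  ratio: {reduction:.1f}x")
```

The totals agree with the figures listed: the two parameter counts sum to 337,096,836, the payload converts to about 163.65 MiB, and the FP32-to-packed ratio comes out at roughly 7.8x.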
Runtime / GitHub References
- Agent Kernel Lite app/runtime: https://github.com/peytontolbert/agent_kernel_lite
- Central model-stack implementation: https://github.com/peytontolbert/model-stack
- Browser/mobile Q4 runtime entrypoints in Agent Kernel Lite:
  - https://github.com/peytontolbert/model-stack/browser/bitnet/f5tts_q4_dit_runtime.js
  - https://github.com/peytontolbert/model-stack/browser/bitnet/q4_wasm_runtime.js
  - https://github.com/peytontolbert/model-stack/browser/bitnet/vocos_fp16_runtime.js
  - https://github.com/peytontolbert/model-stack/browser/bitnet_wasm/src/lib.rs
- Training/distillation script: https://github.com/peytontolbert/model-stack/scripts/distill_f5tts_12_to_4_q4.py
Files
- manifest.json: bundle metadata and architecture description
- export_summary.json: tensor counts and byte sizes
- tensors.q4.bin: packed int4 tensor payload
- tensor_q4_index.json: index for packed int4 tensors
- tensors.fp16.bin: fp16 tensor payload
- tensor_fp16_index.json: index for fp16 tensors
- F5TTS_Base_vocab.txt: F5-TTS vocabulary
- peyton_voice_q4.tar: app-ready voice asset archive with F5 Q4, Vocos Q4, Peyton reference audio, and vocabulary
- samples/BF_fullq4_surface_v2_best_nfe8_cfg2_speed115.wav: quality-gate sample
- checksums.sha256: SHA-256 checksums for release files
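Before loading the bundle, the downloaded files can be checked against checksums.sha256. The sketch below assumes that file uses the common `sha256sum` layout (a hex digest, whitespace, then a relative file name per line); if the release uses a different layout, the parser would need adjusting. The helper names are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large tensor payloads need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def parse_checksums(text: str) -> dict[str, str]:
    """Parse '<hex digest>  <file name>' lines (assumed sha256sum layout)."""
    entries = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        expected, name = line.split(maxsplit=1)
        entries[name.lstrip("*")] = expected.lower()  # '*' marks binary mode
    return entries


def verify_bundle(bundle_dir: str) -> dict[str, bool]:
    """Return a per-file pass/fail map; missing files count as failures."""
    bundle = Path(bundle_dir)
    entries = parse_checksums((bundle / "checksums.sha256").read_text())
    return {
        name: (bundle / name).is_file() and sha256_of(bundle / name) == expected
        for name, expected in entries.items()
    }
```

A caller might run `verify_bundle("path/to/bundle")` and refuse to load tensors.q4.bin or tensors.fp16.bin if any entry is `False`.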
Use only with authorization from the voice owner and in contexts where synthetic voice output is appropriate.