F5-TTS 4-bit Distill

This package contains the current Agent Kernel Lite Peyton voice F5-TTS export. It keeps the original F5-TTS CFM DiT architecture and ships the large tensors as rowwise signed int4 with fp16 scales.

This is a weights-only custom runtime bundle. It is not a standalone transformers pipeline. Loading requires an F5-TTS-compatible runtime that understands f5tts, plus the app vocoder bundle.

Upgrades Over Original F5-TTS

Original architecture preserved: the checkpoint is still F5-TTS CFM DiT; the work here is quantization/distillation/runtime packaging, not a custom replacement architecture.
Smaller deployment payload: original F5-TTS Base is roughly 335.8M parameters and about 1.25 GiB as FP32 model-only weights. This release packs the large tensors into 4-bit weights with a total tensor payload of about 163.65 MiB.
Lower memory bandwidth: large matrix weights are stored as signed int4 with rowwise fp16 scales, while small/excluded tensors remain fp16.
Fast generation target: the current Agent Kernel Lite preset uses 8 function-evaluation steps, compared with more typical original F5-TTS settings around 24-32 steps. That is 3x fewer DiT evaluations versus a 24-step run and 4x fewer DiT evaluations versus a 32-step run.
Full-Q4-surface training: this candidate was selected after training and evaluating with the whole deployed F5-TTS tensor surface quantized, matching the iPhone/custom-WASM runtime more closely than the previous partial-Q4 candidates.
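
The rowwise signed int4 storage described above can be illustrated with a short NumPy sketch. This is not the export script: the function names are hypothetical, and mapping each row's largest magnitude to code 7 is an assumed convention for symmetric quantization.

```python
import numpy as np

def quantize_rowwise_int4(w):
    """Symmetric rowwise int4: one fp16 scale per row, integer codes in [-8, 7]."""
    w = w.astype(np.float32)
    max_abs = np.abs(w).max(axis=1, keepdims=True)
    # Assumption: the largest magnitude in a row maps to code 7; all-zero rows get scale 1.
    scale = np.where(max_abs > 0, max_abs / 7.0, 1.0).astype(np.float16)
    codes = np.clip(np.rint(w / scale.astype(np.float32)), -8, 7).astype(np.int8)
    return codes, scale

def dequantize_rowwise_int4(codes, scale):
    """Recover an approximate float32 matrix from codes and per-row scales."""
    return codes.astype(np.float32) * scale.astype(np.float32)

rng = np.random.default_rng(0)
w = rng.standard_normal((4, 16)).astype(np.float32)
codes, scale = quantize_rowwise_int4(w)
w_hat = dequantize_rowwise_int4(codes, scale)
max_err = float(np.abs(w - w_hat).max())
print("largest round-trip error:", max_err)
```

Each row keeps its own scale, so a row with small weights is not forced onto the coarse grid of a row with large ones. The round-trip error per element stays near half of that row's scale.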

F5-TTS References

model hf: https://huggingface.co/SWivid/F5-TTS
model github: https://github.com/SWivid/F5-TTS
creator hf: https://huggingface.co/SWivid
creator github: https://github.com/SWivid

Speed

The release target is faster generation through both fewer sampling steps and a smaller packed weight payload:

Original-style F5-TTS generation: commonly 24-32 function-evaluation steps.
This release-candidate preset: 8 function-evaluation steps, CFG 2.0, speed 1.15.
Step-count speedup: 3x fewer DiT forward passes versus 24-step generation, or 4x fewer DiT forward passes versus 32-step generation.
Weight payload reduction: about 1.25 GiB FP32 model-only weights down to 163.65 MiB packed Q4/fp16 tensors, roughly 7.8x smaller.
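
To show why the step count translates so directly into model passes, here is a minimal fixed-step Euler sampler with classifier-free guidance. `sample_euler` and `toy_velocity` are hypothetical names, the uniform time grid and the guidance formula are assumptions rather than the exact Agent Kernel Lite sampler, and the toy velocity field stands in for the DiT.

```python
import numpy as np

def sample_euler(velocity, x0, nfe_steps, cfg_strength=2.0):
    """Integrate dx/dt = v(x, t) from t=0 to t=1 with nfe_steps Euler steps."""
    x = np.array(x0, dtype=np.float32)
    ts = np.linspace(0.0, 1.0, nfe_steps + 1)
    passes = 0
    for t0, t1 in zip(ts[:-1], ts[1:]):
        v_cond = velocity(x, t0, conditioned=True)
        v_null = velocity(x, t0, conditioned=False)
        passes += 2  # assumption: conditioned and unconditioned passes are run separately
        # Assumed guidance form: push the conditioned prediction away from the unconditioned one.
        v = v_cond + cfg_strength * (v_cond - v_null)
        x = x + (t1 - t0) * v
    return x, nfe_steps, passes

def toy_velocity(x, t, conditioned):
    # Stand-in for the DiT: a constant field, so the result is easy to check by hand.
    return np.ones_like(x) if conditioned else np.zeros_like(x)

x8, steps8, passes8 = sample_euler(toy_velocity, np.zeros(3), nfe_steps=8)
x24, steps24, passes24 = sample_euler(toy_velocity, np.zeros(3), nfe_steps=24)
print(steps24 / steps8, passes24 / passes8)
```

Whether guidance doubles the passes per step or is batched into one call depends on the runtime. Either way the number of model evaluations scales linearly with the step count, which is where the 3x and 4x figures come from.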

Earlier synthesis-only wall-clock measurements on an NVIDIA RTX 3090, using the PyTorch F5-TTS/Vocos path with the same text and reference audio, CFG-free:

| Setting | Synth time | Relative to 8-step |
| --- | --- | --- |
| 8 steps | `1.57 s` | `1.00x` |
| 24 steps | `3.84 s` | `2.44x slower` |
| 32 steps | `5.11 s` | `3.25x slower` |

Actual wall-clock speed depends on runtime, hardware, audio length, vocoder, and whether Q4 tensors are executed directly or dequantized through a fallback path. The Agent Kernel Lite custom WASM smoke test for this bundle loaded the packed Q4 tensors through the model-stack runtime and produced finite audio (no NaN or infinite samples).
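
A dequantizing fallback path of the kind mentioned above might look like the following sketch. The nibble order (low nibble first), the helper names, and the `y = x @ W^T` layout are all assumptions for illustration; the real layout is defined by the bundle's index files and the runtime.

```python
import numpy as np

def pack_int4(codes):
    """Pack signed int4 codes two per byte (assumed order: low nibble first)."""
    nibbles = (codes.astype(np.int16) & 0xF).astype(np.uint8)
    return nibbles[:, 0::2] | (nibbles[:, 1::2] << 4)

def unpack_int4(packed):
    """Inverse of pack_int4: split bytes into nibbles and sign-extend to int8."""
    codes = np.empty((packed.shape[0], packed.shape[1] * 2), dtype=np.int8)
    codes[:, 0::2] = (packed & 0xF).astype(np.int8)
    codes[:, 1::2] = (packed >> 4).astype(np.int8)
    # Nibble values 8..15 represent -8..-1 in two's complement.
    return np.where(codes > 7, codes - 16, codes).astype(np.int8)

def fallback_linear(x, packed, scale):
    """Dequantize the whole weight to float32, then run an ordinary matmul."""
    w = unpack_int4(packed).astype(np.float32) * scale.astype(np.float32)
    return x @ w.T

codes = np.array([[-8, 7, -1, 0], [3, -4, 5, -6]], dtype=np.int8)
scale = np.array([[0.5], [0.25]], dtype=np.float16)
packed = pack_int4(codes)
y = fallback_linear(np.ones((1, 4), dtype=np.float32), packed, scale)
print(packed.shape, y)
```

A fallback like this materializes the full float weight before multiplying, so it gives up most of the memory-bandwidth benefit that a direct Q4 kernel keeps.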

Model

Architecture: F5-TTS CFM DiT
Dimensions: dim=1024, depth=22, heads=16, ff_mult=2
Text dimension: 512
Conv layers: 4
Mel dimension: 100
Sample rate: 24000
Quantization: rowwise symmetric signed int4 for large tensors, fp16 for small/dense-excluded tensors
Q4 parameters: 335,472,640
Dense fp16 parameters: 1,624,196
Total parameters: 337,096,836
Tensor payload: 171,601,360 bytes, about 163.65 MiB
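
The figures above can be cross-checked with a few lines of arithmetic. The sketch assumes two int4 codes per byte and two bytes per fp16 value; attributing the leftover bytes to rowwise scales or padding is an inference, not something the bundle metadata states here.

```python
q4_params = 335_472_640
fp16_params = 1_624_196
payload_bytes = 171_601_360

total_params = q4_params + fp16_params

# Assumption: two int4 codes per byte, two bytes per fp16 value.
q4_code_bytes = q4_params // 2
fp16_bytes = fp16_params * 2
remainder_bytes = payload_bytes - q4_code_bytes - fp16_bytes  # likely scales and/or padding

payload_mib = payload_bytes / 2**20
fp32_bytes = 335.8e6 * 4  # original F5-TTS Base parameter count quoted earlier, at 4 bytes each
fp32_gib = fp32_bytes / 2**30
reduction = fp32_bytes / payload_bytes

print(total_params, round(payload_mib, 2), round(fp32_gib, 2), round(reduction, 1))
```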

Runtime / GitHub References

Agent Kernel Lite app/runtime: https://github.com/peytontolbert/agent_kernel_lite
Central model-stack implementation: https://github.com/peytontolbert/model-stack
Browser/mobile Q4 runtime entrypoints used by Agent Kernel Lite (paths as listed under model-stack):
- https://github.com/peytontolbert/model-stack/browser/bitnet/f5tts_q4_dit_runtime.js
- https://github.com/peytontolbert/model-stack/browser/bitnet/q4_wasm_runtime.js
- https://github.com/peytontolbert/model-stack/browser/bitnet/vocos_fp16_runtime.js
- https://github.com/peytontolbert/model-stack/browser/bitnet_wasm/src/lib.rs
Training/distillation script: https://github.com/peytontolbert/model-stack/scripts/distill_f5tts_12_to_4_q4.py

Files

manifest.json: bundle metadata and architecture description
export_summary.json: tensor counts and byte sizes
tensors.q4.bin: packed int4 tensor payload
tensor_q4_index.json: index for packed int4 tensors
tensors.fp16.bin: fp16 tensor payload
tensor_fp16_index.json: index for fp16 tensors
F5TTS_Base_vocab.txt: F5-TTS vocabulary
peyton_voice_q4.tar: app-ready voice asset archive with F5 Q4, Vocos Q4, Peyton reference audio, and vocabulary
samples/BF_fullq4_surface_v2_best_nfe8_cfg2_speed115.wav: quality-gate sample
checksums.sha256: SHA-256 checksums for release files
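
Before loading the bundle, the release files can be checked against checksums.sha256. The sketch below assumes the file uses the common `sha256sum` line format (`<hex digest>  <filename>`); `verify_checksums` is a hypothetical helper, not part of the runtime.

```python
import hashlib
from pathlib import Path

def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large tensor payloads are not read into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_checksums(bundle_dir, checksum_name="checksums.sha256"):
    """Return {filename: True/False}; assumes sha256sum-style '<hex>  <name>' lines."""
    bundle_dir = Path(bundle_dir)
    results = {}
    for line in (bundle_dir / checksum_name).read_text().splitlines():
        if not line.strip():
            continue
        expected, name = line.split(maxsplit=1)
        name = name.lstrip("*").strip()  # sha256sum marks binary mode with a leading '*'
        target = bundle_dir / name
        results[name] = target.is_file() and sha256_of(target) == expected.lower()
    return results
```

Calling `verify_checksums(".")` from the bundle directory returns a mapping of filename to pass/fail. A missing file is reported as a failure rather than raising.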

Use only with authorization from the voice owner and in contexts where synthetic voice output is appropriate.
