F5-TTS 4-bit Distill

This package contains the current Agent Kernel Lite Peyton voice F5-TTS export. It keeps the original F5-TTS CFM DiT architecture and ships the large tensors as rowwise signed int4 with fp16 scales.

This is a weights-only custom runtime bundle. It is not a standalone transformers pipeline. Loading requires an F5-TTS-compatible runtime that understands f5tts, plus the app vocoder bundle.

Upgrades Over Original F5-TTS

  • Original architecture preserved: the checkpoint is still F5-TTS CFM DiT; the work here is quantization/distillation/runtime packaging, not a custom replacement architecture.
  • Smaller deployment payload: original F5-TTS Base is roughly 335.8M parameters and about 1.25 GiB as FP32 model-only weights. This release packs the large tensors into 4-bit weights with a total tensor payload of about 163.65 MiB.
  • Lower memory bandwidth: large matrix weights are stored as signed int4 with rowwise fp16 scales, while small/excluded tensors remain fp16.
  • Fast generation target: the current Agent Kernel Lite preset uses 8 function-evaluation steps, where original F5-TTS settings are more typically around 24-32 steps. That is one third of the DiT evaluations of a 24-step run and one quarter of those of a 32-step run.
  • Full-Q4-surface training: this candidate was selected after training and evaluating with the whole deployed F5-TTS tensor surface quantized, matching the iPhone/custom-WASM runtime more closely than the previous partial-Q4 candidates.
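The storage scheme named in the list above (signed int4 values with one fp16 scale per row) can be sketched in plain Python. This is an illustrative sketch only, not the exporter used for this bundle: the helper names, the max-abs/7 scale choice, and the low-nibble-first packing order are assumptions, and the actual on-disk layout should be taken from the runtime sources.

```python
import struct

def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

def quantize_row(row):
    """Symmetric signed int4: one fp16 scale per row, codes clamped to [-8, 7]."""
    max_abs = max(abs(v) for v in row)
    scale = to_fp16(max_abs / 7.0)  # assumed scale rule: largest value maps to +/-7
    if scale == 0.0:                # all-zero (or underflowing) row
        scale = 1.0
    codes = [max(-8, min(7, round(v / scale))) for v in row]
    return codes, scale

def pack_nibbles(codes):
    """Pack two 4-bit codes per byte, low nibble first (assumed layout)."""
    if len(codes) % 2:
        codes = codes + [0]  # pad odd-length rows with a zero code
    return bytes((codes[i] & 0xF) | ((codes[i + 1] & 0xF) << 4)
                 for i in range(0, len(codes), 2))

def unpack_nibbles(data, count):
    """Inverse of pack_nibbles: sign-extend each nibble back to [-8, 7]."""
    out = []
    for byte in data:
        for nib in (byte & 0xF, byte >> 4):
            out.append(nib - 16 if nib >= 8 else nib)
    return out[:count]

def dequantize_row(codes, scale):
    """Reconstruct approximate weights from int4 codes and the row scale."""
    return [c * scale for c in codes]

# Toy 2x4 weight matrix, quantized and restored row by row.
weights = [[0.5, -1.0, 0.25, 0.0], [2.0, 1.0, -0.5, -2.0]]
for row in weights:
    codes, scale = quantize_row(row)
    packed = pack_nibbles(codes)
    restored = dequantize_row(unpack_nibbles(packed, len(row)), scale)
    print(codes, scale, len(packed), restored)
```

Each row of N weights costs N/2 bytes of codes plus 2 bytes for its scale, which is where the bandwidth saving over fp16 or fp32 storage comes from.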

Speed

The release target is faster generation through both fewer sampling steps and a smaller packed weight payload:

  • Original-style F5-TTS generation: commonly 24-32 function-evaluation steps.
  • This release-candidate preset: 8 function-evaluation steps, CFG 2.0, speed 1.15.
  • Step-count reduction: one third as many DiT forward passes as 24-step generation, or one quarter as many as 32-step generation.
  • Weight payload reduction: about 1.25 GiB FP32 model-only weights down to 163.65 MiB packed Q4/fp16 tensors, roughly 7.8x smaller.

Earlier synthesis-only wall-clock measurements on an NVIDIA RTX 3090 using the PyTorch F5-TTS/Vocos path, with the same text and reference audio, CFG-free:

Setting     Synth time    Relative to 8-step
8 steps     1.57 s        1.00x
24 steps    3.84 s        2.44x slower
32 steps    5.11 s        3.25x slower
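The measured slowdowns in the table above are smaller than the 3x and 4x step-count ratios. One possible reading, sketched below, is that each run pays a fixed cost (for example vocoding and setup) on top of a per-step cost. The linear model and the `fit_line` helper are assumptions made for illustration, and three data points are too few for a firm conclusion.

```python
# Synthesis-only timings from the table above: steps -> seconds.
timings = {8: 1.57, 24: 3.84, 32: 5.11}

def fit_line(points):
    """Ordinary least squares for y = intercept + slope * x."""
    xs, ys = zip(*points.items())
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Assumed model: time = fixed overhead + steps * per-step cost.
fixed_s, per_step_s = fit_line(timings)
print(f"estimated fixed overhead: {fixed_s:.2f} s")
print(f"estimated cost per step:  {per_step_s:.3f} s")

for steps, measured in timings.items():
    predicted = fixed_s + per_step_s * steps
    print(f"{steps:>2} steps: measured {measured:.2f} s, fitted {predicted:.2f} s")
```

Under this reading, cutting steps shrinks only the per-step part of the runtime, so the wall-clock gain is expected to trail the step-count ratio.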

Actual wall-clock speed depends on runtime, hardware, audio length, vocoder, and whether Q4 tensors are executed directly or dequantized through a fallback path. In the Agent Kernel Lite custom WASM smoke test for this bundle, the model-stack runtime loaded the packed Q4 tensors and produced finite audio.

Model

  • Architecture: F5-TTS CFM DiT
  • Dimensions: dim=1024, depth=22, heads=16, ff_mult=2
  • Text dimension: 512
  • Conv layers: 4
  • Mel dimension: 100
  • Sample rate: 24000
  • Quantization: rowwise symmetric signed int4 for large tensors, fp16 for small/dense-excluded tensors
  • Q4 parameters: 335,472,640
  • Dense fp16 parameters: 1,624,196
  • Total parameters: 337,096,836
  • Tensor payload: 171,601,360 bytes, about 163.65 MiB
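The figures in the list above can be cross-checked with a little arithmetic. The sketch below assumes 4 bits per Q4 parameter and 2 bytes per fp16 parameter. It treats whatever is left of the payload as row scales and padding, which is an assumption and not something the bundle metadata quoted here states.

```python
# Figures quoted in the model card.
Q4_PARAMS = 335_472_640
FP16_PARAMS = 1_624_196
PAYLOAD_BYTES = 171_601_360
ORIGINAL_PARAMS = 335.8e6  # approximate original F5-TTS Base size

MIB = 1024 ** 2
GIB = 1024 ** 3

total_params = Q4_PARAMS + FP16_PARAMS
q4_bytes = Q4_PARAMS // 2    # two 4-bit codes per byte
fp16_bytes = FP16_PARAMS * 2  # two bytes per fp16 value
# Assumption: the remainder holds the fp16 row scales plus any padding.
remainder_bytes = PAYLOAD_BYTES - q4_bytes - fp16_bytes

payload_mib = PAYLOAD_BYTES / MIB
original_fp32_bytes = ORIGINAL_PARAMS * 4
ratio = original_fp32_bytes / PAYLOAD_BYTES

print(f"total parameters:   {total_params:,}")
print(f"packed int4 codes:  {q4_bytes:,} bytes")
print(f"dense fp16 tensors: {fp16_bytes:,} bytes")
print(f"remainder:          {remainder_bytes:,} bytes")
print(f"payload:            {payload_mib:.2f} MiB")
print(f"original FP32:      {original_fp32_bytes / GIB:.2f} GiB")
print(f"size reduction:     {ratio:.1f}x")
```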

Runtime / GitHub References

  • Agent Kernel Lite app/runtime: https://github.com/peytontolbert/agent_kernel_lite
  • Central model-stack implementation: https://github.com/peytontolbert/model-stack
  • Browser/mobile Q4 runtime entrypoints in Agent Kernel Lite:
    • https://github.com/peytontolbert/model-stack/browser/bitnet/f5tts_q4_dit_runtime.js
    • https://github.com/peytontolbert/model-stack/browser/bitnet/q4_wasm_runtime.js
    • https://github.com/peytontolbert/model-stack/browser/bitnet/vocos_fp16_runtime.js
    • https://github.com/peytontolbert/model-stack/browser/bitnet_wasm/src/lib.rs
  • Training/distillation script: https://github.com/peytontolbert/model-stack/scripts/distill_f5tts_12_to_4_q4.py

Files

  • manifest.json: bundle metadata and architecture description
  • export_summary.json: tensor counts and byte sizes
  • tensors.q4.bin: packed int4 tensor payload
  • tensor_q4_index.json: index for packed int4 tensors
  • tensors.fp16.bin: fp16 tensor payload
  • tensor_fp16_index.json: index for fp16 tensors
  • F5TTS_Base_vocab.txt: F5-TTS vocabulary
  • peyton_voice_q4.tar: app-ready voice asset archive with F5 Q4, Vocos Q4, Peyton reference audio, and vocabulary
  • samples/BF_fullq4_surface_v2_best_nfe8_cfg2_speed115.wav: quality-gate sample
  • checksums.sha256: SHA-256 checksums for release files
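After downloading, the release files can be checked against checksums.sha256. The sketch below assumes the file uses the common `sha256sum` line format (hex digest, whitespace, file name, with an optional `*` before the name). That format is an assumption about this bundle, and the function names are illustrative.

```python
import hashlib
from pathlib import Path

def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large tensor payloads do not fill memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def parse_checksums(text):
    """Parse '<hex digest>  <file name>' lines (assumed sha256sum format)."""
    entries = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        digest, name = line.split(maxsplit=1)
        entries[name.lstrip("*")] = digest.lower()
    return entries

def verify_bundle(bundle_dir, checksum_file="checksums.sha256"):
    """Return {file name: True/False}; missing files count as failures."""
    bundle = Path(bundle_dir)
    expected = parse_checksums((bundle / checksum_file).read_text())
    return {
        name: (bundle / name).is_file() and sha256_of(bundle / name) == digest
        for name, digest in expected.items()
    }
```

Calling `verify_bundle(".")` from the directory holding the downloaded files returns a per-file pass/fail mapping. Any `False` entry points to a missing or corrupted file that should be fetched again before loading.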

Use only with authorization from the voice owner and in contexts where synthetic voice output is appropriate.
