MOSS-TTS-Nano GGUF (codec only)

GGUF conversion of the MOSS-Audio-Tokenizer-Nano codec used by OpenMOSS-Team/MOSS-TTS-Nano, runnable via codec.cpp.

⚠️ The LLM-part is not converted yet — see Status.

Files

Codec-part (MOSS-Audio-Tokenizer-Nano, `moss_audio` arch, 16 RVQ codebooks × 1024, 48 kHz stereo)

codec[-<quant>].gguf

File	Size
`codec-f32.gguf`	84 MB
`codec-f16.gguf`	42 MB
`codec-q8_0.gguf`	24 MB
`codec-q5_k_m.gguf`	17 MB
`codec-q4_k_m.gguf`	15 MB
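
To compare the quantization levels, the file sizes above can be converted into an average number of bits stored per weight. The following is a rough sketch, not a measurement: it assumes the 22.1M parameter count reported in the model card metadata and treats the listed "MB" as MiB (1024 × 1024 bytes), which the table does not state. The names `PARAMS`, `SIZES_MIB` and `bits_per_weight` are made up for this illustration.

```python
# Average bits per weight for each file, derived from the sizes in the table.
# Assumptions (not confirmed by the repo): 22.1M parameters, sizes given in MiB.
PARAMS = 22.1e6
MIB = 1024 * 1024

SIZES_MIB = {
    "codec-f32.gguf": 84,
    "codec-f16.gguf": 42,
    "codec-q8_0.gguf": 24,
    "codec-q5_k_m.gguf": 17,
    "codec-q4_k_m.gguf": 15,
}


def bits_per_weight(size_mib: float, params: float = PARAMS) -> float:
    """Return file size in bits divided by parameter count.

    The result includes GGUF metadata and any tensors kept at higher
    precision, so it is an upper bound on the nominal quantization rate.
    """
    return size_mib * MIB * 8 / params


if __name__ == "__main__":
    for name, size in SIZES_MIB.items():
        print(f"{name:20s} {bits_per_weight(size):5.1f} bits/weight")
```

Under these assumptions the f32 and f16 files come out close to 32 and 16 bits per weight, while the quantized files land somewhat above their nominal rates (q8_0 nominally stores 8.5 bits per weight). One plausible reason is that some tensors are kept at higher precision, but the repo does not say so.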

Status

The MOSS-TTS-Nano LLM-part uses a custom `MossTTSNanoForCausalLM` architecture that stock llama.cpp cannot load:

GPT-2 backbone with RoPE (`position_embedding_type: "rope"`, `rope_base: 10000`); llama.cpp's `gpt2` arch only handles learned absolute positions, not RoPE
16 RVQ codebooks (1024 entries each) emitted per audio frame
Global + local transformer: the GPT-2 global transformer produces a hidden state per timestep, and a 1-layer local transformer expands it into the 16 codebook predictions
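
To illustrate the first point, here is a minimal pure-Python sketch of rotary position embedding with base 10000. Learned absolute positions add a per-position vector to the input, whereas RoPE rotates pairs of query/key components by a position-dependent angle, so a loader built for one cannot simply be reused for the other. The pairing convention used here (first half paired with second half) is an assumption; the upstream model may pair adjacent components instead. `rope_rotate` and `dot` are illustrative names.

```python
import math


def rope_rotate(vec, position, base=10000.0):
    """Apply rotary position embedding to a single head vector.

    Assumes the half-split pairing: component i is rotated together with
    component i + dim/2. This convention is an assumption of the sketch.
    """
    dim = len(vec)
    if dim % 2:
        raise ValueError("head dimension must be even")
    half = dim // 2
    out = [0.0] * dim
    for i in range(half):
        # Rotation frequency decreases geometrically with the pair index.
        theta = position * base ** (-2.0 * i / dim)
        c, s = math.cos(theta), math.sin(theta)
        x1, x2 = vec[i], vec[i + half]
        out[i] = x1 * c - x2 * s
        out[i + half] = x1 * s + x2 * c
    return out


def dot(a, b):
    """Plain dot product, standing in for an attention score."""
    return sum(x * y for x, y in zip(a, b))
```

A useful property to check is that the score between a rotated query and a rotated key depends only on their relative offset: `dot(rope_rotate(q, 5), rope_rotate(k, 3))` matches `dot(rope_rotate(q, 12), rope_rotate(k, 10))` up to floating-point error.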

A working LLM-part GGUF would need a custom `moss_tts_nano` architecture inside llama.cpp (model loader, graph builder for the global + local transformer, multi-codebook output handling). That is substantial work, comparable to or larger than implementing Chatterbox-T3.
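
The "multi-codebook output handling" item can be pictured with a toy example. A standard causal LM step yields one logits vector and one token; here each audio frame needs 16 codes, one per RVQ codebook. The sketch below greedily picks one code per codebook from a 16 × 1024 logits table. It is only an illustration: `fake_local_logits` is a hypothetical stand-in for the local transformer, and the sketch treats the codebooks independently, whereas the real model may predict them one after another, conditioning on earlier codes.

```python
import random

NUM_CODEBOOKS = 16    # RVQ codebooks per audio frame (from the description above)
CODEBOOK_SIZE = 1024  # entries per codebook


def pick_frame_codes(frame_logits):
    """Greedily select one code per codebook from a [16][1024] logits table."""
    if len(frame_logits) != NUM_CODEBOOKS:
        raise ValueError("expected one logits row per codebook")
    codes = []
    for row in frame_logits:
        if len(row) != CODEBOOK_SIZE:
            raise ValueError("expected one logit per codebook entry")
        # Index of the largest logit in this codebook's row.
        codes.append(max(range(CODEBOOK_SIZE), key=row.__getitem__))
    return codes


def fake_local_logits(rng):
    """Hypothetical stand-in for the local transformer: random logits."""
    return [[rng.random() for _ in range(CODEBOOK_SIZE)]
            for _ in range(NUM_CODEBOOKS)]


if __name__ == "__main__":
    frame = pick_frame_codes(fake_local_logits(random.Random(0)))
    print(len(frame), "codes for one frame")
```

Each generated timestep therefore produces a list of 16 integers in the range 0 to 1023 rather than a single token id, which is the part that does not map onto llama.cpp's usual single-vocabulary sampling path.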

For now this repo only ships the codec; the LLM-part will follow if/when the architecture lands in llama.cpp.

Notes

Source weights: OpenMOSS-Team/MOSS-Audio-Tokenizer-Nano
Full upstream Python pipeline: OpenMOSS-Team/MOSS-TTS-Nano


Model size: 22.1M params
Architecture: moss_audio_tokenizer
Available precisions: 4-bit, 5-bit, 8-bit, 16-bit, 32-bit

Model tree for hans00/MOSS-TTS-Nano-GGUF

Base model: OpenMOSS-Team/MOSS-TTS-Nano-100M