This repository is publicly accessible, but you have to accept the conditions to access its files and content.

FroST-4B: Frozen-State Conversational Speech Model

A conversational speech model. A frozen Qwen3-4B backbone writes a text reply, and its hidden states drive an autoregressive speech head that emits XY_Tokenizer codes; the codec then decodes these codes to 24 kHz audio. A single input produces both a text reply and spoken audio.
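The card states that the backbone's hidden states condition the speech head through cross-attention. The toy, pure-Python sketch below shows the general technique: single-head scaled dot-product cross-attention, where one speech-head query attends over per-token hidden states. The shapes are assumptions (one query vector; keys and values already projected to a common width), the function names are hypothetical, and this is an illustration of the technique, not FroST's implementation.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(query: Vector, keys: List[Vector], values: List[Vector]) -> Vector:
    """Single-head scaled dot-product cross-attention (toy, pure Python).

    query:  one vector from the speech head's current decoding step
    keys:   one vector per text token (backbone hidden states, assumed projected)
    values: one vector per text token, in the same order as keys
    """
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    # Context vector: attention-weighted sum of the value vectors
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]


# Two toy "token" states; the query aligns with the first one
states = [[1.0, 0.0], [0.0, 1.0]]
context = cross_attend([4.0, 0.0], keys=states, values=states)
print(context)  # most of the weight lands on the first state
```

In a real model the query, keys, and values would each pass through learned projections and use many heads; the toy keeps only the attention arithmetic.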

  • Backbone: Qwen3-4B (frozen; a rank-32 LoRA adapter is baked in)
  • Speech head: ~341M parameters, delay pattern over 8 XY_Tokenizer codebooks
  • Conditioning: hidden states tapped at layer 24 feed the speech head via cross-attention
  • Trained components: LoRA adapter, projector, and speech head only (base model and lm_head frozen)
  • Codec dependency: OpenMOSS-Team/XY_Tokenizer_TTSD_V0_hf (downloaded at load time)
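The card says only that the speech head uses a delay pattern over 8 codebooks. The sketch below assumes the common MusicGen-style scheme, in which codebook k is shifted right by k frames; the `PAD` value and both function names are hypothetical.

```python
from typing import List

PAD = -1  # hypothetical padding id; a real model reserves a dedicated token


def apply_delay(codes: List[List[int]], pad: int = PAD) -> List[List[int]]:
    """Shift codebook k right by k steps (MusicGen-style delay pattern).

    codes[k][t] is the code for codebook k at frame t. The output has
    T + K - 1 columns so every codebook keeps all T of its codes.
    """
    num_books, num_frames = len(codes), len(codes[0])
    width = num_frames + num_books - 1
    delayed = [[pad] * width for _ in range(num_books)]
    for k, row in enumerate(codes):
        for t, code in enumerate(row):
            delayed[k][t + k] = code
    return delayed


def undo_delay(delayed: List[List[int]]) -> List[List[int]]:
    """Invert apply_delay: realign the codebooks and drop the padding."""
    num_books = len(delayed)
    num_frames = len(delayed[0]) - num_books + 1
    return [row[k:k + num_frames] for k, row in enumerate(delayed)]


# 8 codebooks x 3 frames of dummy codes
codes = [[10 * k + t for t in range(3)] for k in range(8)]
delayed = apply_delay(codes)
assert undo_delay(delayed) == codes
```

Under this scheme, at decoding step s the head emits codebook k for frame s - k. By the time it predicts a finer codebook for a given frame, the coarser codebooks of that frame have already been generated. The cost is K - 1 extra steps per utterance.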

Usage

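```python
from frost import FroST
frost = FroST.from_pretrained("apxrv/frost-4b")  # also fetches the XY_Tokenizer codec
# One call returns both the text reply and the spoken audio
out = frost.chat("hi, how are you?", system="You are warm and upbeat.")
print(out.text)
out.save("reply.wav")  # write the spoken reply to a WAV file
```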
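To sanity-check the file written by `out.save`, the standard-library `wave` module can report its sample rate and duration. This sketch assumes `out.save` writes integer PCM WAV. `wave` only handles integer PCM, so a 32-bit float file would need a library such as soundfile instead. `describe_wav` is a hypothetical helper, not part of the frost package.

```python
import wave


def describe_wav(path: str) -> dict:
    """Return basic properties of a PCM WAV file using only the standard library."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        frames = wav.getnframes()
        return {
            "sample_rate": rate,
            "channels": wav.getnchannels(),
            "duration_s": frames / rate,
        }
```

For example, `describe_wav("reply.wav")["sample_rate"]` should be 24000 if the saved file keeps the codec's 24 kHz output rate.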

Trained for 1,500 steps. See the project repository for training and evaluation code.
