LFM2.5-Audio-1.5B — Tool-Aware Fine-Tune (v4)

Full fine-tune of LiquidAI/LFM2.5-Audio-1.5B that handles both turns of a tool-augmented voice flow (a short acknowledgement, then narration of the tool result), as well as chitchat and refusals.

Class	Trigger	Behavior
`tool_match`	user audio + `Tools available:` block, requested tool listed	Short ack (`"setting your alarm now."`) then stop
`tool_result_speak`	same audio + `Known facts you must use…` block injected via `set_context()`	Speak the result naturally (`"your alarm is set for 7am."`)
`tool_miss`	requested tool not in the listed set	Polite refusal (`"i don't have a maps tool right now, sorry."`)
`non_tool`	conversational query, no tool implied	Base-model-style natural reply (targets self-distilled from base)
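
The table can be read as a small decision rule. The sketch below is only an illustration of that reading: the function name, its arguments, and the order of the checks are assumptions rather than part of the released code, and the case of a known-facts block arriving for an unlisted tool is not described in the card.

```python
from typing import Optional, Sequence


def expected_class(
    requested_tool: Optional[str],
    tools_listed: Sequence[str],
    has_known_facts: bool,
) -> str:
    """Map one request to the behavior class named in the table.

    Hypothetical helper: `requested_tool` is None when the query implies no tool.
    """
    if requested_tool is None:
        return "non_tool"            # conversational query, natural reply
    if requested_tool not in tools_listed:
        return "tool_miss"           # polite refusal (assumed to win over known facts)
    if has_known_facts:
        return "tool_result_speak"   # turn 2: narrate the injected result
    return "tool_match"              # turn 1: short ack, then stop


if __name__ == "__main__":
    print(expected_class("alarm", ["alarm", "weather"], has_known_facts=False))
```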

Results vs v3

Held-out eval, 120 rows (30 per class):

Class	v3	v4	Δ
`tool_match`	96.7%	86.7%	−10.0
`tool_result_speak`	100.0%	100.0%	0
`tool_miss`	80.0%	100.0%	+20.0
`non_tool`	60.0%	86.7%	+26.7
Overall	84.2%	93.3%	+9.2
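
With 30 rows per class, each percentage in the table corresponds to a whole number of correct rows. The following sketch recovers those counts and recomputes the v4 overall score from them. It assumes every class has exactly 30 eval rows and that the overall figure is a plain row-level average; the helper names are illustrative.

```python
ROWS_PER_CLASS = 30  # from "120 rows, 30 per class"

# Per-class accuracies (%) copied from the table above.
V3 = {"tool_match": 96.7, "tool_result_speak": 100.0, "tool_miss": 80.0, "non_tool": 60.0}
V4 = {"tool_match": 86.7, "tool_result_speak": 100.0, "tool_miss": 100.0, "non_tool": 86.7}


def correct_rows(pcts: dict) -> dict:
    """Convert per-class percentages back to correct-row counts."""
    return {name: round(p * ROWS_PER_CLASS / 100) for name, p in pcts.items()}


def overall_accuracy(counts: dict) -> float:
    """Row-level accuracy (%) across all classes."""
    return 100 * sum(counts.values()) / (ROWS_PER_CLASS * len(counts))


def row_deltas(old: dict, new: dict) -> dict:
    """Change in correct rows per class between two versions."""
    return {name: new[name] - old[name] for name in new}


if __name__ == "__main__":
    v3_rows, v4_rows = correct_rows(V3), correct_rows(V4)
    print("v4 correct rows:", v4_rows)
    print("v4 overall: %.1f%%" % overall_accuracy(v4_rows))
    print("row deltas v3 -> v4:", row_deltas(v3_rows, v4_rows))
```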

Novel-facts narration (60 out-of-distribution tool results never seen in training): 95% faithful, 0% memorized.

What changed in v4

tool_miss ratio bumped 14% → 28%.
Hard-negative tools_listed: 60% of tool_miss rows include a semantically adjacent tool (e.g. scenario=traffic with maps listed but not traffic).
19 diversified refusal templates, up from 5 in v3, which had memorized its refusal phrasings.
Explicit "if not listed, decline" clause in the instruction line.
Tighter non_tool filter — drops DailyDialog context-fragments ("Spring .", "About 6:00 .").
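
The hard-negative idea can be sketched as follows. This is not the dataset script: the tool inventory, the adjacency map (apart from the traffic/maps pair mentioned above), the function name, and the list size of three are all assumptions made for illustration.

```python
import random

# Hypothetical tool inventory and "semantically adjacent" pairs.
ALL_TOOLS = ["alarm", "calendar", "iot_lights", "maps", "timer", "traffic", "weather"]
ADJACENT = {"traffic": "maps", "maps": "traffic", "alarm": "timer", "timer": "alarm"}


def sample_tools_listed(requested, rng, hard_negative_rate=0.6, n_listed=3):
    """Build the `tools_listed` field for one tool_miss row.

    The requested tool is never listed. With probability `hard_negative_rate`
    its adjacent tool is listed, so the model must decline despite a near match.
    """
    neighbor = ADJACENT.get(requested)
    # Easy negatives: everything except the requested tool and its neighbor.
    pool = [t for t in ALL_TOOLS if t not in (requested, neighbor)]
    listed = []
    if neighbor is not None and rng.random() < hard_negative_rate:
        listed.append(neighbor)
    listed += rng.sample(pool, n_listed - len(listed))
    rng.shuffle(listed)
    return listed


if __name__ == "__main__":
    rng = random.Random(0)
    for _ in range(3):
        print(sample_tools_listed("traffic", rng))
```

Excluding the neighbor from the easy-negative pool keeps the hard-negative share at the configured rate instead of letting it drift upward through random draws.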

Two-turn flow

# turn 1 — model emits "let me check the weather." and stops
# coordinator runs the weather tool, gets "Weather in Tokyo: 72°F, sunny."
await ctrl.<audio_node>.set_context("Weather in Tokyo: 72°F, sunny.")
# turn 2 — re-feed same user audio; model narrates ("it's 72 and sunny in tokyo.")
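
To make the control flow concrete, here is a self-contained sketch of a coordinator running both turns. Everything except the `set_context()` call and the example strings is hypothetical: `FakeAudioNode`, `respond()`, and `run_weather_tool()` are stand-ins with canned replies so the flow can run without the model or a real tool.

```python
import asyncio


class FakeAudioNode:
    """Hypothetical stand-in for the audio node; replies are canned."""

    def __init__(self):
        self.context = None

    async def set_context(self, text: str) -> None:
        # Mirrors the set_context() call shown above: stores the known facts.
        self.context = text

    async def respond(self, user_audio: bytes) -> str:
        # A real node would run the fine-tuned model on the audio here.
        if self.context is None:
            return "let me check the weather."    # turn 1: ack, then stop
        return "it's 72 and sunny in tokyo."      # turn 2: narrate the result


async def run_weather_tool() -> str:
    """Hypothetical tool call returning the example result string."""
    return "Weather in Tokyo: 72°F, sunny."


async def two_turn_flow(node: FakeAudioNode, user_audio: bytes):
    ack = await node.respond(user_audio)          # turn 1
    result = await run_weather_tool()             # coordinator runs the tool
    await node.set_context(result)                # inject the known facts
    narration = await node.respond(user_audio)    # turn 2: same audio re-fed
    return ack, narration


if __name__ == "__main__":
    print(asyncio.run(two_turn_flow(FakeAudioNode(), b"")))
```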

Training recipe

Base: LiquidAI/LFM2.5-Audio-1.5B, full bf16 finetune
Hardware: 2× RTX 4090
3000 train + 400 eval, mix 22/24/28/26 (tool_match / tool_result_speak / tool_miss / non_tool)
bs=2/GPU × 2 GPUs × 1120 steps (~1.5 epochs)
lr 5e-5, cosine + 100 warmup, ctx=512
PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True
Final val_loss = 0.89
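
The numbers above can be cross-checked with a few lines of arithmetic. The sketch assumes no gradient accumulation, that the class mix applies to the 3000 training rows, and that "cosine + 100 warmup" means linear warmup followed by cosine decay to zero; the actual scheduler in the training scripts may differ.

```python
import math

TRAIN_ROWS = 3000
PER_GPU_BATCH, NUM_GPUS, STEPS = 2, 2, 1120
PEAK_LR, WARMUP_STEPS = 5e-5, 100
MIX_PCT = {"tool_match": 22, "tool_result_speak": 24, "tool_miss": 28, "non_tool": 26}


def epochs() -> float:
    """Samples seen divided by dataset size (assumes no gradient accumulation)."""
    return PER_GPU_BATCH * NUM_GPUS * STEPS / TRAIN_ROWS


def class_counts() -> dict:
    """Training rows per class implied by the percentage mix."""
    return {name: TRAIN_ROWS * pct // 100 for name, pct in MIX_PCT.items()}


def lr_at(step: int) -> float:
    """Assumed schedule: linear warmup, then cosine decay to zero."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (STEPS - WARMUP_STEPS)
    return PEAK_LR * 0.5 * (1 + math.cos(math.pi * progress))


if __name__ == "__main__":
    print("epochs: %.2f" % epochs())
    print("rows per class:", class_counts())
    print("lr at steps 50, 100, 610, 1120:", [lr_at(s) for s in (50, 100, 610, 1120)])
```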

Recipe + scripts: matbee/lfm2-tool-aware-dataset-v4.

Known limitations

4/30 tool_match failures use a refusal template even though the tool is listed; the refusal signal is slightly over-corrected relative to v3. v4.1 will rebalance.
3/60 novel-facts narrations received mixed verdicts, on iot_lights and weather.

Usage

import torch
from liquid_audio import LFM2AudioModel, LFM2AudioProcessor

processor = LFM2AudioProcessor.from_pretrained("matbee/lfm2.5-audio-tool-aware-v4", device="cuda")
model = LFM2AudioModel.from_pretrained(
    "matbee/lfm2.5-audio-tool-aware-v4", device="cuda", dtype=torch.bfloat16
).eval()

Predecessors

matbee/lfm2.5-audio-tool-aware-v1 — initial; mastered turn 1, regressed on turn 2 narration.
matbee/lfm2.5-audio-tool-aware-v2 — added tool_result_speak; 100% ack + 20/20 narration on its eval.
v3 (not published) — added distilled non_tool class; fixed narration but classifier-boundary regressed.
v4 (this release) — fixes tool_miss/non_tool boundary via hard negatives + diversified refusals.

License

Inherited from base: LFM Open License v1.0.
