limp-mode-leap1: roadside triage fine-tune of Qwen3.5-4B

The brain of Limp Mode, an offline roadside copilot. Fine-tuned to read a driver's messy description of a car problem and answer a strict-JSON triage verdict: STOP / CAUTION / DRIVE, plain-language reasoning, over-inclusive hazard flags (they feed a deterministic safety floor downstream), no-tools roadside checks, a self-rescue plan adapted to how far help is, and an anti-upsell script for the mechanic. English and Spanish.

Training

  • Data: [N] examples, synthetic conversations from a frontier teacher grounded in verified knowledge bases (3,369 OBD codes, 64 ISO dashboard symbols, 38 hidden-gotcha entries, 15 roadside procedures), passed through deterministic quality gates: JSON schema, severity-floor consistency, enum vocabulary, knowledge grounding, 4-gram dedup, and n-gram decontamination against the eval suite. Includes adversarial slices: noisy retrievals whose correct answer ignores the provided context, and benign cases that punish overcaution.
  • Method: LoRA (r=32, alpha=64, completion-only loss) via Unsloth on Modal (L40S), thinking disabled, 3 epochs.
  • Formats: LoRA adapter, merged fp16, and GGUF Q4_K_M for llama.cpp.

Evaluation: 202-case golden suite

Safety-asymmetric metrics; "dangerous-as-safe" (expected STOP, answered DRIVE) must be 0. Both rows are measured through the identical pipeline, so the difference is the fine-tune.

stage verdict accuracy dangerous-as-safe schema valid knowledge surfaced
base Qwen3.5-4B, full pipeline 83.2% 0 99.5% 98.9%
this model, full pipeline 92.6% 0 100% 97.9%

Per category, the fine-tuned model scores 100% on OBD-code and dashboard-symbol cases, 94.6% on hidden-cause cases, and 91.5% on free-form judgment. The honest soft spots are benign cases (81%, a little residual overcaution) and Spanish (84%).

Eval harness, suite, and full traces are public: https://huggingface.co/datasets/build-small-hackathon/limp-mode-traces

Usage

Deployed inside Limp Mode's pipeline: deterministic intake (symbols/OBD) → IDF retrieval over the gotchas KB → this model (strict JSON contract) → deterministic severity floor that can raise but never lower the verdict. Use the system prompt from the Space repo's app/pipeline.py for faithful behavior.

llama-server -m limpmode-leap1-Q4_K_M.gguf --port 8080 -ngl 99

Limitations

A 4B model for safety-adjacent advice: it is deliberately caged. The surrounding app never lets it downgrade hard-evidence emergencies, never lets it paraphrase verified procedures, and shows the user every safety override. Use it with the cage.

Downloads last month
8
GGUF
Model size
4B params
Architecture
qwen35
Hardware compatibility
Log In to add your hardware

4-bit

Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for build-small-hackathon/limp-mode-leap1

Finetuned
Qwen/Qwen3.5-4B
Quantized
(241)
this model