dystrio 
posted an update Mar 13
**Dystrio Sculpt — Dense, smaller drop-in replacements for Mistral 7B, Llama 3.1 8B, and Qwen 2.5 7B**

We built a structural compiler that produces smaller, dense models from existing checkpoints. No sparsity, no custom kernels, no new ops — output models load with standard transformers, work with vLLM, TGI, llama.cpp, and stack with AWQ/GPTQ/GGUF.
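Because the output is a plain dense checkpoint, loading one looks exactly like loading the original. A minimal sketch, assuming the Hub repo naming follows the links below (the `sculpt_repo` and `load_sculpt` helpers are illustrative, not part of any Dystrio tooling):

```python
def sculpt_repo(base: str, tier: str = "default") -> str:
    # Build the Hub repo id for a sculpted checkpoint. The
    # "<base>-sculpt-<tier>" naming is inferred from the model links
    # in this post (e.g. dystrio/Mistral-7B-Instruct-v0.3-sculpt-default)
    # and is an assumption, not a documented convention.
    return f"dystrio/{base}-sculpt-{tier}"


def load_sculpt(base: str, tier: str = "default"):
    # Downloads the checkpoint; requires `transformers` installed.
    # No custom code or kernels are needed -- it is a standard
    # AutoModelForCausalLM load, per the post's drop-in claim.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    repo = sculpt_repo(base, tier)
    tokenizer = AutoTokenizer.from_pretrained(repo)
    model = AutoModelForCausalLM.from_pretrained(repo, torch_dtype="auto")
    return tokenizer, model
```

The same repo id should work anywhere a vanilla checkpoint does (vLLM's `--model`, TGI's `--model-id`, or a GGUF conversion for llama.cpp).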

Results (default tier, bf16, A100 80GB):

[Mistral 7B Instruct v0.3 → sculpt-default](https://huggingface.co/dystrio/Mistral-7B-Instruct-v0.3-sculpt-default) — 11% smaller, PPL ratio 0.923 (quality improved), +10% prefill, -8% TTFT

[Llama 3.1 8B Instruct → sculpt-default](https://huggingface.co/dystrio/Llama-3.1-8B-Instruct-sculpt-default) — 10% smaller, PPL ratio 1.064 (≈same), +8% prefill, -8% TTFT

[Qwen 2.5 7B Instruct → sculpt-default](https://huggingface.co/dystrio/Qwen2.5-7B-Instruct-sculpt-default) — 9% smaller, PPL ratio 0.990 (quality improved), +7% prefill, -6% TTFT

PPL ratio = WikiText-103 perplexity of the sculpted model divided by that of the original. Below 1.0 means quality improved.
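For concreteness, the ratio can be computed from per-token negative log-likelihoods like this (an illustrative sketch, not Dystrio's eval code; the NLL values in the usage note are made up):

```python
import math


def perplexity(nlls: list[float]) -> float:
    # Perplexity = exp(mean per-token negative log-likelihood)
    # over the held-out text (WikiText-103 in this post).
    return math.exp(sum(nlls) / len(nlls))


def ppl_ratio(sculpt_nlls: list[float], original_nlls: list[float]) -> float:
    # Ratio < 1.0 means the sculpted model assigns higher likelihood
    # to the held-out text than the original, i.e. quality improved.
    return perplexity(sculpt_nlls) / perplexity(original_nlls)
```

For example, a sculpted model averaging 2.00 NLL against an original averaging 2.08 gives a ratio of exp(-0.08) ≈ 0.923, the Mistral number above.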

More aggressive tiers are available: each model ships 3-4 tiers trading quality for size, up to 30% smaller. Check the model cards for the full benchmark tables and tier comparisons.

All models: [huggingface.co/dystrio](https://huggingface.co/dystrio)
