📟 Gemma4-31B-Dense-Imatrix-IQ3_M.gguf (2026 Edition)

"Local intelligence... to the max."

This is a custom-quantized version of Gemma4-31B, optimized for the highest possible intelligence per byte on consumer laptops and desktops with 24GB or more of RAM.

🧠 Why this model is different

Unlike a standard quant, this model was quantized with a custom importance matrix (imatrix). The calibration data for the imatrix was hand-curated to preserve:

  • Incredible reasoning: Custom coding examples built with frontier models are included in the calibration data to help retain very specific, sharp architectural reasoning skills.
  • Logical Flow: llama.cpp source code, logic puzzles, and historical writing are included in the imatrix calibration data to help the model stay coherent at low bitrates.
  • High Speed: Built using llama.cpp specifically for local-first AI and edge computing setups, such as Apple silicon with at least 24GB of RAM.
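
For readers who want to try a similar imatrix workflow, here is a minimal sketch of the two llama.cpp steps usually involved: computing an importance matrix from calibration text, then quantizing with it. The exact commands and settings used for this model are not stated here, so treat every file name below as a hypothetical placeholder. The sketch only builds the command lines for llama.cpp's `llama-imatrix` and `llama-quantize` tools and prints them; it does not run anything, and available flags may differ between llama.cpp builds.

```python
import shlex
from typing import List


def imatrix_commands(
    source_gguf: str,
    calibration_txt: str,
    imatrix_out: str,
    quant_out: str,
    quant_type: str = "IQ3_M",
) -> List[List[str]]:
    """Build (but do not execute) the two llama.cpp command lines.

    All paths are placeholders chosen for illustration, not the files
    actually used to produce this model.
    """
    # Step 1: run the calibration text through the unquantized model
    # and write the importance matrix to disk.
    compute_imatrix = [
        "llama-imatrix",
        "-m", source_gguf,
        "-f", calibration_txt,
        "-o", imatrix_out,
    ]
    # Step 2: quantize, letting the importance matrix guide which
    # weights get the most precision.
    quantize = [
        "llama-quantize",
        "--imatrix", imatrix_out,
        source_gguf,
        quant_out,
        quant_type,
    ]
    return [compute_imatrix, quantize]


# Hypothetical file names, for illustration only.
commands = imatrix_commands(
    "model-f16.gguf", "calibration.txt", "imatrix.dat", "model-IQ3_M.gguf"
)
for cmd in commands:
    print(shlex.join(cmd))
```

Copy the printed lines into a shell once the paths point at real files. The calibration text is the part that makes an imatrix quant "custom": swapping in different text changes which weights are treated as important.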

🛠 Quantization Details

  • Base Model: Gemma4-31B
  • Quantization: IQ3_M
  • Format: GGUF
  • Size: ~14.42 GB
  • Context Length: 262144 tokens
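
As a rough sanity check on these numbers, the file size and parameter count give an average bits-per-weight figure. The sketch below assumes "GB" means 10^9 bytes and that the model has exactly 31 billion parameters; both are approximations, and the file also holds metadata and some tensors stored at higher precision, so read the result as an average rather than the exact IQ3_M bitrate. The 24GB figure is the minimum RAM mentioned above.

```python
def bits_per_weight(file_size_gb: float, params_billion: float) -> float:
    """Average bits stored per parameter, assuming 1 GB = 1e9 bytes."""
    total_bits = file_size_gb * 1e9 * 8
    total_params = params_billion * 1e9
    return total_bits / total_params


def ram_headroom_gb(total_ram_gb: float, file_size_gb: float) -> float:
    """RAM left after loading the weights (for KV cache, OS, other apps)."""
    return total_ram_gb - file_size_gb


bpw = bits_per_weight(14.42, 31.0)
headroom = ram_headroom_gb(24.0, 14.42)

print(f"average bits per weight: {bpw:.2f}")        # about 3.72
print(f"headroom on a 24GB machine: {headroom:.2f} GB")  # about 9.58
```

The headroom has to cover the KV cache as well, so very long contexts may not fit on a 24GB machine even though the weights do.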

📈 Perplexity Benchmarks

I attempted to run the usual perplexity tests, but this model behaves strangely: the perplexity numbers came out incredibly high (lower is better), so I am choosing not to include the data.
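
For context on what the metric measures, here is a minimal sketch of the standard perplexity definition: the exponential of the average negative log-probability the model assigns to each token of a test text. It is a generic illustration and not the llama.cpp implementation; the function name and the toy inputs are made up for the example, and it does not explain why this particular model scored high.

```python
import math
from typing import Sequence


def perplexity(token_logprobs: Sequence[float]) -> float:
    """Perplexity from per-token natural-log probabilities.

    PPL = exp(-(1/N) * sum(log p_i)). Lower means the model found
    the text less surprising.
    """
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    mean_neg_logprob = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(mean_neg_logprob)


# Toy input: a model that gives every token probability 1/4 is as
# uncertain as a uniform choice among 4 options, so PPL is about 4.
uniform_quarter = [math.log(0.25)] * 10
print(round(perplexity(uniform_quarter), 6))

# Lower per-token probabilities push perplexity up.
less_confident = [math.log(0.01)] * 10
print(round(perplexity(less_confident), 6))
```

Because the result depends on the test text and on how the prompt is formatted, perplexity figures are only comparable when measured with the same text and settings.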

⚖️ Evaluation Verdict

Expect performance similar to, if not better than, that of a standard 4-bit quant.

🚀 Hardware Performance (Apple M2)

Coming soon.

🌐 Links

Check out my other models!

Qwen3.6-35B-SuperMoE.

Qwen3.6-27B-SuperDense.

Both make excellent companions to this model!


