Instructions to use magiccodingman/Qwen3-4B-Instruct-2507-Unsloth-MagicQuant-v2-GGUF with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use magiccodingman/Qwen3-4B-Instruct-2507-Unsloth-MagicQuant-v2-GGUF with llama-cpp-python:

# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="magiccodingman/Qwen3-4B-Instruct-2507-Unsloth-MagicQuant-v2-GGUF",
	filename="Model-LM-IQ2_S.gguf",
)

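# Run a single-turn chat and print the assistant's reply.
response = llm.create_chat_completion(
	messages = [
		{
			"role": "user",
			"content": "What is the capital of France?"
		}
	]
)
print(response["choices"][0]["message"]["content"])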

Local Apps

llama.cpp

How to use magiccodingman/Qwen3-4B-Instruct-2507-Unsloth-MagicQuant-v2-GGUF with llama.cpp:

Install from brew

brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf magiccodingman/Qwen3-4B-Instruct-2507-Unsloth-MagicQuant-v2-GGUF:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf magiccodingman/Qwen3-4B-Instruct-2507-Unsloth-MagicQuant-v2-GGUF:Q4_K_M

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf magiccodingman/Qwen3-4B-Instruct-2507-Unsloth-MagicQuant-v2-GGUF:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf magiccodingman/Qwen3-4B-Instruct-2507-Unsloth-MagicQuant-v2-GGUF:Q4_K_M

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf magiccodingman/Qwen3-4B-Instruct-2507-Unsloth-MagicQuant-v2-GGUF:Q4_K_M
# Run inference directly in the terminal:
./llama-cli -hf magiccodingman/Qwen3-4B-Instruct-2507-Unsloth-MagicQuant-v2-GGUF:Q4_K_M

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf magiccodingman/Qwen3-4B-Instruct-2507-Unsloth-MagicQuant-v2-GGUF:Q4_K_M
# Run inference directly in the terminal:
./build/bin/llama-cli -hf magiccodingman/Qwen3-4B-Instruct-2507-Unsloth-MagicQuant-v2-GGUF:Q4_K_M

Use Docker

docker model run hf.co/magiccodingman/Qwen3-4B-Instruct-2507-Unsloth-MagicQuant-v2-GGUF:Q4_K_M
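
Once `llama-server` is running from any of the install options above, it can be queried from code as well as from the web UI. The following is a minimal sketch using only the Python standard library. It assumes the server listens on `localhost:8080` (the port used in the Pi and Hermes configurations further down) and that the `model` field matches the `-hf` argument; the helper names `build_chat_request` and `ask` are hypothetical, introduced here for illustration.

```python
import json
import urllib.request

# Assumption: llama-server is running locally on port 8080.
BASE_URL = "http://localhost:8080/v1"
# Assumption: same identifier that was passed to `llama-server -hf`.
MODEL_ID = "magiccodingman/Qwen3-4B-Instruct-2507-Unsloth-MagicQuant-v2-GGUF:Q4_K_M"


def build_chat_request(prompt: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a POST request for the OpenAI-compatible chat completions endpoint."""
    payload = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt: str) -> str:
    """Send the prompt to the server and return the assistant's reply text."""
    with urllib.request.urlopen(build_chat_request(prompt)) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

With the server running, `print(ask("What is the capital of France?"))` prints the model's reply. Building the request separately from sending it keeps the payload easy to inspect without a live server.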


vLLM

How to use magiccodingman/Qwen3-4B-Instruct-2507-Unsloth-MagicQuant-v2-GGUF with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "magiccodingman/Qwen3-4B-Instruct-2507-Unsloth-MagicQuant-v2-GGUF"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "magiccodingman/Qwen3-4B-Instruct-2507-Unsloth-MagicQuant-v2-GGUF",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker

docker model run hf.co/magiccodingman/Qwen3-4B-Instruct-2507-Unsloth-MagicQuant-v2-GGUF:Q4_K_M

Ollama
How to use magiccodingman/Qwen3-4B-Instruct-2507-Unsloth-MagicQuant-v2-GGUF with Ollama:
```
ollama run hf.co/magiccodingman/Qwen3-4B-Instruct-2507-Unsloth-MagicQuant-v2-GGUF:Q4_K_M
```

Unsloth Studio

How to use magiccodingman/Qwen3-4B-Instruct-2507-Unsloth-MagicQuant-v2-GGUF with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for magiccodingman/Qwen3-4B-Instruct-2507-Unsloth-MagicQuant-v2-GGUF to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for magiccodingman/Qwen3-4B-Instruct-2507-Unsloth-MagicQuant-v2-GGUF to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for magiccodingman/Qwen3-4B-Instruct-2507-Unsloth-MagicQuant-v2-GGUF to start chatting

Pi

How to use magiccodingman/Qwen3-4B-Instruct-2507-Unsloth-MagicQuant-v2-GGUF with Pi:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf magiccodingman/Qwen3-4B-Instruct-2507-Unsloth-MagicQuant-v2-GGUF:Q4_K_M

Configure the model in Pi

# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "magiccodingman/Qwen3-4B-Instruct-2507-Unsloth-MagicQuant-v2-GGUF:Q4_K_M"
        }
      ]
    }
  }
}

Run Pi

# Start Pi in your project directory:
pi

Hermes Agent

How to use magiccodingman/Qwen3-4B-Instruct-2507-Unsloth-MagicQuant-v2-GGUF with Hermes Agent:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf magiccodingman/Qwen3-4B-Instruct-2507-Unsloth-MagicQuant-v2-GGUF:Q4_K_M

Configure Hermes

# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default magiccodingman/Qwen3-4B-Instruct-2507-Unsloth-MagicQuant-v2-GGUF:Q4_K_M

Run Hermes

hermes

Docker Model Runner
How to use magiccodingman/Qwen3-4B-Instruct-2507-Unsloth-MagicQuant-v2-GGUF with Docker Model Runner:
```
docker model run hf.co/magiccodingman/Qwen3-4B-Instruct-2507-Unsloth-MagicQuant-v2-GGUF:Q4_K_M
```

Lemonade

How to use magiccodingman/Qwen3-4B-Instruct-2507-Unsloth-MagicQuant-v2-GGUF with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull magiccodingman/Qwen3-4B-Instruct-2507-Unsloth-MagicQuant-v2-GGUF:Q4_K_M

Run and chat with the model

lemonade run user.Qwen3-4B-Instruct-2507-Unsloth-MagicQuant-v2-GGUF-Q4_K_M

List all available models

lemonade list

MagicQuant Hybrids (v2.0) - Qwen3-4B-Instruct-2507-unsloth

MagicQuant is not a quantization technique by itself.

It is a search, judging, and hybrid-discovery system that learns from baseline families such as llama.cpp and external/custom baseline sources, then uses isolated samples, rank-safe prediction, and real benchmarking to keep the practical survivors.

Sometimes a hybrid beats a pure baseline. Sometimes it does not. MagicQuant looks for favorable nonlinear trade-offs to discover potentially better hybrids, useful subspaces between anchor baselines, and more.

Final surviving downloadable outputs

Name	Provider	KLD	Size (GB)	Download
LM-Q8_0	llama.cpp	0.001339	3.99	Link
MQ-Q6_K_1	MagicQuant	0.001817	3.58	Link
UD-Q6_K_XL	Unsloth	0.002111	3.41	Link
LM-Q6_K	llama.cpp	0.004640	3.08	Link
MQ-Q5_K_1	MagicQuant	0.006632	2.88	Link
UD-Q5_K_XL	Unsloth	0.009839	2.73	Link
MQ-Q4_K_M_1	MagicQuant	0.020346	2.44	Link
LM-Q4_K_S	llama.cpp	0.029803	2.22	Link
LM-IQ4_XS	llama.cpp	0.031300	2.11	Link
UD-Q3_K_XL	Unsloth	0.072278	1.98	Link
LM-IQ3_S	llama.cpp	0.091992	1.77	Link
LM-IQ3_XXS	llama.cpp	0.190404	1.56	Link
LM-IQ2_S	llama.cpp	0.431128	1.32	Link
LM-IQ2_XXS	llama.cpp	0.938021	1.16	Link
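
One way to read the table above is as a menu: given a file-size budget, pick the entry with the lowest KLD that still fits. The sketch below hard-codes the table's rows and does exactly that. The function name `best_under_budget` is hypothetical, and the budget applies to file size only; actual memory use at runtime also depends on context length and other settings not covered here.

```python
# (name, KLD, size in GB), copied from the table above.
QUANTS = [
    ("LM-Q8_0", 0.001339, 3.99),
    ("MQ-Q6_K_1", 0.001817, 3.58),
    ("UD-Q6_K_XL", 0.002111, 3.41),
    ("LM-Q6_K", 0.004640, 3.08),
    ("MQ-Q5_K_1", 0.006632, 2.88),
    ("UD-Q5_K_XL", 0.009839, 2.73),
    ("MQ-Q4_K_M_1", 0.020346, 2.44),
    ("LM-Q4_K_S", 0.029803, 2.22),
    ("LM-IQ4_XS", 0.031300, 2.11),
    ("UD-Q3_K_XL", 0.072278, 1.98),
    ("LM-IQ3_S", 0.091992, 1.77),
    ("LM-IQ3_XXS", 0.190404, 1.56),
    ("LM-IQ2_S", 0.431128, 1.32),
    ("LM-IQ2_XXS", 0.938021, 1.16),
]


def best_under_budget(max_size_gb: float):
    """Return the lowest-KLD row whose file size fits the budget, or None."""
    fitting = [row for row in QUANTS if row[2] <= max_size_gb]
    if not fitting:
        return None
    return min(fitting, key=lambda row: row[1])


if __name__ == "__main__":
    print(best_under_budget(2.5))  # picks MQ-Q4_K_M_1
    print(best_under_budget(1.0))  # nothing fits, so None
```

Because every row in the table is smaller than the one above it and has a higher KLD, the pick is always the largest file that fits the budget.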

Release metadata

Final survivor metrics — full file names, KLD, PPL delta %, byte sizes, download targets, and replacement lineage. PPL delta % is measured against the native/reference PPL when available; negative is better and larger positive values are worse.
Hybrid tensor map — tensor-group assignments and effective-state details for MagicQuant hybrid GGUFs.
Replacement details — structured details for baselines or anchors removed from the final download table, including reason codes, KLD deltas, PPL delta %, and size deltas.

Replacement reason codes

STRICT_DOMINANCE — the winner was no larger and had lower real KLD than the removed anchor.
NEAR_BASELINE_PREMIUM — the winner used only the configured near-baseline size premium and beat the real linear KLD trade line.
INTERIOR_DISCOVERY — the winner was selected as a useful interior point inside a size/KLD gap between anchors.
SPACING_COLLAPSE — two candidates were too close in practical output space, so the stronger one was kept.
FINAL_DOMINANCE — a later validated survivor dominated this artifact in final real benchmark comparison.

Underlined names in the table replaced or ultimately inherited the replacement of another artifact. Hover the name for the short replacement summary, or inspect magicquant.replacements.json for exact KLD/PPL/size deltas.
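
The STRICT_DOMINANCE and NEAR_BASELINE_PREMIUM descriptions above can be read as simple size/KLD comparisons. The sketch below is one interpretation of those two rules, not MagicQuant's actual implementation: the `Artifact` type and all function names are hypothetical, the "linear KLD trade line" is assumed to be the straight line between two anchors in size/KLD space, and the configured size premium is not modeled. The real anchors and deltas are recorded in magicquant.replacements.json.

```python
from typing import NamedTuple


class Artifact(NamedTuple):
    name: str
    kld: float
    size_gb: float


def strictly_dominates(winner: Artifact, anchor: Artifact) -> bool:
    """STRICT_DOMINANCE as described: no larger, and lower KLD."""
    return winner.size_gb <= anchor.size_gb and winner.kld < anchor.kld


def linear_trade_kld(smaller: Artifact, larger: Artifact, size_gb: float) -> float:
    """KLD on the straight line between two anchors at the given size (assumed reading)."""
    t = (size_gb - smaller.size_gb) / (larger.size_gb - smaller.size_gb)
    return smaller.kld + t * (larger.kld - smaller.kld)


def beats_trade_line(candidate: Artifact, smaller: Artifact, larger: Artifact) -> bool:
    """True if the candidate's KLD is below the linear interpolation of its neighbors."""
    return candidate.kld < linear_trade_kld(smaller, larger, candidate.size_gb)


# Rows from the final table, used here only as illustrative numbers.
lm_q6_k = Artifact("LM-Q6_K", 0.004640, 3.08)
mq_q5_k_1 = Artifact("MQ-Q5_K_1", 0.006632, 2.88)
ud_q5_k_xl = Artifact("UD-Q5_K_XL", 0.009839, 2.73)

if __name__ == "__main__":
    # Interpolated KLD at 2.88 GB is about 0.00761; the hybrid's 0.006632 is lower.
    print(linear_trade_kld(ud_q5_k_xl, lm_q6_k, mq_q5_k_1.size_gb))
    print(beats_trade_line(mq_q5_k_1, ud_q5_k_xl, lm_q6_k))
```

Under this reading, a hybrid that sits below the line between its neighbors offers a better KLD than a proportional size trade would predict, which is one plausible sense in which a trade is "nonlinear".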

Provider credits

llama.cpp — Baseline quantization formats and llama.cpp tooling.
Unsloth — External learned baseline source (UD).

Warning

External/custom baselines are normalized into MagicQuant's controlled comparison flow. MagicQuant may rebuild a learned baseline under native-source / MagicQuant-controlled conditions, including its own imatrix handling, so hybrids can be judged on a more equal footing. That does not mean MagicQuant proved the original upstream artifact or upstream imatrix was worse. These comparisons exist for internal hybrid-search consistency, not as a universal judgment of the original creator's exact release artifact.

Support

I’m a solo developer working full time for myself to achieve my dream. I build open source code on the side. If you like any of my work, buying me a coffee is always appreciated. Otherwise, I hope you enjoy, maybe give me a star or something. Or just send me good vibes. Either way, thank you!

Click here to see ways to support: BTC, PayPal, GitHub Sponsors.