Instructions to use magiccodingman/Qwen3-4B-Instruct-2507-Unsloth-MagicQuant-v2-GGUF with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- llama-cpp-python
How to use magiccodingman/Qwen3-4B-Instruct-2507-Unsloth-MagicQuant-v2-GGUF with llama-cpp-python:
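```python
# !pip install llama-cpp-python
from llama_cpp import Llama

# Download the GGUF file from the Hugging Face Hub and load it
llm = Llama.from_pretrained(
    repo_id="magiccodingman/Qwen3-4B-Instruct-2507-Unsloth-MagicQuant-v2-GGUF",
    filename="Model-LM-IQ2_S.gguf",
)

response = llm.create_chat_completion(
    messages=[
        {"role": "user", "content": "What is the capital of France?"}
    ]
)
# The result follows the OpenAI chat-completion response shape
print(response["choices"][0]["message"]["content"])
```
- Notebooks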
- Google Colab
- Kaggle
- Local Apps
- llama.cpp
How to use magiccodingman/Qwen3-4B-Instruct-2507-Unsloth-MagicQuant-v2-GGUF with llama.cpp:
Install from brew
```shell
brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf magiccodingman/Qwen3-4B-Instruct-2507-Unsloth-MagicQuant-v2-GGUF:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf magiccodingman/Qwen3-4B-Instruct-2507-Unsloth-MagicQuant-v2-GGUF:Q4_K_M
```
Install from WinGet (Windows)
```shell
winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf magiccodingman/Qwen3-4B-Instruct-2507-Unsloth-MagicQuant-v2-GGUF:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf magiccodingman/Qwen3-4B-Instruct-2507-Unsloth-MagicQuant-v2-GGUF:Q4_K_M
```
Use pre-built binary
```shell
# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf magiccodingman/Qwen3-4B-Instruct-2507-Unsloth-MagicQuant-v2-GGUF:Q4_K_M
# Run inference directly in the terminal:
./llama-cli -hf magiccodingman/Qwen3-4B-Instruct-2507-Unsloth-MagicQuant-v2-GGUF:Q4_K_M
```
Build from source code
```shell
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf magiccodingman/Qwen3-4B-Instruct-2507-Unsloth-MagicQuant-v2-GGUF:Q4_K_M
# Run inference directly in the terminal:
./build/bin/llama-cli -hf magiccodingman/Qwen3-4B-Instruct-2507-Unsloth-MagicQuant-v2-GGUF:Q4_K_M
```
Use Docker
```shell
docker model run hf.co/magiccodingman/Qwen3-4B-Instruct-2507-Unsloth-MagicQuant-v2-GGUF:Q4_K_M
```
- LM Studio
- Jan
- vLLM
How to use magiccodingman/Qwen3-4B-Instruct-2507-Unsloth-MagicQuant-v2-GGUF with vLLM:
Install from pip and serve model
```shell
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "magiccodingman/Qwen3-4B-Instruct-2507-Unsloth-MagicQuant-v2-GGUF"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "magiccodingman/Qwen3-4B-Instruct-2507-Unsloth-MagicQuant-v2-GGUF",
    "messages": [
      {"role": "user", "content": "What is the capital of France?"}
    ]
  }'
```
Use Docker
```shell
docker model run hf.co/magiccodingman/Qwen3-4B-Instruct-2507-Unsloth-MagicQuant-v2-GGUF:Q4_K_M
```
- Ollama
How to use magiccodingman/Qwen3-4B-Instruct-2507-Unsloth-MagicQuant-v2-GGUF with Ollama:
```shell
ollama run hf.co/magiccodingman/Qwen3-4B-Instruct-2507-Unsloth-MagicQuant-v2-GGUF:Q4_K_M
```
- Unsloth Studio
How to use magiccodingman/Qwen3-4B-Instruct-2507-Unsloth-MagicQuant-v2-GGUF with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
```shell
curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for magiccodingman/Qwen3-4B-Instruct-2507-Unsloth-MagicQuant-v2-GGUF to start chatting
```
Install Unsloth Studio (Windows)
```powershell
irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for magiccodingman/Qwen3-4B-Instruct-2507-Unsloth-MagicQuant-v2-GGUF to start chatting
```
Using HuggingFace Spaces for Unsloth
No setup required: open https://huggingface.co/spaces/unsloth/studio in your browser and search for magiccodingman/Qwen3-4B-Instruct-2507-Unsloth-MagicQuant-v2-GGUF to start chatting.
- Pi
How to use magiccodingman/Qwen3-4B-Instruct-2507-Unsloth-MagicQuant-v2-GGUF with Pi:
Start the llama.cpp server
```shell
# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf magiccodingman/Qwen3-4B-Instruct-2507-Unsloth-MagicQuant-v2-GGUF:Q4_K_M
```
Configure the model in Pi
```shell
# Install Pi:
npm install -g @mariozechner/pi-coding-agent
```
Add to ~/.pi/agent/models.json:
```json
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        { "id": "magiccodingman/Qwen3-4B-Instruct-2507-Unsloth-MagicQuant-v2-GGUF:Q4_K_M" }
      ]
    }
  }
}
```
Run Pi
```shell
# Start Pi in your project directory:
pi
```
- Hermes Agent
How to use magiccodingman/Qwen3-4B-Instruct-2507-Unsloth-MagicQuant-v2-GGUF with Hermes Agent:
Start the llama.cpp server
```shell
# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf magiccodingman/Qwen3-4B-Instruct-2507-Unsloth-MagicQuant-v2-GGUF:Q4_K_M
```
Configure Hermes
```shell
# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default magiccodingman/Qwen3-4B-Instruct-2507-Unsloth-MagicQuant-v2-GGUF:Q4_K_M
```
Run Hermes
```shell
hermes
```
- Docker Model Runner
How to use magiccodingman/Qwen3-4B-Instruct-2507-Unsloth-MagicQuant-v2-GGUF with Docker Model Runner:
```shell
docker model run hf.co/magiccodingman/Qwen3-4B-Instruct-2507-Unsloth-MagicQuant-v2-GGUF:Q4_K_M
```
- Lemonade
How to use magiccodingman/Qwen3-4B-Instruct-2507-Unsloth-MagicQuant-v2-GGUF with Lemonade:
Pull the model
```shell
# Download Lemonade from https://lemonade-server.ai/
lemonade pull magiccodingman/Qwen3-4B-Instruct-2507-Unsloth-MagicQuant-v2-GGUF:Q4_K_M
```
Run and chat with the model
```shell
lemonade run user.Qwen3-4B-Instruct-2507-Unsloth-MagicQuant-v2-GGUF-Q4_K_M
```
List all available models
```shell
lemonade list
```
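The llama.cpp, Pi, and Hermes Agent entries above all go through llama-server's OpenAI-compatible API (the Pi config points at http://localhost:8080/v1). To call that endpoint from your own code, a minimal standard-library Python sketch might look like the following. It assumes the server is already running locally on port 8080 and that /v1/chat/completions accepts the same OpenAI chat payload as the vLLM curl example; BASE_URL, MODEL, build_chat_request, and chat are illustrative names, not part of any library.

```python
import json
import urllib.request

# Assumption: llama-server runs locally on port 8080 (the base URL used in the Pi config).
BASE_URL = "http://localhost:8080/v1"
MODEL = "magiccodingman/Qwen3-4B-Instruct-2507-Unsloth-MagicQuant-v2-GGUF:Q4_K_M"


def build_chat_request(
    prompt: str, base_url: str = BASE_URL, model: str = MODEL
) -> urllib.request.Request:
    """Build a POST request for the OpenAI-compatible /chat/completions route."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt: str, base_url: str = BASE_URL) -> str:
    """Send one user message and return the assistant's reply text."""
    request = build_chat_request(prompt, base_url=base_url)
    with urllib.request.urlopen(request, timeout=120) as resp:
        body = json.load(resp)
    # OpenAI-style response: the reply lives in choices[0].message.content
    return body["choices"][0]["message"]["content"]
```

With the server running, `print(chat("What is the capital of France?"))` sends the same request as the curl example. To target the vLLM server instead, pass `base_url="http://localhost:8000/v1"`.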
MagicQuant Hybrids (v2.0) - Qwen3-4B-Instruct-2507-unsloth
MagicQuant is not a quantization technique by itself.
It is a search, judging, and hybrid-discovery system that learns from baseline families such as llama.cpp and external/custom baseline sources, then uses isolated samples, rank-safe prediction, and real benchmarking to keep the practical survivors.
Sometimes a hybrid beats a pure baseline; sometimes it does not. MagicQuant searches for favorable non-linear size/quality trade-offs to discover potentially better hybrids, useful sub-spaces between anchor baselines, and more.
Read more on the MagicQuant Wiki.
Final surviving downloadable outputs
| Name | Provider | KLD | Size (GB) | Download |
|---|---|---|---|---|
| LM-Q8_0 | llama.cpp | 0.001339 | 3.99 | Link |
| MQ-Q6_K_1 | MagicQuant | 0.001817 | 3.58 | Link |
| UD-Q6_K_XL | Unsloth | 0.002111 | 3.41 | Link |
| LM-Q6_K | llama.cpp | 0.004640 | 3.08 | Link |
| MQ-Q5_K_1 | MagicQuant | 0.006632 | 2.88 | Link |
| UD-Q5_K_XL | Unsloth | 0.009839 | 2.73 | Link |
| MQ-Q4_K_M_1 | MagicQuant | 0.020346 | 2.44 | Link |
| LM-Q4_K_S | llama.cpp | 0.029803 | 2.22 | Link |
| LM-IQ4_XS | llama.cpp | 0.031300 | 2.11 | Link |
| UD-Q3_K_XL | Unsloth | 0.072278 | 1.98 | Link |
| LM-IQ3_S | llama.cpp | 0.091992 | 1.77 | Link |
| LM-IQ3_XXS | llama.cpp | 0.190404 | 1.56 | Link |
| LM-IQ2_S | llama.cpp | 0.431128 | 1.32 | Link |
| LM-IQ2_XXS | llama.cpp | 0.938021 | 1.16 | Link |
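Because every row is a survivor, the table can be read as a size/KLD front: as files get smaller, KLD (lower is closer to the reference model) only gets worse. The Python sketch below copies the table values, checks that no file is dominated (another file no larger with lower KLD), and picks the lowest-KLD file that fits a disk-size budget. Quant, is_dominated, and pick_under_budget are illustrative names; note that the sizes are file sizes only, and running a model needs extra memory beyond the file size (for example, for the KV cache).

```python
from typing import List, NamedTuple, Optional


class Quant(NamedTuple):
    name: str
    provider: str
    kld: float      # lower = closer to the reference model
    size_gb: float  # file size in GB, as listed in the table


# Values copied from the "Final surviving downloadable outputs" table above.
SURVIVORS: List[Quant] = [
    Quant("LM-Q8_0", "llama.cpp", 0.001339, 3.99),
    Quant("MQ-Q6_K_1", "MagicQuant", 0.001817, 3.58),
    Quant("UD-Q6_K_XL", "Unsloth", 0.002111, 3.41),
    Quant("LM-Q6_K", "llama.cpp", 0.004640, 3.08),
    Quant("MQ-Q5_K_1", "MagicQuant", 0.006632, 2.88),
    Quant("UD-Q5_K_XL", "Unsloth", 0.009839, 2.73),
    Quant("MQ-Q4_K_M_1", "MagicQuant", 0.020346, 2.44),
    Quant("LM-Q4_K_S", "llama.cpp", 0.029803, 2.22),
    Quant("LM-IQ4_XS", "llama.cpp", 0.031300, 2.11),
    Quant("UD-Q3_K_XL", "Unsloth", 0.072278, 1.98),
    Quant("LM-IQ3_S", "llama.cpp", 0.091992, 1.77),
    Quant("LM-IQ3_XXS", "llama.cpp", 0.190404, 1.56),
    Quant("LM-IQ2_S", "llama.cpp", 0.431128, 1.32),
    Quant("LM-IQ2_XXS", "llama.cpp", 0.938021, 1.16),
]


def is_dominated(q: Quant, files: List[Quant]) -> bool:
    """True if another file is no larger than q and has strictly lower KLD."""
    return any(
        o is not q and o.size_gb <= q.size_gb and o.kld < q.kld for o in files
    )


def pick_under_budget(
    budget_gb: float, files: List[Quant] = SURVIVORS
) -> Optional[Quant]:
    """Return the lowest-KLD file whose size fits the budget, or None."""
    fitting = [q for q in files if q.size_gb <= budget_gb]
    return min(fitting, key=lambda q: q.kld, default=None)
```

For example, `pick_under_budget(2.5)` selects MQ-Q4_K_M_1 (2.44 GB, KLD 0.020346), and `is_dominated` returns False for every row.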
Release metadata
- Final survivor metrics — full file names, KLD, PPL delta %, byte sizes, download targets, and replacement lineage. PPL delta % is measured against the native/reference PPL when available; negative is better and larger positive values are worse.
- Hybrid tensor map — tensor-group assignments and effective-state details for MagicQuant hybrid GGUFs.
- Replacement details — structured details for baselines or anchors removed from the final download table, including reason codes, KLD deltas, PPL delta %, and size deltas.
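The card reports KLD and PPL delta % without spelling out the formulas. Assuming the usual definitions, PPL delta % is the relative change from the reference perplexity, and KLD is the per-token KL divergence of the quantized model's next-token distribution from the reference model's, averaged over an evaluation text (the kind of figure llama.cpp's llama-perplexity tool reports with its --kl-divergence option). The standard-library Python sketch below shows both on toy logits; the function names are illustrative, and real measurements run over full vocabularies and many tokens.

```python
import math
from typing import List, Sequence


def ppl_delta_pct(ppl: float, ref_ppl: float) -> float:
    """Percent change vs. the reference perplexity (negative = better)."""
    return (ppl - ref_ppl) / ref_ppl * 100.0


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over one token position's logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def mean_kld(
    ref_logits: Sequence[Sequence[float]],
    quant_logits: Sequence[Sequence[float]],
) -> float:
    """Mean over token positions of KL(P_ref || P_quant), in nats."""
    if len(ref_logits) != len(quant_logits) or not ref_logits:
        raise ValueError("need the same, non-zero number of token positions")
    total = 0.0
    for r, q in zip(ref_logits, quant_logits):
        lp, lq = log_softmax(r), log_softmax(q)
        # sum_i p_i * (log p_i - log q_i)
        total += sum(math.exp(a) * (a - b) for a, b in zip(lp, lq))
    return total / len(ref_logits)
```

A model identical to the reference scores a KLD of 0; any divergence in its token probabilities pushes the value above 0.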
Replacement reason codes
- STRICT_DOMINANCE: the winner was no larger and had lower real KLD than the removed anchor.
- NEAR_BASELINE_PREMIUM: the winner used only the configured near-baseline size premium and beat the real linear KLD trade line.
- INTERIOR_DISCOVERY: the winner was selected as a useful interior point inside a size/KLD gap between anchors.
- SPACING_COLLAPSE: two candidates were too close in practical output space, so the stronger one was kept.
- FINAL_DOMINANCE: a later validated survivor dominated this artifact in final real benchmark comparison.
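The first two reason codes are concrete enough to sketch. The Python below is a hypothetical reading of them, not MagicQuant's implementation. It assumes the "linear KLD trade line" is the straight line through the removed anchor and the next larger anchor in (size, KLD) space, and it treats the configured near-baseline size premium as a fraction of the removed anchor's size (the premium=0.05 default is made up). Artifact, trade_line_kld, and classify_replacement are illustrative names.

```python
from typing import NamedTuple, Optional


class Artifact(NamedTuple):
    name: str
    size_gb: float
    kld: float


def trade_line_kld(a: Artifact, b: Artifact, size_gb: float) -> float:
    """KLD on the straight line through anchors a and b, evaluated at size_gb."""
    if a.size_gb == b.size_gb:
        raise ValueError("anchors must have different sizes")
    t = (size_gb - a.size_gb) / (b.size_gb - a.size_gb)
    return a.kld + t * (b.kld - a.kld)


def classify_replacement(
    winner: Artifact,
    removed: Artifact,
    next_larger: Artifact,
    premium: float = 0.05,  # HYPOTHETICAL: allowed size premium as a fraction
) -> Optional[str]:
    """Return a reason code if `winner` may replace `removed`, else None."""
    # STRICT_DOMINANCE: no larger and strictly lower KLD.
    if winner.size_gb <= removed.size_gb and winner.kld < removed.kld:
        return "STRICT_DOMINANCE"
    # NEAR_BASELINE_PREMIUM (assumed reading): slightly larger, within the
    # premium, but below the linear trade line toward the next larger anchor.
    if winner.size_gb <= removed.size_gb * (1.0 + premium):
        if winner.kld < trade_line_kld(removed, next_larger, winner.size_gb):
            return "NEAR_BASELINE_PREMIUM"
    return None
```

The remaining three codes depend on gap, spacing, and final-benchmark thresholds the card does not specify, so they are left out of the sketch.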
Underlined names in the table either replaced another artifact or ultimately inherited another artifact's replacement. Hover over a name for a short replacement summary, or inspect magicquant.replacements.json for the exact KLD/PPL/size deltas.
Provider credits
Warning
External/custom baselines are normalized into MagicQuant's controlled comparison flow. MagicQuant may rebuild a learned baseline under native-source / MagicQuant-controlled conditions, including its own imatrix handling, so hybrids can be judged on a more equal footing. That does not mean MagicQuant proved the original upstream artifact or upstream imatrix was worse. These comparisons exist for internal hybrid-search consistency, not as a universal judgment of the original creator's exact release artifact.
Support
I’m a solo developer working full time for myself to achieve my dream. I build open source code on the side. If you like any of my work, buying me a coffee is always appreciated. Otherwise, I hope you enjoy, maybe give me a star or something. Or just send me good vibes. Either way, thank you!
Ways to support: BTC, PayPal, GitHub Sponsors.
Base model: Qwen/Qwen3-4B-Instruct-2507