Qwen3.6-27B Mixed AutoRound
This is an unofficial quantized version of Qwen3.6-27B. It was created using AutoRound with a custom mixed-precision recipe.
Quantization details
- This model uses mixed-precision quantization to balance performance and model size.
- The `self_attn` layers are quantized to 8-bit.
- The MLP layers are generally quantized to 4-bit, but the first 3 and last 3 layers are kept at 8-bit.
- The `lm_head`, `linear_attn`, `visual`, and `mtp.fc` layers are kept unquantized in FP16.
| Field | Custom Mixed Recipe |
|---|---|
| Base | Qwen/Qwen3.6-27B |
| Method | AutoRound (intel/auto-round), custom recipe |
| Scheme | Mixed (W4A16 / W8A16) |
| Bits | 4 & 8 |
| Group size | 128 |
| Symmetric | yes |
| Unquantized layers | lm_head, linear_attn, visual, mtp.fc |
| Calibration dataset | NeelNanda/pile-10k |
| Calibration samples | 512 |
| Sequence length | 2048 |
| Iterations | 1000 |
| Batch size | 8 |
| torch.compile | enabled |
- For more information, see `quantize.py`.
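The recipe above can be read as a rule that maps each module name to a bit width. The following is a minimal sketch of that rule only, not the contents of `quantize.py`. The module-name patterns (such as `layers.<i>.mlp`), the helper name `bits_for`, and the layer count of 64 are assumptions made for illustration and are not stated in this card.

```python
import re

EDGE_LAYERS = 3  # first 3 and last 3 MLP blocks stay at 8-bit
UNQUANTIZED = ("lm_head", "linear_attn", "visual", "mtp.fc")  # kept in FP16


def bits_for(name: str, num_layers: int):
    """Return 4, 8, or None (left in FP16) for a hypothetical module name."""
    if any(key in name for key in UNQUANTIZED):
        return None
    if "self_attn" in name:
        return 8
    match = re.search(r"layers\.(\d+)\.mlp", name)
    if match:
        idx = int(match.group(1))
        is_edge = idx < EDGE_LAYERS or idx >= num_layers - EDGE_LAYERS
        return 8 if is_edge else 4
    return None  # modules outside the recipe are left untouched in this sketch


# Hypothetical names and layer count, for illustration only.
for module in (
    "model.layers.0.mlp.gate_proj",
    "model.layers.30.mlp.down_proj",
    "model.layers.30.self_attn.q_proj",
    "model.layers.30.linear_attn.in_proj",
):
    print(module, bits_for(module, num_layers=64))
```

A name-based rule like this is easy to audit against the table, but the real layer names should be checked against the checkpoint before relying on it.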
KLD Metrics
| Metric | Value | Description |
|---|---|---|
| Median KLD | 0.005592 | Median divergence |
| P90 KLD | 0.034514 | Divergence at the 90th percentile |
| Mean KLD | 0.046941 | Average divergence |
| Mean Coverage | 0.994750 | - |
Evaluation Configuration
| Parameter | Value |
|---|---|
| Evaluation Dataset | wikitext-2-raw-v1 (test) |
| Sequence Length | 2048 |
| Num Samples | 64 |
| Total Positions | 131,008 |
| Top-K Reference | 1000 |
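For readers unfamiliar with these metrics, here is a rough sketch of how a per-position KL divergence between the reference model and the quantized model could be computed and summarized. The exact procedure behind the numbers above is not described in this card, so everything here is an assumption: the function names are hypothetical, the sum is truncated to the reference model's top-k tokens without renormalization, and "coverage" is taken to mean the reference probability mass inside that top-k set.

```python
import math
from statistics import mean, median


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def topk_kld(ref_logits, quant_logits, k=1000):
    """Truncated KL(P_ref || P_quant) over the reference's top-k tokens.

    Returns (kld, coverage). Assumption: the sum is not renormalized,
    so it only approximates the full-vocabulary KLD.
    """
    log_p = log_softmax(ref_logits)
    log_q = log_softmax(quant_logits)
    top = sorted(range(len(log_p)), key=log_p.__getitem__, reverse=True)[:k]
    kld = sum(math.exp(log_p[i]) * (log_p[i] - log_q[i]) for i in top)
    coverage = sum(math.exp(log_p[i]) for i in top)
    return kld, coverage


def summarize(klds):
    """Median, nearest-rank 90th percentile, and mean of per-position KLDs."""
    ordered = sorted(klds)
    p90 = ordered[math.ceil(0.9 * len(ordered)) - 1]
    return {"median": median(ordered), "p90": p90, "mean": mean(ordered)}


# 64 samples x (2048 - 1) next-token positions matches the table's total.
assert 64 * (2048 - 1) == 131_008
```

Note that the reported mean KLD (0.046941) is higher than the P90 value (0.034514), which indicates that a small number of positions with large divergence pull the average up.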
How to use
This model was tested on the latest `docker.io/vllm/vllm-openai:cu130-nightly` image. vLLM is recommended.
⚠️ Important note: Do NOT use `FLASHINFER` as the attention backend (`--attention-backend FLASHINFER`), as it may cause compatibility issues on some setups.
Example args (for 2x 3090 users):
vllm serve ./Qwen3.6-27B-mixed-autoround \
--tensor-parallel-size 2 \
--attention-backend FLASH_ATTN \
--performance-mode interactivity \
--max-model-len auto \
--max-num-batched-tokens 2048 \
--max-num-seqs 1 \
--gpu-memory-utilization 0.96 \
--compilation-config '{"mode":"VLLM_COMPILE","cudagraph_capture_sizes":[4]}' \
-O3 \
--async-scheduling \
--language-model-only \
--tool-call-parser qwen3_coder \
--reasoning-parser qwen3 \
--enable-auto-tool-choice \
--speculative-config '{"method":"mtp","num_speculative_tokens":3}' \
--default-chat-template-kwargs.preserve_thinking true \
--mamba-cache-mode all \
--mamba-block-size 8 \
--enable-prefix-caching \
--enable-chunked-prefill
- With these settings, the full context length is available on 2x 3090.
- Note: This information is based on current understanding and testing. Optimal configurations may vary depending on your specific hardware setup. For further details, please refer to the official vLLM documentation.
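Once the server is running, it can be queried through vLLM's OpenAI-compatible API. The snippet below is a minimal text-only client sketch using only the Python standard library. It assumes the server listens on vLLM's default port 8000 and that the served model name defaults to the path passed to `vllm serve`; adjust both if your setup differs. The helper names are hypothetical.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000/v1"  # assumption: vLLM's default port
MODEL = "./Qwen3.6-27B-mixed-autoround"  # assumption: name defaults to the served path


def build_chat_request(prompt: str, max_tokens: int = 256) -> urllib.request.Request:
    """Build a POST request for the /chat/completions endpoint."""
    payload = {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt: str) -> str:
    """Send the prompt to the running server and return the reply text."""
    with urllib.request.urlopen(build_chat_request(prompt)) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


# Example (requires the server from the command above to be running):
# print(chat("Summarize mixed-precision quantization in one sentence."))
```

Because the example command passes `--language-model-only`, this sketch sends text messages only.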
Acknowledgements
- Lorbus for the README.md format
- Alibaba / Qwen team for the base Qwen3.6-27B model
- Intel AutoRound team for the quantization framework
- vLLM project for the inference engine and Qwen3_5 MTP support