Qwen3-VL-2B-Instruct-4bit

A verbatim mirror of mlx-community/Qwen3-VL-2B-Instruct-4bit, kept here so the Vanta iOS app always has a stable source for a lower-RAM model.

Run it on your iPhone with Vanta

This is one of the built-in one-tap downloads in Vanta - Local AI LLM Chat, a local-first AI chat app for iPhone and iPad. Vanta runs models like this one fully on-device with Apple's MLX framework: no account, no cloud, and your chats stay on your device. Because this model is vision-capable, you can also chat about images.

Vanta recommends this smaller model on RAM-tight devices where the 4B Thinking model is likely too heavy.

Download Vanta on the App Store ->

This is a copy. Every model file in this repository is an exact copy of the corresponding file in mlx-community/Qwen3-VL-2B-Instruct-4bit. We cloned it so that the Vanta client always has a reliable source for this model, independent of any upstream changes. All credit for the model weights and the MLX conversion goes to mlx-community, Qwen, and the original authors.
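If you want to check the "exact copy" claim yourself, one approach is to download both repositories and compare SHA-256 digests file by file. The following is a minimal sketch, not part of Vanta or mlx-vlm: the function names, the two local directory arguments, and the default ignore list (README.md and .gitattributes, which describe the repository rather than the model) are all assumptions made for illustration.

```python
import hashlib
from pathlib import Path

# Assumption: these files describe the repo, not the model, so they may differ.
DEFAULT_IGNORE = frozenset({"README.md", ".gitattributes"})


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large weight files are not read into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def manifest(root: Path, ignore=DEFAULT_IGNORE) -> dict[str, str]:
    """Map each relative file path under root to its SHA-256 digest."""
    result = {}
    for p in sorted(root.rglob("*")):
        rel = p.relative_to(root)
        # Skip directories, ignored names, and hidden paths such as download caches.
        if not p.is_file() or rel.name in ignore:
            continue
        if any(part.startswith(".") for part in rel.parts):
            continue
        result[rel.as_posix()] = sha256_of(p)
    return result


def differing_files(mirror_dir: Path, upstream_dir: Path) -> list[str]:
    """Return files that are missing on one side or whose contents differ."""
    a, b = manifest(mirror_dir), manifest(upstream_dir)
    return sorted(name for name in a.keys() | b.keys() if a.get(name) != b.get(name))
```

An empty list from differing_files(Path("mirror"), Path("upstream")) would indicate that every compared file matches byte for byte; the two directory names here are placeholders for wherever you saved each repository.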

Model Details

Original Model: Qwen/Qwen3-VL-2B-Instruct
Upstream MLX Repo: mlx-community/Qwen3-VL-2B-Instruct-4bit
Quantization: 4-bit
Format: MLX SafeTensors
Framework: mlx-vlm
Model Type: qwen3_vl
Task: Image-text-to-text
Disk Size: ~1.78 GB
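To get a rough feel for what 4-bit quantization means for storage, the bits spent per weight under group-wise quantization can be sketched as below. This is a back-of-the-envelope estimate, not a description of how the upstream conversion was configured: the group size of 64 and the 16-bit scale plus 16-bit bias per group are assumptions, and the parameter count of 2e9 is a hypothetical round figure taken from the "2B" in the model name.

```python
def quantized_size_gb(n_params: float, bits: int = 4, group_size: int = 64,
                      scale_bias_bits: int = 32) -> float:
    """Approximate size in GB (10**9 bytes) of group-quantized weights."""
    # Each group of `group_size` weights shares one scale and one bias,
    # which adds scale_bias_bits / group_size bits on top of every weight.
    bits_per_weight = bits + scale_bias_bits / group_size
    return n_params * bits_per_weight / 8 / 1e9


# Hypothetical input: 2e9 weights, all of them quantized (an assumption).
print(f"{quantized_size_gb(2e9):.3f} GB")
```

The sketch treats every weight as quantized, so it is best read as a lower bound. Any tensors stored at higher precision are not modeled, which is one plausible reason an estimate like this would fall below the listed disk size.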

Conversion Details

The upstream model was converted to MLX format from Qwen/Qwen3-VL-2B-Instruct using mlx-vlm version 0.3.4.

Related Models

Default Vanta pick: TerminatorPower/Qwen3-VL-4B-Thinking-4bit
Upstream MLX repo: mlx-community/Qwen3-VL-2B-Instruct-4bit
Original: Qwen/Qwen3-VL-2B-Instruct

Usage

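from mlx_vlm import load, generate

# Load the 4-bit weights and the matching processor (fetched from the Hub on first use).
model, processor = load("TerminatorPower/Qwen3-VL-2B-Instruct-4bit")

# Generate a description of a local image; max_tokens caps the response length.
output = generate(
    model,
    processor,
    prompt="Describe this image.",
    image="path/to/image.jpg",
    max_tokens=512,
)
print(output)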

CLI:

python3 -m mlx_vlm.generate \
  --model TerminatorPower/Qwen3-VL-2B-Instruct-4bit \
  --image path/to/image.jpg \
  --prompt "Describe this image."

License

This model inherits the Apache 2.0 license from the original Qwen model. The mirror does not add any restrictions.
