Vanta - Local AI LLM Chat

Qwen3-VL-2B-Instruct-4bit

A verbatim mirror of mlx-community/Qwen3-VL-2B-Instruct-4bit, kept here so the Vanta iOS app always has a stable source for downloading a lower-RAM model.

Run it on your iPhone with Vanta

This is one of the built-in one-tap downloads in Vanta - Local AI LLM Chat, a local-first AI chat app for iPhone and iPad. Vanta runs models like this one fully on-device with Apple's MLX framework: no account, no cloud, and your chats stay on your device. Because the model is vision-capable, you can also chat about images.

Vanta recommends this smaller model on RAM-tight devices where the 4B Thinking model is likely too heavy.

Download Vanta on the App Store ->


This is a copy. Every model file in this repository is an exact copy of the corresponding file in mlx-community/Qwen3-VL-2B-Instruct-4bit. We mirrored it so that the Vanta app always has a reliable source to download this model from, independent of any upstream changes. All credit for the model weights and the MLX conversion goes to mlx-community, Qwen, and the original authors.
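If you want to check the exact-copy claim yourself, one option is to compare file hashes between two local snapshots. The sketch below is only an illustration: it assumes you have already downloaded both repositories to local directories, and the function names (`sha256_of`, `hash_tree`, `compare_snapshots`) are hypothetical helpers, not part of any library.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large weight files never sit fully in RAM."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def hash_tree(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 digest."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def compare_snapshots(mirror_dir: Path, upstream_dir: Path) -> list[str]:
    """Return relative paths that are missing on one side or differ in content."""
    mirror, upstream = hash_tree(mirror_dir), hash_tree(upstream_dir)
    return sorted(
        name
        for name in mirror.keys() | upstream.keys()
        if mirror.get(name) != upstream.get(name)
    )
```

Calling `compare_snapshots(Path("mirror"), Path("upstream"))` with your own directory names returns an empty list when every file matches. Non-model files such as README.md are expected to show up as differences, since the model card itself is specific to each repository.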


Model Details

Conversion Details

The upstream model was converted to MLX format from Qwen/Qwen3-VL-2B-Instruct using mlx-vlm version 0.3.4.
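To get a rough feel for why a 4-bit conversion suits RAM-tight devices, the weight footprint can be estimated with simple arithmetic. This is a back-of-the-envelope sketch, not a measurement: it assumes a nominal 2 billion parameters, that every weight is quantized, and a group size of 64 with one 16-bit scale and one 16-bit bias per group. In practice some layers may stay in higher precision, and inference also needs memory for activations and the KV cache, so treat the result as a lower bound. The function name is a hypothetical helper.

```python
def quantized_weight_bytes(
    num_params: float,
    bits: int = 4,
    group_size: int = 64,
    scale_bits: int = 16,
) -> float:
    """Approximate bytes needed for affine group-quantized weights.

    Each group of `group_size` weights stores one scale and one bias
    (`scale_bits` each) on top of the packed `bits`-wide values.
    """
    bits_per_weight = bits + 2 * scale_bits / group_size
    return num_params * bits_per_weight / 8


# Assumption: a nominal 2e9 parameters, all of them quantized.
estimate = quantized_weight_bytes(2e9)
print(f"{estimate / 2**30:.2f} GiB")
```

Under these assumptions each weight costs 4.5 bits, so the estimate comes out at a little over 1 GiB, compared with roughly 3.7 GiB for the same parameter count stored at 16 bits per weight.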

Related Models

Usage

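from mlx_vlm import load, generate
from mlx_vlm.prompt_utils import apply_chat_template
from mlx_vlm.utils import load_config

model_path = "TerminatorPower/Qwen3-VL-2B-Instruct-4bit"

# Fetch the weights and processor from the Hub (or reuse the local cache).
model, processor = load(model_path)
config = load_config(model_path)

# Wrap the prompt in the model's chat template so it carries the image placeholder.
images = ["path/to/image.jpg"]
prompt = apply_chat_template(
    processor, config, "Describe this image.", num_images=len(images)
)

output = generate(model, processor, prompt, images, max_tokens=512)
print(output)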

CLI:

python3 -m mlx_vlm.generate \
  --model TerminatorPower/Qwen3-VL-2B-Instruct-4bit \
  --image path/to/image.jpg \
  --prompt "Describe this image."

License

This model inherits the Apache 2.0 license from the original Qwen model. The mirror does not add any restrictions.
