Lance-3B AWQ INT4 (image checkpoint)

4-bit AWQ-quantized variant of bytedance-research/Lance, the Lance_3B image-focused checkpoint (text-to-image, image edit, image understanding).

File-size reduction: 24.7 GB → ~6 GB (4×)
Inference VRAM (LLM only, bf16 activations): ~13 GB → ~6 GB

What's different from the video sibling

Lance ships two checkpoints:

Lance_3B — image-focused (this one). 24.7 GB F32 source. No bundled ViT, smaller latent_pos_embed (image grid only).
Lance_3B_Video — video-focused. 28.4 GB F32 source. Bundles the Qwen2.5-VL ViT in its safetensors + larger video-grid latent_pos_embed. Quantized variant: Reza2kn/Lance-3B-Video-AWQ-INT4.

This image checkpoint relies on the standalone Qwen2.5-VL-ViT for vision encoding (also bundled in the official Lance HF repo; not redistributed here).

What was quantized

Same MoT-aware scheme as the video sibling: 504 Linear modules in language_model.* are quantized (252 understanding-path + 252 generation-expert _moe_gen variants). Of these, 360 use AWQ scale fusion into the preceding RMSNorm, and the remaining 144 (o_proj, down_proj), which cannot take fused scales, use plain per-group min-max. The ViT, projection layers, time embedder, latent positional embeds, and lm_head are kept in bf16.
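To illustrate what "scale fusion into the preceding RMSNorm" means, here is a minimal pure-Python sketch. It is not the code from the lance-quant repo: the function names, the toy shapes, and the scale values are all assumptions chosen for illustration. It only demonstrates the algebraic identity that makes fusion free at inference time: multiplying each input channel of the weight by s while dividing the RMSNorm gain by s leaves the layer output unchanged.

```python
import math

def rmsnorm(x, gain, eps=1e-6):
    """RMSNorm over one vector: gain * x / sqrt(mean(x^2) + eps)."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [g * v / rms for g, v in zip(gain, x)]

def linear(weight, x):
    """y = W @ x, with W stored as [out_features][in_features]."""
    return [sum(w * v for w, v in zip(row, x)) for row in weight]

def fuse_awq_scales(weight, gain, scales):
    """Scale each input channel of W up by s and the norm gain down by s."""
    fused_weight = [[w * s for w, s in zip(row, scales)] for row in weight]
    fused_gain = [g / s for g, s in zip(gain, scales)]
    return fused_weight, fused_gain

# Toy data (hypothetical): channel 1 carries a large activation.
x = [0.5, -8.0, 0.25, 1.5]
gain = [1.0, 1.0, 1.0, 1.0]
weight = [[0.1, 0.02, -0.3, 0.4], [-0.2, 0.01, 0.5, 0.3]]
scales = [1.0, 4.0, 1.0, 2.0]  # illustrative, not calibrated values

reference = linear(weight, rmsnorm(x, gain))
fused_weight, fused_gain = fuse_awq_scales(weight, gain, scales)
fused = linear(fused_weight, rmsnorm(x, fused_gain))

# The two outputs agree up to floating-point rounding.
max_diff = max(abs(a - b) for a, b in zip(reference, fused))
```

In a real AWQ pass the scales come from calibration activations and the scaled weight is then quantized; the sketch stops before quantization to keep the identity visible.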

See the video sibling README for the full per-component table — it's identical here.

Calibration

x2t_image (Lance's 6-sample example set, full 30 timesteps) → 252 und-path linears, 85.3 M tokens of activation data
t2i (Lance's 11-sample example set, 2 denoising timesteps) → all 504 linears (both und and gen paths)
Merged: all 504 linears (252 und + 252 gen) have activation data

File layout

Lance_3B-AWQ-INT4/
├── awq_state_dict.safetensors   # ~6 GB: packed INT4 + bf16 pass-through
├── awq_meta.json                # per-weight scheme + group_size + shape
└── README.md

Storage layout per quantized linear is identical to the video sibling — see that repo for the qweight / scales / zeros byte layout.
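As a rough picture of what "packed INT4" means, the sketch below packs two 4-bit codes into each byte. This is an assumption for illustration only: the nibble order and the helper names `pack_int4` and `unpack_int4` are hypothetical, and the actual qweight layout in this repo is the one documented in the video sibling.

```python
def pack_int4(values):
    """Pack unsigned 4-bit integers (0..15) two per byte, low nibble first."""
    if len(values) % 2:
        raise ValueError("need an even number of values")
    if any(not 0 <= v <= 15 for v in values):
        raise ValueError("values must fit in 4 bits")
    return bytes(lo | (hi << 4) for lo, hi in zip(values[0::2], values[1::2]))

def unpack_int4(packed):
    """Inverse of pack_int4: recover the list of 4-bit codes."""
    out = []
    for byte in packed:
        out.append(byte & 0x0F)  # low nibble holds the first value
        out.append(byte >> 4)    # high nibble holds the second value
    return out

codes = [0, 15, 7, 8, 3, 12]
packed = pack_int4(codes)       # 6 codes fit in 3 bytes
restored = unpack_int4(packed)  # round-trips to the original codes
```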

How to use

Same as the video sibling. The Lance source ships a custom Lance PreTrainedModel (in github.com/bytedance/Lance). Use the runtime swap-in approach: build Lance normally, then replace nn.Linear modules in language_model.* with the WQLinearINT4 reference module and stream the AWQ buffers in.

A complete reproduction (calibration scripts + WQLinearINT4 + run_quant_eval.py) is at: https://github.com/Reza2kn/lance-quant

Quality

Side-by-side on Lance's bundled x2t_image example (6 cases) — outputs match the bf16 baseline to within typical AWQ tolerance. Naïve min-max INT4 produces gibberish ("the loose subs ifa…"); proper AWQ calibration recovers it ("Yes, the largest segment is greater than the sum of all the other segments.").

License

Apache 2.0, inherited from the base model.

v2 update — group_size 128 → 64

Re-quantized with --group_size 64 (was 128 in v1). Same AWQ calibration data, same scale-fusion recipe. Storage: ~6.15 GB (was 6.02 GB); the roughly 2% increase is the cost of storing twice as many per-group scales.
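The storage cost of a smaller group size can be estimated with simple arithmetic. The sketch below assumes one 16-bit scale and one 4-bit zero point per group; those widths are an assumption, not something this README states, and the helper name is hypothetical.

```python
def bits_per_weight(group_size, weight_bits=4, scale_bits=16, zero_bits=4):
    """Average bits per quantized weight, including per-group metadata."""
    return weight_bits + (scale_bits + zero_bits) / group_size

v1 = bits_per_weight(128)  # 4.15625 bits per weight
v2 = bits_per_weight(64)   # 4.3125 bits per weight
growth = v2 / v1 - 1       # relative growth of the quantized tensors only
```

Under these assumptions the quantized tensors alone grow by a little under 4%. The file as a whole grows by less than that because the bf16 pass-through tensors are unaffected by the group size.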

Quality jumped substantially on Lance's bundled x2t_image bench:

variant	exact-match	char similarity	difflib ratio	word Jaccard
v1 (group_size=128)	33.3 %	60.4 %	53.7 %	55.3 %
v2 (group_size=64)	50.0 %	69.8 %	62.1 %	66.3 %
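For readers who want to compute comparable numbers on their own outputs, here is a minimal sketch of three of these metrics using only the Python standard library. The exact definitions used by the eval script are not given in this README, so the normalization choices below (whitespace stripping, lowercasing, whitespace tokenization) are assumptions. "char similarity" is not reproduced because its definition is not stated.

```python
from difflib import SequenceMatcher

def exact_match(reference, candidate):
    """True when the two outputs are identical after trimming whitespace."""
    return reference.strip() == candidate.strip()

def difflib_ratio(reference, candidate):
    """Character-level similarity in [0, 1] from difflib.SequenceMatcher."""
    return SequenceMatcher(None, reference, candidate).ratio()

def word_jaccard(reference, candidate):
    """Size of the shared word set divided by the size of the combined word set."""
    ref_words = set(reference.lower().split())
    cand_words = set(candidate.lower().split())
    if not ref_words and not cand_words:
        return 1.0
    return len(ref_words & cand_words) / len(ref_words | cand_words)

baseline = "Yes, the largest segment is greater than the sum of all the other segments."
garbled = "the loose subs"

same = exact_match(baseline, baseline)
ratio_same = difflib_ratio(baseline, baseline)
jaccard_toy = word_jaccard("a b c", "b c d")  # 2 shared words out of 4 distinct
```

Averaging each metric over the six bench cases would give one row of a table like the one above.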

The biggest win is on case 4 ("$ spent on promotional events 1998") — v1 hallucinated entities ("Scott Levin and his family") around the correct number; v2 produces the exact baseline output:

"According to the data from the proprietary market research, the total amount spent on the promotional meetings and events during 1998 was approximately $1.3 billion."

The smaller group size reduces the per-group outlier impact in o_proj and down_proj (the linears we can't fuse AWQ scales into), which were responsible for the long-form generation drift.
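The effect of group size on outlier damage can be reproduced on synthetic data. The sketch below is an assumption-laden toy, not the repo's quantizer: it applies asymmetric per-group min-max INT4 quantization to a made-up row of 128 small weights with a single outlier, and compares the mean absolute error for group sizes 128 and 64.

```python
def quantize_group(values, bits=4):
    """Asymmetric min-max quantization of one group; returns dequantized values."""
    levels = (1 << bits) - 1
    lo, hi = min(values), max(values)
    scale = (hi - lo) / levels
    if scale == 0:
        return list(values)  # constant group: representable exactly
    return [round((v - lo) / scale) * scale + lo for v in values]

def mean_abs_error(row, group_size):
    """Mean absolute round-trip error when the row is quantized in groups."""
    total = 0.0
    for start in range(0, len(row), group_size):
        group = row[start:start + group_size]
        dequantized = quantize_group(group)
        total += sum(abs(a - b) for a, b in zip(group, dequantized))
    return total / len(row)

# Synthetic row: values in [-0.1, 0.1] plus one outlier at index 5.
row = [0.01 * ((i * 37) % 21 - 10) for i in range(128)]
row[5] = 3.0

err_128 = mean_abs_error(row, 128)  # outlier stretches the range for all 128 weights
err_64 = mean_abs_error(row, 64)    # outlier only affects the first 64 weights
```

With one group of 128, the outlier sets the quantization step for every weight in the row. With two groups of 64, the second group keeps a step matched to its own narrow range, so its error drops sharply.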

Recipe & eval at: https://github.com/Reza2kn/lance-quant#v2-group_size-64
