Lance-3B AWQ INT4 (image checkpoint)

4-bit AWQ-quantized variant of bytedance-research/Lance, the Lance_3B image-focused checkpoint (text-to-image, image edit, image understanding).

  • File size: 24.7 GB → ~6 GB (about 4× smaller)
  • Inference VRAM (LLM only, bf16 activations): ~13 GB → ~6 GB

What's different from the video sibling

Lance ships two checkpoints:

  • Lance_3B: image-focused (this one). 24.7 GB F32 source. No bundled ViT, smaller latent_pos_embed (image grid only).
  • Lance_3B_Video: video-focused. 28.4 GB F32 source. Bundles the Qwen2.5-VL ViT in its safetensors and has a larger video-grid latent_pos_embed. Quantized variant: Reza2kn/Lance-3B-Video-AWQ-INT4.

This image checkpoint relies on the standalone Qwen2.5-VL-ViT for vision encoding (also bundled in the official Lance HF repo; not redistributed here).

What was quantized

Same MoT-aware scheme as the video sibling: 504 Linear modules in language_model.* are quantized (252 understanding-path + 252 generation-expert _moe_gen variants). Of these, 360 use AWQ scale fusion into the preceding RMSNorm, and 144 (o_proj, down_proj) use plain per-group min-max. The ViT, projection layers, time embedder, latent positional embeds, and lm_head are kept in bf16.
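To make "AWQ scale fusion into the preceding RMSNorm" concrete, here is a minimal pure-Python sketch of the identity it relies on: multiplying each input column of the weight by a per-channel scale and dividing the RMSNorm gain by the same scale leaves the output unchanged, while the rescaled weight is what gets quantized. The function names and the scale values are illustrative assumptions, not code from the Lance or lance-quant repositories.

```python
import math

def rms_norm(x, gain, eps=1e-6):
    """RMSNorm over a single vector with a per-channel gain."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [g * v / rms for g, v in zip(gain, x)]

def linear(weight, x):
    """Bias-free linear layer: weight is a list of output rows."""
    return [sum(w * v for w, v in zip(row, x)) for row in weight]

def fuse_awq_scales(weight, gain, scales):
    """Scale weight input-columns up by s and the norm gain down by s."""
    fused_weight = [[w * s for w, s in zip(row, scales)] for row in weight]
    fused_gain = [g / s for g, s in zip(gain, scales)]
    return fused_weight, fused_gain

x = [0.5, -2.0, 8.0, 0.1]
gain = [1.0, 0.9, 1.1, 1.0]
weight = [[0.2, -0.1, 0.05, 0.3], [-0.4, 0.25, 0.01, -0.2]]
scales = [1.0, 1.5, 4.0, 0.8]  # hypothetical per-channel AWQ scales

reference = linear(weight, rms_norm(x, gain))
fused_weight, fused_gain = fuse_awq_scales(weight, gain, scales)
fused = linear(fused_weight, rms_norm(x, fused_gain))

# The two paths agree up to floating-point rounding.
max_diff = max(abs(a - b) for a, b in zip(reference, fused))
```

A linear with no normalization layer directly in front of it has nowhere to absorb the inverse scale, which is presumably why o_proj and down_proj fall back to plain min-max.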

See the video sibling README for the full per-component table; it is identical here.

Calibration

  • x2t_image (Lance's 6-sample example set, full 30 timesteps) → 252 und-path linears, 85.3 M tokens of activation data
  • t2i (Lance's 11-sample example set, 2 denoising timesteps) → all 504 linears (both und and gen paths)
  • Merged: all 252 und + 252 gen linears have activation data

File layout

Lance_3B-AWQ-INT4/
├── awq_state_dict.safetensors   # ~6 GB: packed INT4 + bf16 pass-through
├── awq_meta.json                # per-weight scheme + group_size + shape
└── README.md

Storage layout per quantized linear is identical to the video sibling; see that repo for the qweight / scales / zeros byte layout.
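For readers unfamiliar with what "packed INT4" means in general, the sketch below stores two unsigned 4-bit codes per byte, low nibble first. The nibble order and the function names are assumptions chosen for illustration only; the authoritative qweight layout is the one documented in the video sibling repo and may differ.

```python
def pack_int4(codes):
    """Pack unsigned 4-bit codes (0..15) two per byte, low nibble first."""
    if len(codes) % 2:
        raise ValueError("need an even number of 4-bit codes")
    if any(not 0 <= c <= 15 for c in codes):
        raise ValueError("codes must be in the range 0..15")
    return bytes(lo | (hi << 4) for lo, hi in zip(codes[0::2], codes[1::2]))

def unpack_int4(packed):
    """Inverse of pack_int4: expand each byte into its two 4-bit codes."""
    codes = []
    for byte in packed:
        codes.append(byte & 0x0F)  # low nibble
        codes.append(byte >> 4)    # high nibble
    return codes

codes = [0, 15, 7, 8, 3, 12]
packed = pack_int4(codes)      # 3 bytes hold 6 codes
restored = unpack_int4(packed)
```

Packing halves the byte count relative to one code per byte, which accounts for most of the file-size reduction on the quantized tensors.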

How to use

Same as the video sibling. The Lance source ships a custom Lance PreTrainedModel (in github.com/bytedance/Lance). Use the runtime swap-in approach: build Lance normally, then replace nn.Linear modules in language_model.* with the WQLinearINT4 reference module and stream the AWQ buffers in.

A complete reproduction (calibration scripts + WQLinearINT4 + run_quant_eval.py) is at: https://github.com/Reza2kn/lance-quant

Quality

Side-by-side on Lance's bundled x2t_image example (6 cases): outputs match the bf16 baseline to within typical AWQ tolerance. Naïve min-max INT4 produces gibberish ("the loose subs ifa…"); proper AWQ calibration recovers it ("Yes, the largest segment is greater than the sum of all the other segments.").

License

Apache 2.0, inherited from the base model.

v2 update: group_size 128 → 64

Re-quantized with --group_size 64 (was 128 in v1). Same AWQ calibration data, same scale-fusion recipe. Storage: ~6.15 GB (was 6.02 GB); the increase of about 2% is the cost of 2× more per-group scales.
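A back-of-the-envelope estimate shows why halving the group size adds only a few percent. The sketch assumes one 16-bit scale and one 4-bit zero point per group; those widths are assumptions for illustration, and the real per-weight scheme is recorded in awq_meta.json.

```python
def bits_per_weight(group_size, weight_bits=4, scale_bits=16, zero_bits=4):
    """Effective storage per weight, including per-group metadata."""
    return weight_bits + (scale_bits + zero_bits) / group_size

bpw_128 = bits_per_weight(128)   # 4.15625
bpw_64 = bits_per_weight(64)     # 4.3125
growth = bpw_64 / bpw_128 - 1    # relative growth of the quantized tensors only
```

Under these assumptions the quantized tensors grow by just under 4%. The whole file grows by less than that, because the bf16 pass-through tensors are unaffected by the group size.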

Quality jumped substantially on Lance's bundled x2t_image bench:

| variant | exact-match | char similarity | difflib ratio | word Jaccard |
|---|---|---|---|---|
| v1 (group_size=128) | 33.3 % | 60.4 % | 53.7 % | 55.3 % |
| v2 (group_size=64) | 50.0 % | 69.8 % | 62.1 % | 66.3 % |
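Three of the metrics in the table above can be approximated with the Python standard library. The sketch below is one plausible reading of them, not the evaluation script from lance-quant; the whitespace tokenization, the lowercasing, and the autojunk setting are assumptions. "Char similarity" is omitted because its exact definition is not given here, and the toy strings are placeholders rather than benchmark outputs.

```python
from difflib import SequenceMatcher

def exact_match(candidate, reference):
    """True when the two outputs are identical after trimming whitespace."""
    return candidate.strip() == reference.strip()

def difflib_ratio(candidate, reference):
    """SequenceMatcher similarity in [0, 1] over characters."""
    return SequenceMatcher(None, candidate, reference, autojunk=False).ratio()

def word_jaccard(candidate, reference):
    """Jaccard overlap of the lowercase whitespace-split word sets."""
    a, b = set(candidate.lower().split()), set(reference.lower().split())
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def score(pairs):
    """Average each metric over (quantized_output, baseline_output) pairs."""
    n = len(pairs)
    return {
        "exact_match": sum(exact_match(c, r) for c, r in pairs) / n,
        "difflib_ratio": sum(difflib_ratio(c, r) for c, r in pairs) / n,
        "word_jaccard": sum(word_jaccard(c, r) for c, r in pairs) / n,
    }

# Toy placeholder strings, not real model outputs.
toy_pairs = [("a b c", "a b c"), ("a b", "a c")]
toy_scores = score(toy_pairs)
```

With only 6 cases, each case moves exact-match by about 16.7 points, so the v1 to v2 change from 33.3 % to 50.0 % corresponds to one additional exact match.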

The biggest win is on case 4 ("$ spent on promotional events 1998"): v1 hallucinated entities ("Scott Levin and his family") around the correct number, whereas v2 produces the exact baseline output:

"According to the data from the proprietary market research, the total amount spent on the promotional meetings and events during 1998 was approximately $1.3 billion."

The smaller group size reduces the per-group outlier impact in o_proj and down_proj (the linears we can't fuse AWQ scales into), which were responsible for the long-form generation drift.
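The effect of group size on outliers can be reproduced with a small simulation of plain per-group min-max INT4. This is a generic asymmetric quantize-dequantize sketch on synthetic data, not the quantizer from lance-quant; the weight distribution, the outlier value, and the helper names are all assumptions.

```python
import random

def quantize_group(values, bits=4):
    """Asymmetric min-max quantize then dequantize one group of weights."""
    qmax = (1 << bits) - 1
    lo, hi = min(values), max(values)
    scale = (hi - lo) / qmax
    if scale == 0.0:
        scale = 1.0  # constant group: every value maps to code 0
    return [round((v - lo) / scale) * scale + lo for v in values]

def fake_quantize(row, group_size):
    """Apply quantize_group independently to consecutive groups of a row."""
    out = []
    for start in range(0, len(row), group_size):
        out.extend(quantize_group(row[start:start + group_size]))
    return out

def mean_abs_error(row, group_size):
    """Average reconstruction error of a row at the given group size."""
    deq = fake_quantize(row, group_size)
    return sum(abs(a - b) for a, b in zip(row, deq)) / len(row)

rng = random.Random(0)
row = [rng.uniform(-0.05, 0.05) for _ in range(256)]  # synthetic weights
row[10] = 1.0  # a single large outlier

err_128 = mean_abs_error(row, 128)  # outlier stretches the range of 128 weights
err_64 = mean_abs_error(row, 64)    # outlier stretches the range of only 64
```

One outlier widens the quantization step for every weight that shares its group, so halving the group size halves the number of weights that pay for it.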

Recipe & eval at: https://github.com/Reza2kn/lance-quant#v2-group_size-64
