VisionEncoder

Hosted artifacts (derived data + trained checkpoints) for the VisionEncoder research project.

Training code + full reproduction guide: https://github.com/xiaomoguhz/VisionEncoder

The repo is organized into three top-level folders.

`data/` — current (V9.x) reproduction data (~6.5G)

| Path | Content |
| --- | --- |
| `data/vmllm_cached/qwen3vit/` | S2 `cached_dataset` arrow (image/video, 10pct + full); fed directly to stage-2 |
| `data/ms-swift-data/` | sampled sharegpt jsonl (10pct + full) |
| `data/llava_video/` | V9 decode-probed `good_manifest` for the video path |

`ckpts/` — ready-made 4B MLLM inference weights

| Path | Content |
| --- | --- |
| `ckpts/4b_stock` | 4B stock baseline (raw Qwen3.5 ViT, skips declip), checkpoint-505, 9.5G |
| `ckpts/4b_v9_1` | 4B V9.1 (V-JEPA 2.1 video self-distill), checkpoint-505, 9.5G |

Download either checkpoint and pass it straight to evaluation (see section 4, MLLM evaluation, of the GitHub README); this skips the declip, S1, and S2 stages.

`legacy/` — historical assets (~368G)

Early-line products, not needed to reproduce the current main line: declip_siglip2/spatial_align, kd_mllm, self_refine, video_mllm_swift (old SigLIP2 / image-only S1+S2 ckpts), and old ViT-family arrow caches.
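
Because the folders differ so much in size, it can help to check free disk space before pulling anything. The sketch below is only an illustration: the figures are copied from the descriptions above, the helper names `required_gb` and `has_room` are hypothetical, and since the README does not say whether "G" means GB or GiB, the numbers should be treated as rough.

```python
import shutil

# Approximate sizes in GB, copied from the folder descriptions above.
# Whether "G" means GB or GiB is not stated, so these are rough figures.
APPROX_SIZE_GB = {
    "data": 6.5,
    "ckpts/4b_stock": 9.5,
    "ckpts/4b_v9_1": 9.5,
    "legacy": 368.0,
}


def required_gb(selection):
    """Sum the approximate sizes of the selected folders."""
    return sum(APPROX_SIZE_GB[name] for name in selection)


def has_room(selection, target_dir=".", margin=1.1):
    """Return True if target_dir has enough free space, with a safety margin.

    The 10% default margin is an arbitrary assumption, not a project setting.
    """
    free_gb = shutil.disk_usage(target_dir).free / 1e9
    return free_gb >= required_gb(selection) * margin


if __name__ == "__main__":
    wanted = ["data", "ckpts/4b_v9_1"]
    print(f"need ~{required_gb(wanted):.1f} GB, enough room: {has_room(wanted)}")
```

Leaving `legacy` out of the selection keeps the requirement small; it only matters when reproducing the older experiment lines.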

Download

# current dev data
huggingface-cli download xiaomoguhzz/VisionEncoder --include "data/*" --local-dir .
# ready-made 4B MLLM ckpt (eval directly)
huggingface-cli download xiaomoguhzz/VisionEncoder --include "ckpts/4b_v9_1/*" --local-dir .
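
The same selective download can be scripted from Python. This is a minimal sketch, assuming the `huggingface_hub` package is installed; it uses `snapshot_download` with `allow_patterns`, which plays the role of the CLI's `--include`. The helper names `patterns_for` and `download` are hypothetical and not part of the project.

```python
REPO_ID = "xiaomoguhzz/VisionEncoder"


def patterns_for(selection):
    """Map folder names to glob patterns, mirroring the CLI's --include."""
    return [f"{name.rstrip('/')}/*" for name in selection]


def download(selection, local_dir="."):
    """Download only the selected folders of the repo into local_dir."""
    # Imported lazily so patterns_for() works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=REPO_ID,
        allow_patterns=patterns_for(selection),
        local_dir=local_dir,
    )


if __name__ == "__main__":
    # Current data plus the V9.1 checkpoint, as in the two CLI commands above.
    download(["data", "ckpts/4b_v9_1"])
```

Passing `["ckpts/4b_stock"]` instead would fetch only the stock baseline checkpoint.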

Code + reproduction guide: https://github.com/xiaomoguhz/VisionEncoder
