Psi-Zero base for LeRobot — Unitree G1 humanoid VLA

This repository repackages the published Psi-Zero baseline (Wei et al. 2026 — arXiv:2603.12263) as a LeRobot-loadable snapshot, with the action head expanded to the state / action / chunk dimensions used by the ActGPT Unitree G1 recording schema.

Apart from a key rename and the extension of the action-expert projection layers (zero-padding, plus Xavier-initialised positional entries for the longer chunk), the weights are bit-identical to the upstream baseline. No training has been done in this repo.

What this snapshot contains

model.safetensors          merged state dict, ~6 GB
                           keys re-prefixed:
                             vlm_model.*       → model.vlm_model.*
                             <action_header.*> → model.action_header.*
config.json                PsiZeroConfig (max_state_dim=72,
                                          max_action_dim=91,
                                          chunk_size=30,
                                          vlm_model_name='Qwen/Qwen3-VL-2B-Instruct')
train_config.json          copy of config.json (LeRobot resume path)
README.md / LICENSE        this file + Apache-2.0 text
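
The key re-prefixing listed above can be sketched in a few lines. This is only an illustration, not the conversion script used for this snapshot: the function name `merge_and_rekey` is hypothetical, the example key names are stand-ins, and it assumes the action-head keys arrive without any prefix in their own state dict.

```python
def merge_and_rekey(vlm_state, head_state):
    """Merge two state dicts into one, applying the prefixes listed above."""
    merged = {}
    for key, tensor in vlm_state.items():
        # vlm_model.* -> model.vlm_model.*
        if not key.startswith("vlm_model."):
            raise KeyError(f"unexpected VLM key: {key}")
        merged["model." + key] = tensor
    for key, tensor in head_state.items():
        # Assumption: action-head keys carry no prefix in the upstream file.
        merged["model.action_header." + key] = tensor
    return merged


# Stand-in values; real state dicts map key names to tensors.
vlm = {"vlm_model.layers.0.weight": "W0"}
head = {"action_proj_out.linear.bias": "b"}
merged = merge_and_rekey(vlm, head)
print(sorted(merged))
```

Only the key names change in this step, so the tensor values pass through untouched.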

Lineage

Qwen/Qwen3-VL-2B-Instruct                                            (Alibaba; the base VLM)
   ↓ fine-tuned on EgoDex 200k + HE 30k via FAST tokenizer
USC-PSI-Lab/psi-model
   :: psi0/pre.fast.1by1.2601091803.ckpt.ego200k.he30k/             (Stage-1 VLM, 4.3 GB)
   :: psi0/postpre.1by1.pad36.2601131206.ckpt.he30k/                (Stage-2 action expert, 1.9 GB)
   ↓ extend action header   36/36/16  →  72/91/30
   ↓ re-key for LeRobot     vlm_model.*  →  model.vlm_model.*
ActGPT/psi0_base                                                     (this repo)

What is different from the upstream Psi-Zero release

The upstream USC-PSI-Lab/psi-model ships the VLM and the action head as two separate .safetensors files at the dimensions used by the paper's G1 post-training run: odim=36, action_dim=36, action_chunk_size=16. For fine-tuning on the ActGPT Unitree G1 recordings we need to load these weights into a model with larger dimensions:

Dimension	Upstream baseline	This snapshot	What changed
`odim` (state input)	36	72	`obs_proj._obs_proc.1.weight` left-padded with zero columns (36 → 72)
`action_dim`	36	91	`action_proj_in.ac_proj.0.{w,b}` zero-padded both axes (36 → 91); `action_proj_in.ac_proj.2.weight` and `action_proj_out.linear.{w,b}` zero-padded the action axis
`action_chunk_size`	16	30	`action_proj_in.dec_pos` xavier-extended on the chunk axis (16 → 30)
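
The chunk-axis row of the table can be illustrated with a small dependency-free sketch. It assumes "xavier-extended" means that the pretrained positional rows are kept as they are and the new rows are drawn from a Glorot-uniform distribution; the helper name `extend_positions`, the choice of fan sizes, and the 2-D list layout are all assumptions made for illustration, and the real `dec_pos` tensor may be shaped differently.

```python
import math
import random


def extend_positions(dec_pos, new_len, seed=0):
    """Return a copy of dec_pos (a list of rows) grown to new_len rows."""
    old_len, width = len(dec_pos), len(dec_pos[0])
    if new_len < old_len:
        raise ValueError("new_len must be >= the current chunk length")
    # Glorot-uniform bound sqrt(6 / (fan_in + fan_out)); using the table's
    # row and column counts as the fan sizes is an assumption of this sketch.
    bound = math.sqrt(6.0 / (new_len + width))
    rng = random.Random(seed)
    extended = [list(row) for row in dec_pos]  # pretrained rows stay unchanged
    for _ in range(new_len - old_len):
        extended.append([rng.uniform(-bound, bound) for _ in range(width)])
    return extended


old = [[0.1 * i] * 4 for i in range(16)]  # stand-in for a 16 x 4 table
new = extend_positions(old, 30)
print(len(new), len(new[0]))
```

The first 16 rows of `new` equal `old`, so only positions 17 to 30 start from fresh values.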

With the chunk size unchanged, the extension is parity-preserving on the first 36 state and action dimensions (numerically verified in actgpt-library/benchmark/psi0/RESULTS.md). Extending the chunk size changes the action expert's attention context, so outputs on the overlapping chunk positions are no longer identical to the baseline. This is expected when adapting to a different chunk length and is acceptable for fine-tuning, which trains the newly added weights starting from their zero or Xavier initialisation.
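
Why zero-padding preserves parity can be seen on a single linear layer. The toy sketch below uses plain Python lists and small sizes in place of 36 → 91, and it assumes the new dimensions are appended after the existing ones. The helpers `zero_pad` and `linear` are illustrative and are not part of the repository's benchmark.

```python
def zero_pad(weight, rows, cols):
    """Pad a row-major matrix with zeros on the right and at the bottom."""
    padded = [list(r) + [0.0] * (cols - len(r)) for r in weight]
    padded += [[0.0] * cols for _ in range(rows - len(weight))]
    return padded


def linear(weight, bias, x):
    """Compute y = W x + b for list-based W (out x in), b (out) and x (in)."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weight, bias)]


# Toy sizes: a 2 x 3 layer grown to 4 x 5.
W, b = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]], [0.5, -0.5]
W_big = zero_pad(W, rows=4, cols=5)
b_big = b + [0.0, 0.0]

x = [1.0, 0.0, -1.0]
x_big = x + [7.0, 9.0]  # values on the new input dims hit zero columns

y = linear(W, b, x)
y_big = linear(W_big, b_big, x_big)
assert y_big[:2] == y               # original outputs are reproduced
assert y_big[2:] == [0.0, 0.0]      # new outputs start at zero
```

Zero columns make the layer ignore the new input dimensions, and zero rows make the new outputs start at zero, so the original slice of the output is unchanged until fine-tuning updates the padded entries.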

No further training has been done on these weights — they are the upstream baseline, mechanically extended, ready for fine-tuning on a new task.

How to use (LeRobot fine-tuning)

from actgpt.policies.psi0 import PsiZeroConfig
import actgpt.policies  # registers psi0 with LeRobot's policy factory

policy_config = PsiZeroConfig(
    pretrained_path="ActGPT/psi0_base",
    max_state_dim=72,
    max_action_dim=91,
    chunk_size=30,
    n_action_steps=30,
    freeze_vlm=True,
    gradient_checkpointing=True,
)

Then drive lerobot-train (or the project's training/lerobot/scripts/finetune.py) with this config as usual.
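
Before launching a run, it may help to confirm that the dimensions in the downloaded config.json match the ones passed above. The check below is a minimal sketch: `check_snapshot_config` is a hypothetical helper, and it assumes the three fields are stored as flat top-level keys in config.json.

```python
import json

# Dimensions documented for this snapshot.
EXPECTED = {"max_state_dim": 72, "max_action_dim": 91, "chunk_size": 30}


def check_snapshot_config(config):
    """Return a list of readable mismatches (empty if all dimensions agree)."""
    problems = []
    for key, want in EXPECTED.items():
        got = config.get(key)
        if got != want:
            problems.append(f"{key}: expected {want}, found {got!r}")
    return problems


# In practice the dict would be loaded from the snapshot's config.json file.
config = json.loads('{"max_state_dim": 72, "max_action_dim": 91, "chunk_size": 30}')
print(check_snapshot_config(config) or "dimensions match")
```

A non-empty result usually means the upstream 36/36/16 checkpoint was loaded instead of this extended snapshot.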

License

Apache-2.0. Same licence as both upstream sources:

USC-PSI-Lab/psi-model (Apache-2.0)
Qwen/Qwen3-VL-2B-Instruct (Apache-2.0)

If you use this snapshot please cite the upstream Psi-Zero paper:

@misc{Wei2026psi0,
  title={$\Psi_0$: An Open Foundation Model Towards Universal Humanoid Loco-Manipulation},
  author={Songlin Wei and Hongyi Jing and Boqian Li and Zhenyu Zhao and Jiageng Mao and Zhenhao Ni and Sicheng He and Jie Liu and Xiawei Liu and Kaidi Kang and Sheng Zang and Weiduo Yuan and Marco Pavone and Di Huang and Yue Wang},
  year={2026},
  eprint={2603.12263},
  archivePrefix={arXiv},
  primaryClass={cs.RO},
}

And, if the EgoDex prior is relevant to your downstream analysis:

@inproceedings{Hoque2026egodex,
  title={EgoDex: Learning Dexterous Manipulation from Large-Scale Egocentric Video},
  author={Ryan Hoque and Peide Huang and David J. Yoon and Mouli Sivapurapu and Jian Zhang},
  booktitle={ICLR 2026},
  year={2026},
}
