# INTACT-pi0 Bridge V2 (transformers ≥ 4.52 compatible)
A drop-in replacement for `juexzz/INTACT-pi0-finetune-bridge` that works directly with current Hugging Face `transformers` (≥ 4.52, tested on 4.57). Same weights as the upstream INTACT-pi0 fine-tune; the layout has been remapped to the new PaliGemma structure and the modeling code is vendored self-contained (no `lerobot` dependency).
## What this is
- Base model: π0 (PaliGemma 3B + Gemma 1B "action expert" + flow-matching head)
- Fine-tune: INTACT-pi0 (AI4CE INT-ACT team) on Bridge V2, 15 epochs / ~22.7k steps / bs=1024 on 4×H100. Checkpoint chosen at epoch 5 (step 7565).
- Action space: Bridge V2 7-D per-step delta `[delta_xyz(3), delta_rpy_sxyz(3), gripper(1)]` after the post-processing pipeline (see "Action post-processing" below).
- Chunk size: 4 (`n_action_steps = 4`).
## What changed vs the upstream juexzz repo
- Modeling code is vendored (`configuration_pi0.py`, `paligemma_with_expert.py`, `flex_attention.py`, `modeling_pi0.py`). No more `pip install lerobot==X` step.
- PR huggingface/lerobot#1297 patches applied to `paligemma_with_expert.py` for transformers ≥ 4.52 (`.language_model.model` → `.language_model`, `params_to_change_dtype` selector update).
- Stray `from pytest import Cache` removed from `paligemma_with_expert.py`.
- State-dict keys re-mapped to the new transformers 4.52+ PaliGemma layout (`paligemma.language_model.model.X` → `paligemma.model.language_model.X`, etc.). The conversion is documented in this repo's commit history.
- PI0Policy wrapper removed. The original `lerobot.common.policies.pi0.PI0Policy` wrapped `PI0FlowMatching` with `Normalize` / `PreTrainedPolicy` / dataset-stats plumbing. INTACT was trained with `IDENTITY` normalization at the lerobot level and a custom Bridge V2 q01/q99 normalisation done outside the policy, so `Normalize` was effectively a no-op. We expose `PI0FlowMatching` through a thin HF `PreTrainedModel` (`PI0Model`) instead. Callers handle Bridge V2 input/output normalisation themselves.
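The prefix remap quoted above can be sketched as a small key-rewriting function. This is an illustrative sketch, not the actual conversion script: only the one prefix pair named in the bullet is handled, and the real conversion (see the commit history) may touch additional prefixes.

```python
def remap_key(key: str) -> str:
    """Remap a pre-4.52 PaliGemma state-dict key to the 4.52+ layout.

    Sketch: handles only the prefix pair documented above
    (paligemma.language_model.model.X -> paligemma.model.language_model.X).
    """
    old = "paligemma.language_model.model."
    new = "paligemma.model.language_model."
    return new + key[len(old):] if key.startswith(old) else key

# Applied over a whole checkpoint it would look like:
# state_dict = {remap_key(k): v for k, v in state_dict.items()}
```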
## Bit-identity verified
Numerical parity against the upstream INTACT-pi0 sidecar (lerobot fork + transformers 4.49) was tested on an A6000 (host `hala`) with identical inputs and identical injected noise:
| Component | Max-abs diff |
|---|---|
| Raw `[delta_x, delta_y, delta_z, delta_roll, delta_pitch, delta_yaw, gripper]` (normalised) | < 1e-4 (effectively zero: identical to 4 decimal places) |
| Post-processed `[xyz_m, axangle, gripper]` | 2.2e-3 (bfloat16 noise floor) |
## Usage
```python
import torch
from transformers import AutoConfig, AutoModel

device = "cuda" if torch.cuda.is_available() else "cpu"

config = AutoConfig.from_pretrained("petkopetkov/INTACT-pi0-finetune-bridge", trust_remote_code=True)
model = AutoModel.from_pretrained(
    "petkopetkov/INTACT-pi0-finetune-bridge",
    trust_remote_code=True,
    torch_dtype=torch.float32,
).to(device).eval()
```
Then preprocess your observation the way INTACT was trained (Bridge V2 / SIMPLER top-down frame for the state, q01/q99 normalisation from `bridge_statistics.json`, image as float in [-1, 1]). The forward call:
```python
images = {"observation.images.top": img_bchw}  # [B, 3, 224, 224] float in [-1, 1]
state = state_padded                           # [B, max_state_dim=32], first 7 dims = Bridge V2 norm
tasks = ["pick up the spoon"]                  # List[str]

chunk = model.predict_action_chunk(images, state, tasks)  # [B, chunk_size=4, max_action_dim=32]
```
`chunk[..., :7]` is the 7-D normalised Bridge V2 action chunk. Apply the denormalise + RPY → axangle pipeline (see `bridge_statistics.json` and INTACT's `BridgeSimplerAdapter`) to get the runner-ready chunk.
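The input-side normalisation described above can be sketched as follows. The `Q01`/`Q99` quantiles here are placeholders standing in for the real proprio stats in `bridge_statistics.json`, and the sketch assumes the frame is already 224×224; the exact padding and normalisation conventions should be checked against INTACT's own preprocessing.

```python
import numpy as np
import torch

# Placeholder quantiles standing in for bridge_statistics.json proprio stats
# (assumption: the real values are per-dimension, not uniform).
Q01 = np.full(7, -1.0)
Q99 = np.full(7, 1.0)

def preprocess(image_hwc_uint8: np.ndarray, state7: np.ndarray):
    """Turn one (224, 224, 3) uint8 frame + 7-D Bridge V2 state into model inputs."""
    # Image: HWC uint8 -> [1, 3, 224, 224] float in [-1, 1].
    img = torch.from_numpy(image_hwc_uint8).float().permute(2, 0, 1)[None]
    img = img / 127.5 - 1.0
    # State: q01/q99 bound-normalise to [-1, 1], then zero-pad to max_state_dim=32.
    norm = 2.0 * (state7 - Q01) / (Q99 - Q01) - 1.0
    state = torch.zeros(1, 32)
    state[0, :7] = torch.from_numpy(norm).float()
    return img, state
```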
## Action post-processing pipeline
The raw model output `[delta_x, delta_y, delta_z, delta_roll, delta_pitch, delta_yaw, gripper]` (in normalised space) is converted to runner-consumable form by:
- Clip the first 6 dims to `[-1, 1]` (the policy's tail extrapolations can push xyz/rpy past p99 by 8-10×).
- Bound-denormalise xyz_rpy using the `bridge_statistics.json` action p01/p99: `x = 0.5 * (norm + 1) * (p99 - p01) + p01`.
- Conjugate the RPY rotation back to the bridge frame via the SIMPLER top-down rotation `R_topdown = [[0, 0, 1], [0, 1, 0], [-1, 0, 0]]`: `DR_bridge = R_topdown @ DR_topdown @ R_topdown^T`.
- Convert `DR_bridge` to axis-angle (rotvec).
- Gripper passes through unchanged.
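The steps above can be sketched end-to-end for a single action. The `P01`/`P99` quantiles are placeholders for the real values in `bridge_statistics.json`, and the extrinsic-xyz Euler convention is an assumption read off the `delta_rpy_sxyz` naming; the authoritative version is INTACT's `BridgeSimplerAdapter`.

```python
import numpy as np
from scipy.spatial.transform import Rotation

# Placeholder p01/p99 action quantiles standing in for bridge_statistics.json.
P01 = np.array([-0.02, -0.02, -0.02, -0.05, -0.05, -0.05])
P99 = np.array([0.02, 0.02, 0.02, 0.05, 0.05, 0.05])

# SIMPLER top-down rotation quoted in the list above.
R_TOPDOWN = np.array([[0.0, 0.0, 1.0],
                      [0.0, 1.0, 0.0],
                      [-1.0, 0.0, 0.0]])

def postprocess(raw7: np.ndarray) -> np.ndarray:
    """Normalised 7-D action -> runner-consumable [xyz(3), rotvec(3), gripper(1)]."""
    xyz_rpy = np.clip(raw7[:6], -1.0, 1.0)               # 1. clip the tails
    xyz_rpy = 0.5 * (xyz_rpy + 1.0) * (P99 - P01) + P01  # 2. bound-denormalise
    # 3. conjugate the RPY rotation back to the bridge frame
    DR_topdown = Rotation.from_euler("xyz", xyz_rpy[3:6]).as_matrix()
    DR_bridge = R_TOPDOWN @ DR_topdown @ R_TOPDOWN.T
    rotvec = Rotation.from_matrix(DR_bridge).as_rotvec()  # 4. axis-angle
    return np.concatenate([xyz_rpy[:3], rotvec, raw7[6:]])  # 5. gripper unchanged
```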
## Files
| File | Purpose |
|---|---|
| `config.json` | HF `PretrainedConfig` (`architectures: PI0Model`, `auto_map` → `modeling_pi0.PI0Model`). |
| `configuration_pi0.py` | `PI0Config` (slim, no lerobot). |
| `paligemma_with_expert.py` | PaliGemma + Gemma-expert dual-stack with shared attention. PR #1297 patches applied. |
| `flex_attention.py` | Flex-attention path (unused by default; `attention_implementation="eager"` in config). |
| `modeling_pi0.py` | `PI0FlowMatching` (verbatim NN) + `PI0Model` (HF wrapper). |
| `model.safetensors` | Re-keyed weights (6.5 GB). |
| `bridge_statistics.json` | Bridge V2 proprio + action q01/q99 quantiles for pre/post-processing. |
## License
Apache-2.0 (inherited from the upstream `juexzz/INTACT-pi0-finetune-bridge`).
## Citation / credit
- INTACT-pi0 paper (AI4CE / INT-ACT team)
- π0: "π0: A Vision-Language-Action Flow Model for General Robot Control", Physical Intelligence, 2024.
- Original lerobot port: huggingface/lerobot
- transformers ≥ 4.52 patches: PR #1297