How to use you2who/paligemma-flowmatch-bridge with Transformers:
# Load model directly
from transformers import AutoModel
model = AutoModel.from_pretrained("you2who/paligemma-flowmatch-bridge", trust_remote_code=True, dtype="auto")

This model was developed by INSAIT and KU Leuven.
Code and model weights are provided under the Gemma license.
This repo provides a fully Transformers-compatible export for the flow-matching (FM) policy.
This export uses native transformers AutoConfig/AutoModel/AutoProcessor wrappers.
It does not require an external databib installation.
reset_test_time_cache / refresh_test_time_vlm / next_test_time_action
Use generate_action_chunk(...) (or plain forward) to get a full action chunk.

import numpy as np
import torch
from PIL import Image
from transformers import AutoModel, AutoProcessor

model_id = "you2who/arboreal-green-raven"
model = AutoModel.from_pretrained(model_id, trust_remote_code=True).to("cuda").eval()
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)

image = Image.open("path/to/main_image.png").convert("RGB")

# Turn the instruction, camera image, and robot state into model inputs.
batch = processor.preprocess_inputs(
    chat=["pick up the cup", ""],
    images={"main": [image]},
    ee_pose_translation=np.zeros((1, 1, 3), dtype=np.float32),
    ee_pose_rotation=np.array([[[0.0, 0.0, 0.0, 1.0]]], dtype=np.float32),
    gripper=np.zeros((1, 1), dtype=np.float32),
    joints=np.zeros((1, 1, 7), dtype=np.float32),
    dataset_name=np.array(["bridge"]),
    inference_mode=True,
)

# Generate a full action chunk on the GPU without tracking gradients.
with torch.inference_mode():
    output = model.generate_action_chunk(
        input_ids=batch["input_ids"].to("cuda"),
        attention_mask=batch["attn_mask"].to("cuda").any(dim=1),
        images={k: v.to("cuda") for k, v in batch["images"].items()},
        ee_pose_translation=batch["ee_pose_translation"].to("cuda"),
        ee_pose_rotation=batch["ee_pose_rotation"].to("cuda"),
        gripper=batch["gripper"].unsqueeze(-1).to("cuda"),
        joints=batch["joints"].to("cuda"),
        control_tokens_ids=batch["control_tokens_ids"],
    )

# Convert the raw model output into a control plan for the "bridge" dataset.
control_plan = processor.postprocess_actions(
    model_output=output,
    dataset_name=np.array(["bridge"]),
)
print(control_plan.translation_m.shape, control_plan.rotmat.shape, control_plan.gripper_prob.shape)
The example above uses generate_action_chunk followed by processor postprocessing.
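
The example above passes the end-effector rotation in as a 4-vector but gets it back as control_plan.rotmat. If a downstream controller expects quaternions, a conversion along the following lines may help. This is a minimal sketch and not part of this repo: the helper name rotmat_to_quat_xyzw is hypothetical, it assumes each rotation is a proper 3x3 rotation matrix, and it assumes an (x, y, z, w) component order, which is only inferred from the identity value [0, 0, 0, 1] used in the input.

```python
import math


def rotmat_to_quat_xyzw(R):
    """Convert one 3x3 rotation matrix (nested sequences) to a unit quaternion.

    Hypothetical helper, not provided by the model repo. The (x, y, z, w)
    ordering is an assumption; check what your controller expects.
    """
    m00, m01, m02 = R[0]
    m10, m11, m12 = R[1]
    m20, m21, m22 = R[2]
    trace = m00 + m11 + m22

    # Branch on the largest of (trace, m00, m11, m22) so the square root
    # argument stays well away from zero.
    if trace > 0.0:
        s = 2.0 * math.sqrt(trace + 1.0)  # s = 4 * w
        w = 0.25 * s
        x = (m21 - m12) / s
        y = (m02 - m20) / s
        z = (m10 - m01) / s
    elif m00 > m11 and m00 > m22:
        s = 2.0 * math.sqrt(1.0 + m00 - m11 - m22)  # s = 4 * x
        w = (m21 - m12) / s
        x = 0.25 * s
        y = (m01 + m10) / s
        z = (m02 + m20) / s
    elif m11 > m22:
        s = 2.0 * math.sqrt(1.0 + m11 - m00 - m22)  # s = 4 * y
        w = (m02 - m20) / s
        x = (m01 + m10) / s
        y = 0.25 * s
        z = (m12 + m21) / s
    else:
        s = 2.0 * math.sqrt(1.0 + m22 - m00 - m11)  # s = 4 * z
        w = (m10 - m01) / s
        x = (m02 + m20) / s
        y = (m12 + m21) / s
        z = 0.25 * s
    return (x, y, z, w)


# Sanity check with a 90 degree rotation about the z axis.
quarter_turn_z = [[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
print(rotmat_to_quat_xyzw(quarter_turn_z))
```

Assuming control_plan.rotmat is an array or tensor whose last two dimensions are 3x3, each matrix could be converted to nested lists (for example with .tolist()) and passed to this helper one step at a time.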