second_try / README.md

arashakb

Upload folder using huggingface_hub

5cf9cd8 verified about 1 month ago

metadata

license: apache-2.0
tags:
  - pi0.5
  - openpi
  - gguf
  - quantized
  - vla
base_model: zestcode5/ur3-5task-v1

Pi 0.5 UR3 5-task — GGUF (Q8_0 LLM + Q4_K vision + Q5_K embed)

Quantized GGUF export of zestcode5/ur3-5task-v1 (checkpoint at step 6000), for inference with the OmniModel.cpp Pi 0.5 C++ runtime.

Quantization

| Component | Quantization | Bits per weight (bpw) |
| --- | --- | --- |
| Vision (SigLIP) | Q4_K | 4.55 |
| Embedding | Q5_K | 5.50 |
| PaliGemma LLM | Q8_0 | 8.50 |
| Action expert | F16 | 16.0 |
| Total file | mixed | 8.47 |

File size: ~3.6 GB. Average bpw over the vision and LLM components (V+LLM): 7.38.
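As a rough sanity check on these figures, the relationship between file size, bits per weight, and parameter count can be sketched as below. The helper names are hypothetical, the file size is treated as 3.6 × 10^9 bytes (an assumption, since the card only says "~3.6 GB"), and metadata overhead in the GGUF file is ignored.

```python
def implied_param_count(file_size_bytes: float, bits_per_weight: float) -> float:
    """Approximate number of weights implied by a file size and an average bpw."""
    return file_size_bytes * 8 / bits_per_weight


def weighted_bpw(parts):
    """Average bpw over (param_count, bpw) pairs, weighted by parameter count."""
    total_bits = sum(n * bpw for n, bpw in parts)
    total_params = sum(n for n, _ in parts)
    return total_bits / total_params


# Assumption: "~3.6 GB" read as 3.6e9 bytes; the total-file bpw is from the table.
approx_params = implied_param_count(3.6e9, 8.47)
print(f"implied parameter count: {approx_params / 1e9:.2f} B")
```

Per-component parameter counts are not listed on this card, so `weighted_bpw` is shown only to illustrate how a mixed average such as the total-file value would be computed once those counts are known.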

Files

pi05.gguf — unified GGUF (vision + projector + LLM + action expert + embedding + norm stats).
tokenizer.model — SentencePiece tokenizer (PaliGemma).
norm_stats.json — action/state mean/std/q01/q99 from the UR3 5-task dataset.
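For orientation, the statistics named above support two common normalization schemes: z-score (mean/std) and quantile (q01/q99) scaling to roughly [-1, 1]. The sketch below shows both on plain Python lists. Which scheme the runtime actually applies, and the exact layout of norm_stats.json, are not stated here, so the function names and the `eps` value are assumptions.

```python
def normalize_zscore(x, mean, std, eps=1e-6):
    """Standardize each dimension: (x - mean) / (std + eps)."""
    return [(xi - m) / (s + eps) for xi, m, s in zip(x, mean, std)]


def normalize_quantile(x, q01, q99, eps=1e-6):
    """Map the [q01, q99] range of each dimension to approximately [-1, 1]."""
    return [2.0 * (xi - lo) / (hi - lo + eps) - 1.0
            for xi, lo, hi in zip(x, q01, q99)]


def unnormalize_quantile(y, q01, q99, eps=1e-6):
    """Inverse of normalize_quantile, e.g. for mapping predicted actions back."""
    return [(yi + 1.0) / 2.0 * (hi - lo + eps) + lo
            for yi, lo, hi in zip(y, q01, q99)]


# Toy values, not taken from the UR3 dataset.
state = [0.25, -1.0]
q01, q99 = [0.0, -2.0], [1.0, 2.0]
normalized = normalize_quantile(state, q01, q99)
restored = unnormalize_quantile(normalized, q01, q99)
```

States would be normalized before inference and predicted actions unnormalized afterwards, each with the statistics for its own key.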

Tasks

The base model was fine-tuned on zestcode5/ur3-merged-5tasks-v1. Calibration excluded the task "open the pot by removing its lid". The supported task prompts are the remaining four:

pick up the pink cylinder and place it in the orange box
pick up the white glass and put on a brown coaster
Remove cup from nested cups
Single-finger push to blue marker

Inference

CLI:

./bin/pi05 -m /path/to/this/dir -i frame.png -p "<task prompt>" -d CUDA -s 10
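When the CLI is driven from a script, the invocation above can be assembled as an argument list so that prompts containing spaces are passed intact. This is a minimal sketch: `build_pi05_command` is a hypothetical helper, it uses only the flags shown above, and reading `-s` as the step count is an assumption based on the `--policy.steps=10` server option.

```python
import shlex


def build_pi05_command(model_dir, image_path, prompt,
                       device="CUDA", steps=10, binary="./bin/pi05"):
    """Return the argv list for the pi05 CLI using only the flags shown above."""
    return [
        binary,
        "-m", model_dir,    # directory holding pi05.gguf, tokenizer.model, norm_stats.json
        "-i", image_path,   # input frame
        "-p", prompt,       # task prompt, passed as a single argument
        "-d", device,
        "-s", str(steps),   # assumed to be the step count
    ]


cmd = build_pi05_command("/path/to/this/dir", "frame.png",
                         "Remove cup from nested cups")
print(shlex.join(cmd))  # shell-quoted form, for logging
# To run it: subprocess.run(cmd, check=True)
```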

WebSocket policy server (with the modified serve_policy.py):

uv run scripts/serve_policy.py policy:gguf \
    --policy.dir=/path/to/this/dir \
    --policy.device=CUDA --policy.steps=10 --policy.action-dim=7 \
    --port=8000

The robot client uses OpenPI's WebsocketClientPolicy and sends an observation dict with the keys observation.images.fixed, observation.images.cam_wrist, observation.state (11-dim), and prompt.
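A client along these lines could look like the sketch below. The observation keys and the 11-dim state come from the description above. The helper names, the image format, and the shape of the server's reply are assumptions, and the `openpi_client` import assumes the openpi-client package is installed.

```python
def build_observation(fixed_image, wrist_image, state, prompt):
    """Assemble the observation dict using the four keys listed above."""
    if len(state) != 11:
        raise ValueError(f"expected an 11-dim state, got {len(state)}")
    return {
        "observation.images.fixed": fixed_image,      # fixed camera frame
        "observation.images.cam_wrist": wrist_image,  # wrist camera frame
        "observation.state": state,
        "prompt": prompt,
    }


def query_policy(observation, host="localhost", port=8000):
    """Send one observation to the policy server and return its raw reply."""
    # Imported lazily so build_observation works without openpi-client installed.
    from openpi_client import websocket_client_policy

    policy = websocket_client_policy.WebsocketClientPolicy(host=host, port=port)
    return policy.infer(observation)


# Placeholder inputs; real frames would typically be image arrays from the cameras.
obs = build_observation(fixed_image=None, wrist_image=None,
                        state=[0.0] * 11,
                        prompt="Single-finger push to blue marker")
```

Calling `query_policy(obs)` requires the server from the previous command to be listening on the chosen port.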