---
license: apache-2.0
tags:
  - pi0.5
  - openpi
  - gguf
  - quantized
  - vla
base_model: zestcode5/ur3-5task-v1
---

Pi 0.5 UR3 5-task — GGUF (Q8_0 LLM + Q4_K vision + Q5_K embed)

Quantized GGUF export of zestcode5/ur3-5task-v1 (step 6000) for inference with the OmniModel.cpp Pi 0.5 C++ runtime.

Quantization

| Component | Quant type | Bits per weight (bpw) |
|---|---|---|
| Vision (SigLIP) | Q4_K | 4.55 |
| Embedding | Q5_K | 5.50 |
| PaliGemma LLM | Q8_0 | 8.50 |
| Action expert | F16 | 16.0 |
| Total file | mixed | 8.47 |

File size: ~3.6 GB. Average bpw over the vision tower and LLM combined: 7.38.
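The bpw figures above are presumably averages weighted by parameter count (total bits divided by total weights), and the file size is roughly total bits / 8 plus GGUF metadata. A minimal sketch for checking per-quant-type figures against a GGUF file: it groups tensors by quant type rather than by component, because the tensor naming inside `pi05.gguf` is not documented here, and the optional loader assumes llama.cpp's `gguf` Python package (`GGUFReader`).

```python
from collections import defaultdict
from typing import Iterable, NamedTuple


class TensorStat(NamedTuple):
    name: str
    quant_type: str  # e.g. "Q4_K", "Q5_K", "Q8_0", "F16"
    n_elements: int  # number of weights in the tensor
    n_bytes: int     # bytes the tensor occupies in the file


def bpw_by_type(stats: Iterable[TensorStat]) -> dict[str, float]:
    """Parameter-weighted bits per weight for each quant type."""
    elems: dict[str, int] = defaultdict(int)
    nbytes: dict[str, int] = defaultdict(int)
    for s in stats:
        elems[s.quant_type] += s.n_elements
        nbytes[s.quant_type] += s.n_bytes
    return {t: 8 * nbytes[t] / elems[t] for t in elems}


def average_bpw(stats: Iterable[TensorStat]) -> float:
    """Total bits divided by total weights across all given tensors."""
    stats = list(stats)
    total_bits = 8 * sum(s.n_bytes for s in stats)
    total_elems = sum(s.n_elements for s in stats)
    return total_bits / total_elems


def load_stats(path: str) -> list[TensorStat]:
    """Read per-tensor stats from a GGUF file (needs `pip install gguf`)."""
    from gguf import GGUFReader  # lazy import: the helpers above work without it

    reader = GGUFReader(path)
    return [
        TensorStat(t.name, t.tensor_type.name, int(t.n_elements), int(t.n_bytes))
        for t in reader.tensors
    ]
```

Usage: `print(bpw_by_type(load_stats("/path/to/this/dir/pi05.gguf")))`. For reference, llama.cpp's block layouts are 144 bytes per 256 weights for Q4_K (4.5 bpw), 176 bytes per 256 for Q5_K (5.5 bpw) and 34 bytes per 32 for Q8_0 (8.5 bpw).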

Files

  • pi05.gguf — unified GGUF (vision + projector + LLM + action expert + embedding + norm stats).
  • tokenizer.model — SentencePiece tokenizer (PaliGemma).
  • norm_stats.json: per-dimension mean, std, and 1st/99th percentiles (q01/q99) of actions and state, computed on the UR3 5-task dataset.
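A minimal sketch of how such stats are typically applied, assuming OpenPI-style quantile normalization (OpenPI's default for pi0.5), which maps [q01, q99] to [-1, 1] per dimension. The JSON layout (`norm_stats` → `state`/`actions` → `q01`/`q99`) follows OpenPI's format and is an assumption for this file; since the stats are also embedded in `pi05.gguf`, the runtime may already apply them internally.

```python
import json

import numpy as np

EPS = 1e-6  # guards against division by zero when q99 == q01


def load_norm_stats(path: str) -> dict:
    """Assumes OpenPI's layout: {"norm_stats": {"state": {...}, "actions": {...}}}."""
    with open(path) as f:
        data = json.load(f)
    return data.get("norm_stats", data)


def _quantiles(stats: dict, dim: int) -> tuple[np.ndarray, np.ndarray]:
    # Trim the stats in case they are longer than the array being normalized.
    q01 = np.asarray(stats["q01"], dtype=np.float32)[..., :dim]
    q99 = np.asarray(stats["q99"], dtype=np.float32)[..., :dim]
    return q01, q99


def quantile_normalize(x: np.ndarray, stats: dict) -> np.ndarray:
    """Map q01 -> -1 and q99 -> +1 independently for each dimension."""
    q01, q99 = _quantiles(stats, x.shape[-1])
    return (x - q01) / (q99 - q01 + EPS) * 2.0 - 1.0


def quantile_unnormalize(x: np.ndarray, stats: dict) -> np.ndarray:
    """Inverse of quantile_normalize (e.g. for model actions -> robot units)."""
    q01, q99 = _quantiles(stats, x.shape[-1])
    return (x + 1.0) / 2.0 * (q99 - q01 + EPS) + q01
```

Usage: `stats = load_norm_stats("norm_stats.json")`, then `quantile_unnormalize(actions, stats["actions"])`.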

Tasks

The base model was fine-tuned on zestcode5/ur3-merged-5tasks-v1. Calibration excluded the fifth task, "open the pot by removing its lid", so only the remaining four task prompts are supported:

  • pick up the pink cylinder and place it in the orange box
  • pick up the white glass and put on a brown coaster
  • Remove cup from nested cups
  • Single-finger push to blue marker
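Because the prompt is passed to the model as text conditioning, it is probably safest to send it exactly as listed, including capitalization. A hypothetical client-side guard: `SUPPORTED_PROMPTS` copies the list above, and mapping case/whitespace variants back to the canonical string is a convention suggested here, not something this card says the runtime enforces.

```python
SUPPORTED_PROMPTS = (
    "pick up the pink cylinder and place it in the orange box",
    "pick up the white glass and put on a brown coaster",
    "Remove cup from nested cups",
    "Single-finger push to blue marker",
)


def _key(s: str) -> str:
    # Case- and whitespace-insensitive comparison key.
    return " ".join(s.lower().split())


def canonical_prompt(prompt: str) -> str:
    """Return the exact supported prompt matching `prompt`, or raise ValueError."""
    key = _key(prompt)
    for p in SUPPORTED_PROMPTS:
        if _key(p) == key:
            return p
    raise ValueError(f"Unsupported task prompt: {prompt!r}")
```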

Inference

CLI:

```bash
./bin/pi05 -m /path/to/this/dir -i frame.png -p "<task prompt>" -d CUDA -s 10
```
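`-s 10` here and `--policy.steps=10` in the server command presumably set the number of flow-matching integration steps used to generate one action chunk (OpenPI's pi0/pi0.5 sampler also defaults to 10). A toy sketch of that Euler loop in OpenPI's convention: start from Gaussian noise at t = 1 and step with dt = -1/steps toward t = 0. `velocity` is a placeholder for the action expert, so this only illustrates the trade-off: each extra step costs one more action-expert forward pass.

```python
from typing import Callable

import numpy as np


def sample_actions(
    velocity: Callable[[np.ndarray, float], np.ndarray],
    horizon: int,
    action_dim: int,
    steps: int = 10,
    seed: int = 0,
) -> np.ndarray:
    """Euler-integrate a flow-matching velocity field from noise (t=1) to actions (t=0)."""
    rng = np.random.default_rng(seed)
    x_t = rng.standard_normal((horizon, action_dim))  # pure noise at t = 1
    dt = -1.0 / steps
    t = 1.0
    for _ in range(steps):
        x_t = x_t + dt * velocity(x_t, t)  # one "expert" evaluation per step
        t += dt
    return x_t
```

For a straight noise-to-target path the exact velocity is `(x_t - target) / t`, and Euler integration then lands on the target in any number of steps; a learned field is only approximate, which is where the step count starts to matter. The horizon below is arbitrary; the action dimension of 7 matches `--policy.action-dim=7`.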

WebSocket policy server (using the modified serve_policy.py):

```bash
uv run scripts/serve_policy.py policy:gguf \
    --policy.dir=/path/to/this/dir \
    --policy.device=CUDA --policy.steps=10 --policy.action-dim=7 \
    --port=8000
```

The robot client uses OpenPI's WebsocketClientPolicy and sends an observation dict with the keys observation.images.fixed, observation.images.cam_wrist, observation.state (11-dim), and prompt.
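A minimal client sketch using `openpi_client` (`WebsocketClientPolicy(host=..., port=...)` and `.infer(obs)`, as in OpenPI's examples). The key names and the 11-dim state come from this card; sending images as uint8 HxWx3 arrays and reading an `"actions"` entry from the response follow OpenPI's usual conventions and are assumptions about the modified server.

```python
import numpy as np

STATE_DIM = 11  # from this card


def build_observation(fixed_img: np.ndarray, wrist_img: np.ndarray, state, prompt: str) -> dict:
    """Assemble the observation dict with the keys listed above, validating shapes."""
    state = np.asarray(state, dtype=np.float32)
    if state.shape != (STATE_DIM,):
        raise ValueError(f"state must have shape ({STATE_DIM},), got {state.shape}")
    for name, img in (("fixed", fixed_img), ("cam_wrist", wrist_img)):
        # Assumption: images are sent as uint8 HxWx3 arrays, as in OpenPI examples.
        if img.dtype != np.uint8 or img.ndim != 3 or img.shape[-1] != 3:
            raise ValueError(f"{name} image must be uint8 HxWx3, got {img.dtype} {img.shape}")
    return {
        "observation.images.fixed": fixed_img,
        "observation.images.cam_wrist": wrist_img,
        "observation.state": state,
        "prompt": prompt,
    }


def make_client(host: str = "localhost", port: int = 8000):
    """Connect to the policy server (needs the openpi_client package)."""
    from openpi_client import websocket_client_policy  # lazy import

    return websocket_client_policy.WebsocketClientPolicy(host=host, port=port)


def infer_actions(client, obs: dict) -> np.ndarray:
    """Return the action chunk; the "actions" key is an assumed response field."""
    return np.asarray(client.infer(obs)["actions"])
```

Creating the client once with `make_client()` and calling `infer_actions` on every control step avoids reconnecting for each request.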