---
license: apache-2.0
tags:
- pi0.5
- openpi
- gguf
- quantized
- vla
base_model: zestcode5/ur3-5task-v1
---
# Pi 0.5 UR3 5-task — GGUF (Q8_0 LLM + Q4_K vision + Q5_K embed)
Quantized GGUF export of `zestcode5/ur3-5task-v1` (training step 6000) for inference with the Pi 0.5 C++ runtime in
[OmniModel.cpp](https://github.com/).
## Quantization
| Component        | Quant type | Bits per weight (bpw) |
|------------------|------------|-----------------------|
| Vision (SigLIP) | Q4_K | 4.55 |
| Embedding | Q5_K | 5.50 |
| PaliGemma LLM | Q8_0 | 8.50 |
| Action expert | F16 | 16.0 |
| **Total file** | mixed | **8.47** |
File size: ~3.6 GB. Average bpw over the vision encoder and PaliGemma LLM combined: 7.38.
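The bpw figures follow from how each quantization type packs weights into fixed-size blocks. The sketch below assumes llama.cpp's `ggml` block layouts for Q8_0 and Q5_K and derives their nominal bpw. It also shows the two pieces of arithmetic behind the summary line: a weight-count-weighted average over components, and the rough weight count implied by file size and average bpw. It ignores GGUF metadata and the embedded norm stats, and treats "3.6 GB" as 3.6 × 10^9 bytes; both are assumptions.

```python
# Nominal bits per weight (bpw) for two ggml block formats, from their byte layouts.
# Assumption: layouts as defined in llama.cpp's ggml (block_q8_0, block_q5_K).

# Q8_0: one fp16 scale (2 bytes) + 32 int8 weights per 32-weight block.
Q8_0_BYTES, Q8_0_WEIGHTS = 2 + 32, 32
# Q5_K: fp16 d + fp16 dmin + 12 bytes of packed scales
#       + 32 bytes of high bits + 128 bytes of low nibbles per 256-weight super-block.
Q5_K_BYTES, Q5_K_WEIGHTS = 2 + 2 + 12 + 32 + 128, 256


def block_bpw(block_bytes: int, block_weights: int) -> float:
    """Bits stored per weight for a fixed-size quantization block."""
    return 8 * block_bytes / block_weights


def mixed_bpw(components: list[tuple[int, float]]) -> float:
    """Average bpw over (num_weights, bpw) pairs, weighted by weight count."""
    total_bits = sum(n * b for n, b in components)
    total_weights = sum(n for n, _ in components)
    return total_bits / total_weights


def implied_weights(file_bytes: float, avg_bpw: float) -> float:
    """Approximate number of weights stored in a file of this size and average bpw."""
    return file_bytes * 8 / avg_bpw


if __name__ == "__main__":
    print(block_bpw(Q8_0_BYTES, Q8_0_WEIGHTS))  # 8.5
    print(block_bpw(Q5_K_BYTES, Q5_K_WEIGHTS))  # 5.5
    # ~3.6 GB at 8.47 bpw (assumes 1 GB = 1e9 bytes).
    print(f"{implied_weights(3.6e9, 8.47) / 1e9:.2f}B weights")  # 3.40B weights
```

The same `mixed_bpw` helper would reproduce an average such as the 7.38 vision + LLM figure once the per-component weight counts are known; those counts are not listed in this card.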
## Files
- `pi05.gguf` — unified GGUF (vision + projector + LLM + action expert + embedding + norm stats).
- `tokenizer.model` — SentencePiece tokenizer (PaliGemma).
- `norm_stats.json` — action/state mean/std/q01/q99 from the UR3 5-task dataset.
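A client or runtime uses these statistics to normalize the state before inference and to map predicted actions back to robot units. The sketch below assumes OpenPI's `norm_stats.json` layout (a top-level `norm_stats` object with per-key entries such as `state` and `actions`) and the quantile normalization OpenPI typically applies for Pi 0.5, which maps q01..q99 onto [-1, 1]. Both are assumptions to check against the actual file and runtime; the function names are illustrative.

```python
import json

EPS = 1e-6  # guards against a zero-width q01..q99 range


def load_norm_stats(path: str) -> dict:
    """Load per-key stats (e.g. "state", "actions").

    Assumes an OpenPI-style {"norm_stats": {...}} wrapper; falls back to the top level.
    """
    with open(path) as f:
        data = json.load(f)
    return data.get("norm_stats", data)


def quantile_normalize(x: list[float], stats: dict) -> list[float]:
    """Map each dimension so that q01 -> -1 and q99 -> +1 (approximately).

    zip() stops at the shorter sequence, so extra stat dimensions are ignored.
    """
    return [
        (v - lo) / (hi - lo + EPS) * 2.0 - 1.0
        for v, lo, hi in zip(x, stats["q01"], stats["q99"])
    ]


def quantile_unnormalize(y: list[float], stats: dict) -> list[float]:
    """Inverse of quantile_normalize: map [-1, 1] back to the original units."""
    return [
        (v + 1.0) / 2.0 * (hi - lo + EPS) + lo
        for v, lo, hi in zip(y, stats["q01"], stats["q99"])
    ]
```

For example, `quantile_normalize(state, load_norm_stats("norm_stats.json")["state"])` before inference, and `quantile_unnormalize(action, stats["actions"])` on each predicted action. Since `pi05.gguf` also embeds the norm stats, the runtime may already apply this step; confirm before normalizing twice on the client side.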
## Tasks
The base model was fine-tuned on `zestcode5/ur3-merged-5tasks-v1`. Calibration excluded the fifth task,
"open the pot by removing its lid"; the supported task prompts are:
- `pick up the pink cylinder and place it in the orange box`
- `pick up the white glass and put on a brown coaster`
- `Remove cup from nested cups`
- `Single-finger push to blue marker`
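Since the policy is conditioned on these task strings, a client can catch typos by checking the prompt before sending it. A minimal sketch: `SUPPORTED_PROMPTS` and `require_supported_prompt` are hypothetical helper names, and the strings are copied verbatim (including capitalization) from the list above. Whether the runtime tolerates other casing or wording is not stated, so the check is deliberately strict.

```python
# Task prompts listed in this model card, copied verbatim.
SUPPORTED_PROMPTS = (
    "pick up the pink cylinder and place it in the orange box",
    "pick up the white glass and put on a brown coaster",
    "Remove cup from nested cups",
    "Single-finger push to blue marker",
)


def require_supported_prompt(prompt: str) -> str:
    """Return the whitespace-stripped prompt if it is listed, else raise ValueError.

    Only leading/trailing whitespace is ignored; case and wording must match exactly.
    """
    cleaned = prompt.strip()
    if cleaned not in SUPPORTED_PROMPTS:
        raise ValueError(f"unsupported task prompt: {prompt!r}")
    return cleaned
```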
## Inference
CLI:
```bash
./bin/pi05 -m /path/to/this/dir -i frame.png -p "<task prompt>" -d CUDA -s 10
```
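To run the CLI over several frames, it can be wrapped from Python. This sketch reuses only the flags shown above; `-s` presumably sets the number of sampling steps (it mirrors `--policy.steps` in the server command), but that reading is an assumption. The stdout format of `pi05` is not documented here, so the wrapper returns it raw. `build_pi05_cmd` and `run_pi05` are hypothetical names.

```python
import subprocess


def build_pi05_cmd(binary: str, model_dir: str, image: str, prompt: str,
                   device: str = "CUDA", steps: int = 10) -> list[str]:
    """Assemble the argv for the pi05 CLI using only the flags shown in this card."""
    # Passing the prompt as a single list element avoids shell quoting issues.
    return [binary, "-m", model_dir, "-i", image, "-p", prompt,
            "-d", device, "-s", str(steps)]


def run_pi05(cmd: list[str]) -> str:
    """Run the CLI and return its stdout; raises CalledProcessError on a non-zero exit."""
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout
```

Usage: `run_pi05(build_pi05_cmd("./bin/pi05", "/path/to/this/dir", "frame.png", "Remove cup from nested cups"))`, looping over frame paths as needed.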
WebSocket policy server, using a modified version of OpenPI's `serve_policy.py` that provides the `gguf` policy:
```bash
uv run scripts/serve_policy.py policy:gguf \
--policy.dir=/path/to/this/dir \
--policy.device=CUDA --policy.steps=10 --policy.action-dim=7 \
--port=8000
```
The robot client uses OpenPI's `WebsocketClientPolicy` and sends an observation dict with the
keys `observation.images.fixed`, `observation.images.cam_wrist`,
`observation.state` (11-dim state vector), and `prompt` (one of the task prompts above).
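A minimal client sketch, assuming the `openpi_client` package from the OpenPI repository is installed and the server above is listening on `localhost:8000`. The image format (HWC `uint8`) and the response key `"actions"` follow common OpenPI conventions but are assumptions here, as are the helper names `build_observation`, `make_client`, and `query_policy`; confirm them against the modified `serve_policy.py`.

```python
import numpy as np

STATE_DIM = 11  # observation.state size stated in this card


def build_observation(fixed_img: np.ndarray, wrist_img: np.ndarray,
                      state: np.ndarray, prompt: str) -> dict:
    """Pack one observation using the keys the policy server expects."""
    state = np.asarray(state, dtype=np.float32)
    if state.shape != (STATE_DIM,):
        raise ValueError(f"expected state shape ({STATE_DIM},), got {state.shape}")
    return {
        "observation.images.fixed": np.asarray(fixed_img, dtype=np.uint8),
        "observation.images.cam_wrist": np.asarray(wrist_img, dtype=np.uint8),
        "observation.state": state,
        "prompt": prompt,
    }


def make_client(host: str = "localhost", port: int = 8000):
    """Connect to the policy server; create once and reuse across control steps."""
    # Imported lazily so build_observation works without openpi_client installed.
    from openpi_client import websocket_client_policy

    return websocket_client_policy.WebsocketClientPolicy(host=host, port=port)


def query_policy(client, obs: dict) -> np.ndarray:
    """Send one observation and return the predicted action chunk as an array."""
    return np.asarray(client.infer(obs)["actions"])
```

Typical use: create the client once with `make_client()`, then call `query_policy(client, build_observation(fixed, wrist, state, prompt))` at each control step. The returned chunk is expected to have a last dimension of 7 to match `--policy.action-dim=7`, but that shape is an assumption to verify.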