---
license: apache-2.0
tags:
  - pi0.5
  - openpi
  - gguf
  - quantized
  - vla
base_model: zestcode5/ur3-5task-v1
---

# Pi 0.5 UR3 5-task — GGUF (Q8_0 LLM + Q4_K vision + Q5_K embed)

Quantized GGUF export of `zestcode5/ur3-5task-v1` (step 6000) for inference with
the [OmniModel.cpp](https://github.com/) Pi 0.5 C++ runtime.

## Quantization

| Component        | Quant   | Bits per weight (bpw) |
|------------------|---------|-----------------------|
| Vision (SigLIP)  | Q4_K    | 4.55  |
| Embedding        | Q5_K    | 5.50  |
| PaliGemma LLM    | Q8_0    | 8.50  |
| Action expert    | F16     | 16.0  |
| **Total file**   | mixed   | **8.47** |

File size: ~3.6 GB. Vision + LLM average: 7.38 bpw.
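
For readers unfamiliar with the bpw column, the sketch below derives the nominal bits per weight of each quant type from its ggml block layout (block size in bytes divided by weights per block). The helper names `nominal_bpw` and `mixed_bpw` are illustrative and not part of any runtime. A measured per-component average can sit slightly above the nominal value when some tensors are stored at higher precision; whether that accounts for the 4.55 reported for the vision tower is an assumption.

```python
# (bytes per block, weights per block) for the ggml quant types used here.
BLOCK_LAYOUTS = {
    "Q4_K": (144, 256),  # 256-weight super-block: 4 B scales header + 12 B sub-scales + 128 B quants
    "Q5_K": (176, 256),  # as Q4_K plus 32 B holding the fifth bit of each weight
    "Q8_0": (34, 32),    # 32 int8 weights + one fp16 scale
    "F16": (2, 1),       # plain half precision, no blocking
}


def nominal_bpw(quant: str) -> float:
    """Bits per weight implied by the block layout alone."""
    block_bytes, block_weights = BLOCK_LAYOUTS[quant]
    return 8.0 * block_bytes / block_weights


def mixed_bpw(components: list[tuple[int, str]]) -> float:
    """Weighted-average bpw for a list of (num_weights, quant) components."""
    total_bits = sum(n * nominal_bpw(q) for n, q in components)
    total_weights = sum(n for n, _ in components)
    return total_bits / total_weights
```

Passing each component's real parameter count to `mixed_bpw` gives the kind of weighted average reported in the "Total file" row; the counts themselves are not listed in this card.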

## Files

- `pi05.gguf` — unified GGUF (vision + projector + LLM + action expert + embedding + norm stats).
- `tokenizer.model` — SentencePiece tokenizer (PaliGemma).
- `norm_stats.json` — action/state mean/std/q01/q99 from the UR3 5-task dataset.
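
The statistics in `norm_stats.json` are the inputs to state/action normalization. The sketch below shows the two common ways such statistics are applied: z-score normalization (mean/std) and quantile normalization (q01/q99, mapping to roughly [-1, 1]). Which scheme this runtime applies is not stated here, and the function names and the `EPS` guard are assumptions for illustration.

```python
EPS = 1e-6  # assumed guard against a zero std or a zero q99 - q01 range


def normalize_zscore(x, mean, std):
    """Per-dimension (x - mean) / std."""
    return [(v - m) / (s + EPS) for v, m, s in zip(x, mean, std)]


def normalize_quantile(x, q01, q99):
    """Map the [q01, q99] range of each dimension to approximately [-1, 1]."""
    return [2.0 * (v - lo) / (hi - lo + EPS) - 1.0 for v, lo, hi in zip(x, q01, q99)]


def unnormalize_quantile(y, q01, q99):
    """Inverse of normalize_quantile, e.g. for mapping predicted actions back."""
    return [(v + 1.0) / 2.0 * (hi - lo + EPS) + lo for v, lo, hi in zip(y, q01, q99)]
```

Inputs are plain per-dimension sequences, so the same functions apply to the 11-dim state and to the action vector once the corresponding entries are read from the JSON file.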

## Tasks

The base model was fine-tuned on `zestcode5/ur3-merged-5tasks-v1`. The task
"open the pot by removing its lid" was excluded from calibration and is not listed
below. Supported task prompts:

- `pick up the pink cylinder and place it in the orange box`
- `pick up the white glass and put on a brown coaster`
- `Remove cup from nested cups`
- `Single-finger push to blue marker`

## Inference

CLI:

```
./bin/pi05 -m /path/to/this/dir -i frame.png -p "<task prompt>" -d CUDA -s 10
```

WebSocket policy server (with the modified `serve_policy.py`):

```
uv run scripts/serve_policy.py policy:gguf \
    --policy.dir=/path/to/this/dir \
    --policy.device=CUDA --policy.steps=10 --policy.action-dim=7 \
    --port=8000
```

The robot client uses OpenPI's `WebsocketClientPolicy` and sends an observation
dict with the keys `observation.images.fixed`, `observation.images.cam_wrist`,
`observation.state` (11-dim), and `prompt`.
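
A minimal client sketch, assuming the `openpi_client` package is installed and the server above is listening on `localhost:8000`. The helper names `build_observation` and `query_policy` are hypothetical, and the expected image shape and dtype are not specified in this card, so the images are passed through unchanged.

```python
STATE_DIM = 11  # state dimensionality stated in this model card


def build_observation(fixed_image, wrist_image, state, prompt):
    """Assemble the observation dict with the keys the policy server expects."""
    if len(state) != STATE_DIM:
        raise ValueError(f"expected {STATE_DIM}-dim state, got {len(state)}")
    return {
        "observation.images.fixed": fixed_image,
        "observation.images.cam_wrist": wrist_image,
        "observation.state": state,
        "prompt": prompt,
    }


def query_policy(obs, host="localhost", port=8000):
    """Send one observation to the policy server and return its response dict."""
    # Imported lazily so build_observation works without openpi_client installed.
    from openpi_client import websocket_client_policy

    client = websocket_client_policy.WebsocketClientPolicy(host=host, port=port)
    return client.infer(obs)
```

In a control loop the client would normally be created once and reused across steps rather than reconnecting per call. With the stock OpenPI server the response carries the predicted action chunk under an `actions` key; whether the modified `serve_policy.py` keeps that key is an assumption.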