	---
	license: apache-2.0
	tags:
	- pi0.5
	- openpi
	- gguf
	- quantized
	- vla
	base_model: zestcode5/ur3-5task-v1
	---

	# Pi 0.5 UR3 5-task — GGUF (Q8_0 LLM + Q4_K vision + Q5_K embed)

	Quantized GGUF export of `zestcode5/ur3-5task-v1` (step 6000) for inference with the
	[OmniModel.cpp](https://github.com/) Pi 0.5 C++ runtime.

	## Quantization

	| Component       | Quantization | Bits per weight (bpw) |
	|-----------------|--------------|-----------------------|
	| Vision (SigLIP) | Q4_K         | 4.55                  |
	| Embedding       | Q5_K         | 5.50                  |
	| PaliGemma LLM   | Q8_0         | 8.50                  |
	| Action expert   | F16          | 16.0                  |
	| Total file      | mixed        | 8.47                  |

	File size: ~3.6 GB. Average bpw across vision + LLM (V+LLM): 7.38.
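
The bpw figures can be sanity-checked from the block layouts of the quantization formats. The sketch below assumes the standard ggml layouts (Q8_0: 34 bytes per 32 weights; Q4_K: 144 bytes and Q5_K: 176 bytes per 256 weights) and decimal gigabytes; the helper names are illustrative and not part of any runtime. The nominal Q4_K figure is 4.5 bpw, so the 4.55 reported for the vision tower presumably reflects a few tensors kept at higher precision, which is an assumption rather than something stated here.

```python
# Nominal bits per weight for the block formats used in this export.
# Assumed ggml layouts: (bytes per block, weights per block).
BLOCK_LAYOUTS = {
    "Q8_0": (34, 32),    # 32 int8 values + one fp16 scale
    "Q4_K": (144, 256),  # 256-weight super-block
    "Q5_K": (176, 256),  # 256-weight super-block
    "F16": (2, 1),       # plain half precision
}


def nominal_bpw(quant: str) -> float:
    """Bits per weight implied by the block layout alone."""
    block_bytes, weights = BLOCK_LAYOUTS[quant]
    return block_bytes * 8 / weights


def implied_param_count(file_bytes: float, avg_bpw: float) -> float:
    """Rough weight count implied by a file size and an average bpw.

    Ignores GGUF metadata and the embedded norm stats, so treat it as an estimate.
    """
    return file_bytes * 8 / avg_bpw


for name in BLOCK_LAYOUTS:
    print(f"{name}: {nominal_bpw(name):.2f} bpw")

# ~3.6 GB at 8.47 bpw works out to roughly 3.4 billion weights.
print(f"{implied_param_count(3.6e9, 8.47) / 1e9:.1f}B weights")
```

The estimate is only as precise as the rounded "~3.6 GB" figure it starts from.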

	## Files

	- `pi05.gguf` — unified GGUF (vision + projector + LLM + action expert + embedding + norm stats).
	- `tokenizer.model` — SentencePiece tokenizer (PaliGemma).
	- `norm_stats.json` — action/state mean/std/q01/q99 from the UR3 5-task dataset.
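
For readers wiring up their own pre- and post-processing, the statistics in `norm_stats.json` support two common schemes: z-score normalization (mean/std) and quantile normalization (q01/q99). Which one the runtime applies is not stated here, so the sketch below shows both. The function names, the epsilon, and the example numbers are assumptions for illustration, and the JSON layout of the file is deliberately not assumed: the functions take an already-loaded dict of per-dimension lists.

```python
EPS = 1e-6  # guards against division by zero on constant dimensions


def zscore_normalize(x, stats):
    """Map raw values to zero mean, unit variance per dimension."""
    return [(v - m) / (s + EPS) for v, m, s in zip(x, stats["mean"], stats["std"])]


def zscore_unnormalize(x, stats):
    """Inverse of zscore_normalize."""
    return [v * (s + EPS) + m for v, m, s in zip(x, stats["mean"], stats["std"])]


def quantile_normalize(x, stats):
    """Map the [q01, q99] range of each dimension onto [-1, 1]."""
    return [
        (v - lo) / (hi - lo + EPS) * 2.0 - 1.0
        for v, lo, hi in zip(x, stats["q01"], stats["q99"])
    ]


def quantile_unnormalize(x, stats):
    """Inverse of quantile_normalize."""
    return [
        (v + 1.0) / 2.0 * (hi - lo + EPS) + lo
        for v, lo, hi in zip(x, stats["q01"], stats["q99"])
    ]


# Made-up two-dimensional statistics, not values from this repository.
example_stats = {
    "mean": [0.0, 1.0],
    "std": [1.0, 2.0],
    "q01": [-2.0, -3.0],
    "q99": [2.0, 5.0],
}
raw_action = [0.5, 1.5]
roundtrip = quantile_unnormalize(quantile_normalize(raw_action, example_stats), example_stats)
print(roundtrip)
```

Policy outputs would be passed through the matching `*_unnormalize` function before being sent to the robot; mixing the two schemes between input and output produces incorrectly scaled actions.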

	## Tasks

	The base model was fine-tuned on `zestcode5/ur3-merged-5tasks-v1`. The task
	"open the pot by removing its lid" was excluded from calibration. Supported task prompts:

	- `pick up the pink cylinder and place it in the orange box`
	- `pick up the white glass and put on a brown coaster`
	- `Remove cup from nested cups`
	- `Single-finger push to blue marker`

	## Inference

	CLI:

	```
	./bin/pi05 -m /path/to/this/dir -i frame.png -p "<task prompt>" -d CUDA -s 10
	```
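
When the CLI is driven from a script, building the argument list in Python avoids shell-quoting problems with prompts that contain spaces. This is a minimal sketch that only reuses the flags shown above; the function and parameter names are hypothetical (for example, `steps` is a guess at what `-s` controls, mirroring `--policy.steps=10` in the server command), and the format of the binary's output is not documented here, so it is returned as raw text.

```python
import subprocess


def build_pi05_command(model_dir, image_path, prompt,
                       device="CUDA", steps=10, binary="./bin/pi05"):
    """Assemble the argv list for the pi05 CLI using only the flags shown above."""
    return [
        binary,
        "-m", str(model_dir),
        "-i", str(image_path),
        "-p", prompt,          # passed as one argv entry, so no quoting is needed
        "-d", device,
        "-s", str(steps),
    ]


def run_pi05(model_dir, image_path, prompt, **kwargs):
    """Run the CLI and return its stdout; raises CalledProcessError on failure."""
    cmd = build_pi05_command(model_dir, image_path, prompt, **kwargs)
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return result.stdout


print(build_pi05_command("/path/to/this/dir", "frame.png",
                         "Remove cup from nested cups"))
```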

	WebSocket policy server (with the modified `serve_policy.py`):

	```
	uv run scripts/serve_policy.py policy:gguf \
	--policy.dir=/path/to/this/dir \
	--policy.device=CUDA --policy.steps=10 --policy.action-dim=7 \
	--port=8000
	```

	The robot client uses OpenPI's `WebsocketClientPolicy` and sends an observation dict with
	the keys `observation.images.fixed`, `observation.images.cam_wrist`,
	`observation.state` (11-dim), and `prompt`.
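
A client along these lines could look like the sketch below. It assumes the `openpi_client` package is installed and that the server listens on `localhost:8000`; `build_observation` and `query_policy` are hypothetical helper names. The image shape and dtype expected by the modified server are not documented here, so images are passed through unchanged (in practice they would typically be NumPy arrays).

```python
STATE_DIM = 11  # length of observation.state, per the description above


def build_observation(fixed_image, wrist_image, state, prompt):
    """Assemble the observation dict with the four keys the server expects."""
    if len(state) != STATE_DIM:
        raise ValueError(f"expected {STATE_DIM}-dim state, got {len(state)}")
    return {
        "observation.images.fixed": fixed_image,
        "observation.images.cam_wrist": wrist_image,
        "observation.state": state,
        "prompt": prompt,
    }


def query_policy(obs, host="localhost", port=8000):
    """Send one observation to the policy server and return its response dict."""
    # Imported lazily so build_observation works without openpi_client installed.
    from openpi_client import websocket_client_policy

    client = websocket_client_policy.WebsocketClientPolicy(host=host, port=port)
    return client.infer(obs)


# Usage (requires a running server):
#   obs = build_observation(fixed_frame, wrist_frame, robot_state,
#                           "Remove cup from nested cups")
#   response = query_policy(obs)
```

In stock OpenPI the response dict carries the predicted action chunk under `actions`; whether the modified `serve_policy.py` keeps that key is an assumption worth checking against the server code.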