File size: 2,442 Bytes
bba62c2 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 | ---
tags:
- robotics
- vla
- rl-token
---
# pi05-so101-armnetbench-tool-insert-rlt-v50
RL Token (RLT) encoder-decoder trained on the SO101 tool insertion task, on top of the **step_24999** checkpoint from [lorenzouttini/pi05-so101-armnetbench-tool-insert-isambard-v50](https://huggingface.co/lorenzouttini/pi05-so101-armnetbench-tool-insert-isambard-v50).
## What is this?
This model is a lightweight transformer encoder-decoder which takes inputs from a frozen Pi-05 VLA backbone. The encoder compresses the VLA final-layer prefix embeddings into a single RL token via a learned query. The decoder autoregressively reconstructs the original embeddings from only this token, forcing it to act as an information bottleneck. See [Xu et al. (2026), Precise Manipulation with Efficient Online RL](https://www.pi.website/research/rlt) for the method.
## Training
- **Config:** `pi05_rlt_armnetbench_tool_insert`
- **VLA backbone:** `lorenzouttini/pi05-so101-armnetbench-tool-insert-isambard-v50` step_24999 (frozen, `rl_vla_loss_weight=0.0`)
- **Encoder-decoder:** 2-layer transformer, 8 heads, 8192 MLP dim, 2048 embedding dim
- **Dataset:** `villekuosmanen/armnetbench_tool_insert`
- **Batch size:** 32
- **LR:** 2.5e-5 cosine (1k warmup, 20k decay)
- **Steps:** 20,000
- **Runtime:** ~4h14m on 4x GH200 (Isambard)
No validation split was used — the dataset is too small for a held-out eval split.
## Loss progression (train)
| Step | Train Loss |
|------|-----------|
| 0 | 10754.3 |
| 1,000 | 873.7 |
| 5,000 | 552.0 |
| 10,000 | 430.0 |
| 15,000 | 377.6 |
| 19,900 | 356.4 |
## Checkpoints
| Step | Recommended | Params SHA256 |
|------|-------------|---------------|
| 19999 | ✓ | `d9ddbbbefc07b3700f7df5dd161c14de3291bfcf805f71e39ababb902e1501b2` |
### Verifying checkpoint hashes
```bash
cd checkpoints/19999 && find params -type f | sort | xargs sha256sum | sha256sum
```
## Repo layout
```
assets/ # Norm stats, valid indices
checkpoints/19999/params/ # Step 19999 model weights (recommended)
```
## W&B
Training curves: https://wandb.ai/pravsels/pi05_rlt_armnetbench_tool_insert/runs/jbdqrmu0
## Usage
```python
import openpi.models.model as _model
import openpi.training.config as _config
config = _config.get_config("pi05_rlt_armnetbench_tool_insert")
params = _model.restore_params("checkpoints/19999/params", restore_type=np.ndarray)
model = config.model.load(params)
```
|