# kinder-mbrl: Learned World Models for KinDER
Trained world-model checkpoints for the KinDER physical-reasoning benchmark (RSS 2026).
These checkpoints are produced by the kinder-mbrl baseline repository and are intended to be used with its random-shooting MPC planner.
## Model architecture
Each checkpoint (`.pt`) contains a two-head `MLPDynamics` network together with all normalizer statistics needed for inference.
### Network topology

```
input: [s_t (state_dim) | a_t (action_dim)]
                 │
         ┌───────▼───────┐
         │ Linear → SiLU │  hidden_dim = 256
         │ Linear → SiLU │  hidden_dim = 256
         └───────┬───────┘
                 │  shared trunk
          ┌──────┴──────┐
          ▼             ▼
     robot_head      env_head
     (robot_dim)     (env_dim)
     delta_robot     delta_env
```
Prediction rule:

```
delta_robot, delta_env = model(normalize(s_t), normalize(a_t))
s_{t+1} = s_t + concat(denormalize(delta_robot), denormalize(delta_env))
```
The robot-state and environment-state deltas are predicted by two independent linear heads that share the same trunk. Each head has its own per-feature zero-mean, unit-variance normalizer fitted on the training dataset, so the two dynamics regimes (actuator/kinematic changes vs. object/scene changes) are normalized and regressed independently while still sharing trunk features.
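The prediction rule can be sketched in NumPy. The `wm_step`, `normalize`, and `denormalize` helpers below are illustrative (the package's own entry point is `wm_get_next_state`), assuming each normalizer dict stores per-feature `mean` and `std` arrays:

```python
import numpy as np

def normalize(x, norm):
    # map to zero-mean / unit-variance space using stored statistics
    return (x - norm["mean"]) / norm["std"]

def denormalize(x, norm):
    # map a normalized delta back to raw units
    return x * norm["std"] + norm["mean"]

def wm_step(s_t, a_t, model, norms):
    """One-step prediction: s_{t+1} = s_t + concat(robot delta, env delta)."""
    x = np.concatenate([normalize(s_t, norms["s_norm"]),
                        normalize(a_t, norms["a_norm"])])
    d_robot, d_env = model(x)  # two heads, outputs in normalized delta space
    delta = np.concatenate([denormalize(d_robot, norms["dr_norm"]),
                            denormalize(d_env, norms["de_norm"])])
    return s_t + delta
```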
## Normalizers

Four independent normalizers are stored in each checkpoint:

| Key | Applied to |
|---|---|
| `s_norm` | full state vector |
| `a_norm` | action vector |
| `dr_norm` | robot-state delta |
| `de_norm` | environment-state delta |
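Fitting one of these normalizers amounts to computing per-feature statistics over the training set. A minimal NumPy sketch (the `fit_normalizer` name and the floor on `std` are illustrative assumptions, not the package API):

```python
import numpy as np

def fit_normalizer(data: np.ndarray) -> dict:
    """Per-feature zero-mean / unit-variance statistics.

    `data` has shape (num_samples, num_features); flooring `std`
    guards against division by zero on constant features.
    """
    return {"mean": data.mean(axis=0),
            "std": np.maximum(data.std(axis=0), 1e-8)}

def apply_normalizer(x, norm):
    return (x - norm["mean"]) / norm["std"]
```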
## Checkpoint format

Each `.pt` file is a `torch.save` dict with the following keys:

| Key | Type | Description |
|---|---|---|
| `model_state` | `OrderedDict` | PyTorch `state_dict` for `MLPDynamics` |
| `state_dim` | `int` | Full state dimension (`robot_dim + env_dim`) |
| `action_dim` | `int` | Action dimension |
| `robot_dim` | `int` | Robot-state sub-dimension |
| `env_dim` | `int` | Environment-state sub-dimension |
| `s_norm` | dict with `mean`, `std` | State normalizer statistics |
| `a_norm` | dict with `mean`, `std` | Action normalizer statistics |
| `dr_norm` | dict with `mean`, `std` | Robot-delta normalizer statistics |
| `de_norm` | dict with `mean`, `std` | Env-delta normalizer statistics |
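After loading a checkpoint with `torch.load`, a structural sanity check over exactly these keys can look like the following. `validate_checkpoint` is a hypothetical helper, not part of the package:

```python
# Keys every kinder-mbrl world-model checkpoint is expected to carry.
REQUIRED_KEYS = {
    "model_state", "state_dim", "action_dim", "robot_dim", "env_dim",
    "s_norm", "a_norm", "dr_norm", "de_norm",
}

def validate_checkpoint(ckpt: dict) -> None:
    """Raise if a loaded checkpoint dict is missing keys or inconsistent."""
    missing = REQUIRED_KEYS - ckpt.keys()
    if missing:
        raise KeyError(f"checkpoint missing keys: {sorted(missing)}")
    if ckpt["state_dim"] != ckpt["robot_dim"] + ckpt["env_dim"]:
        raise ValueError("state_dim must equal robot_dim + env_dim")
    for key in ("s_norm", "a_norm", "dr_norm", "de_norm"):
        if not {"mean", "std"} <= set(ckpt[key]):
            raise ValueError(f"{key} must contain 'mean' and 'std'")
```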
## Training

Models were trained from HDF5 demonstration datasets available at kinder-bench/kinder-datasets.

```bash
# Install the training package
git clone https://github.com/Princeton-Robot-Planning-and-Learning/kinder-baselines
cd kinder-baselines/kinder-mbrl
uv pip install -e ".[develop]"

# Train (1000 epochs, batch 512, lr 1e-3 by default)
python experiments/train_world_model.py \
    --mode train \
    --hdf5_path /path/to/dataset.hdf5 \
    --output_dir output \
    --epochs 1000

# Evaluate open-loop rollout error
python experiments/train_world_model.py \
    --mode eval \
    --hdf5_path /path/to/dataset.hdf5 \
    --checkpoint output/wm.pt
```
Default hyperparameters:
| Hyperparameter | Value |
|---|---|
| Hidden dim | 256 |
| Activation | SiLU |
| Optimizer | Adam |
| Learning rate | 1e-3 |
| Batch size | 512 |
| Epochs | 1 000 |
| Loss | MSE (robot head) + MSE (env head) |
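The two-head loss from the table can be written out explicitly. A minimal NumPy sketch (the `two_head_mse` name is an assumption for illustration; training operates on normalized deltas):

```python
import numpy as np

def two_head_mse(pred_dr, target_dr, pred_de, target_de):
    # sum of per-head mean-squared errors over the predicted deltas
    robot_loss = np.mean((pred_dr - target_dr) ** 2)
    env_loss = np.mean((pred_de - target_de) ** 2)
    return robot_loss + env_loss
```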
## Inference / MPC usage

```python
import torch
from kinder_mbrl.planning import load_world_model, wm_get_next_state

# Load checkpoint
model, norms = load_world_model("path/to/wm.pt")

# One-step prediction
next_state = wm_get_next_state(current_state, action, model, norms)
```
To run the full random-shooting MPC loop:
```bash
# World-model-based planning
python experiments/run_mpc.py \
    --use_world_model \
    --checkpoint output/wm.pt \
    --num_candidates 50 \
    --horizon 5
```
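The random-shooting loop behind these flags can be sketched as follows. This is a generic sketch, not the repo's `run_mpc.py`: the uniform `[-1, 1]` sampling range and the cost function are illustrative assumptions, and `step_fn` stands in for the learned world model's one-step prediction:

```python
import numpy as np

def random_shooting_mpc(state, step_fn, cost_fn, action_dim,
                        num_candidates=50, horizon=5, rng=None):
    """Return the first action of the lowest-cost sampled action sequence.

    step_fn(s, a) -> next state (e.g. the learned world model);
    cost_fn(s)    -> scalar cost of a state.
    """
    if rng is None:
        rng = np.random.default_rng(0)
    # num_candidates random action sequences, each of length `horizon`
    seqs = rng.uniform(-1.0, 1.0, size=(num_candidates, horizon, action_dim))
    costs = np.zeros(num_candidates)
    for i, seq in enumerate(seqs):
        s = state
        for a in seq:                 # roll the model forward
            s = step_fn(s, a)
            costs[i] += cost_fn(s)    # accumulate cost along the rollout
    return seqs[np.argmin(costs), 0]  # first action of the best sequence
```

In an MPC loop, only this first action is executed; the planner then re-samples from the newly observed state.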
## Citation

If you use these checkpoints, please cite the KinDER paper:
```bibtex
@inproceedings{huang2026kinder,
  title     = {KinDER: A Physical Reasoning Benchmark for Robot Learning and Planning},
  author    = {Huang, Yixuan and Li, Bowen and Saxena, Vaibhav and Liang, Yichao and Mishra, Utkarsh and Ji, Liang and Zha, Lihan and Wu, Jimmy and Kumar, Nishanth and Scherer, Sebastian and Xu, Danfei and Silver, Tom},
  booktitle = {Robotics: Science and Systems (RSS)},
  year      = {2026}
}
```