MiniDreamer

MiniDreamer is a PlaNet-style world-model project for MiniGrid-FourRooms-v0. It learns a recurrent latent dynamics model from partial RGB observations, predicts reward and episode termination, and plans in latent space with a discrete cross-entropy method (CEM).
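
To illustrate the planning step, the sketch below shows one way a discrete CEM planner can score candidate action sequences with a termination-aware return. It is a minimal NumPy sketch under stated assumptions, not the repository's implementation: the function names, the hyperparameters, the `rollout_fn` interface, and the toy corridor that stands in for the learned latent model are all hypothetical.

```python
import numpy as np


def termination_aware_return(rewards, done_probs, gamma=0.99):
    """Discounted return, with each reward weighted by the probability
    that the episode is still running when that step is taken."""
    rewards = np.asarray(rewards, dtype=np.float64)
    done_probs = np.asarray(done_probs, dtype=np.float64)
    # survive[..., t] = P(no termination during steps 0..t)
    survive = np.cumprod(1.0 - done_probs, axis=-1)
    # alive[..., t] = P(no termination before step t); always 1 at t = 0
    alive = np.concatenate(
        [np.ones_like(survive[..., :1]), survive[..., :-1]], axis=-1
    )
    discounts = gamma ** np.arange(rewards.shape[-1])
    return np.sum(discounts * alive * rewards, axis=-1)


def discrete_cem(rollout_fn, num_actions, horizon=6, candidates=512,
                 elites=32, iterations=4, smoothing=0.1, seed=0):
    """Return the first action of the plan favoured by CEM.

    rollout_fn (hypothetical interface) maps an int array of plans with
    shape (candidates, horizon) to (rewards, done_probs) of the same shape.
    """
    rng = np.random.default_rng(seed)
    # One categorical distribution over actions per planning step.
    probs = np.full((horizon, num_actions), 1.0 / num_actions)
    for _ in range(iterations):
        # Inverse-CDF sampling of every action in every candidate plan.
        cdf = probs.cumsum(axis=1)
        u = rng.random((candidates, horizon, 1))
        plans = np.minimum((u > cdf[None]).sum(axis=-1), num_actions - 1)
        rewards, done_probs = rollout_fn(plans)
        scores = termination_aware_return(rewards, done_probs)
        elite = plans[np.argsort(scores)[-elites:]]
        # Refit the per-step categoricals to the elite plans, with smoothing
        # so that no action's probability collapses to exactly zero.
        counts = np.stack([
            np.bincount(elite[:, t], minlength=num_actions)
            for t in range(horizon)
        ])
        probs = (1.0 - smoothing) * counts / elites + smoothing / num_actions
    return int(probs[0].argmax())


def toy_rollout(plans):
    """Toy stand-in for the latent model: a corridor with the goal two
    cells to the right. Actions: 0 = left, 1 = right, 2 = stay."""
    pos = np.zeros(len(plans), dtype=np.int64)
    rewards = np.zeros(plans.shape)
    done_probs = np.zeros(plans.shape)
    for t in range(plans.shape[1]):
        step = np.where(plans[:, t] == 1, 1, np.where(plans[:, t] == 0, -1, 0))
        pos = pos + step
        reached = pos == 2
        rewards[:, t] = reached      # reward 1 on reaching the goal
        done_probs[:, t] = reached   # and the episode ends there
    return rewards, done_probs
```

Calling `discrete_cem(toy_rollout, num_actions=3)` plans in the toy corridor. In an actual planner, the rollout would instead unroll the learned dynamics and read rewards and termination probabilities from the model's prediction heads.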

The repository contains:

MiniGrid RGB environment wrappers and bootstrap trajectory collection
Episode-aware replay buffer with reproducible train/val/test splits
CNN encoder, Gaussian RSSM, reward/done heads, optional decoder
Discrete CEM planner with termination-aware return scoring
PPO baseline entrypoint with a MiniGrid-compatible CNN feature extractor
Evaluation code, configs, scripts, tests, and project documentation
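
The episode-aware, reproducible splitting listed above can be sketched as follows. The idea is to assign whole episodes, not individual transitions, to train/val/test so that frames from one episode never appear in two splits, and to drive the shuffle from a fixed seed so the assignment is repeatable. This is an illustrative sketch only: the function name, the split fractions, and the seed are assumptions and are not taken from the repository.

```python
import random


def split_episodes(episode_ids, val_fraction=0.1, test_fraction=0.1, seed=0):
    """Assign each distinct episode ID to exactly one split (hypothetical helper)."""
    # Sort first so the result depends only on the set of IDs and the seed,
    # not on the order in which transitions were collected.
    unique_ids = sorted(set(episode_ids))
    random.Random(seed).shuffle(unique_ids)
    n_val = round(val_fraction * len(unique_ids))
    n_test = round(test_fraction * len(unique_ids))
    return {
        "val": unique_ids[:n_val],
        "test": unique_ids[n_val:n_val + n_test],
        "train": unique_ids[n_val + n_test:],
    }
```

Every transition then inherits the split of the episode it belongs to, which keeps validation and test metrics free of within-episode leakage.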

A complete baseline training run has been executed. A summary is recorded in results.md, while the frozen baseline artifacts remain gitignored under artifacts/world_model/.

Layout

configs/
docs/
notebooks/
scripts/
src/
tests/

Core code lives under src/minidreamer/, with CLI entrypoints at src/train_world_model.py and src/evaluate.py.

Setup

Use Python 3.11 or 3.12. The project metadata is defined in pyproject.toml.

python3.11 -m venv .venv
source .venv/bin/activate
pip install -e ".[dev]"

Main Commands

Bootstrap replay collection:

./scripts/collect_random.sh

World-model pipeline:

./scripts/train_world_model.sh

By default, the script writes new experiments to artifacts/world_model_experiment/. To choose a different experiment directory without touching the frozen baseline, set MINIDREAMER_OUTPUT_DIR:

MINIDREAMER_OUTPUT_DIR=artifacts/world_model_restricted_actions ./scripts/train_world_model.sh

Resume an interrupted world-model run from a checkpoint:

python3.11 src/train_world_model.py \
  --config configs/fourrooms_world_model.yaml \
  --output-dir artifacts/world_model \
  --replay-dir artifacts/world_model/replay \
  --resume-checkpoint artifacts/world_model/checkpoints/world_model_env_steps_90021.pt
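
When resuming, it can be handy to find the most recent checkpoint automatically. The helper below is a hypothetical sketch, not part of the repository. It assumes every checkpoint follows the `world_model_env_steps_<N>.pt` naming seen in the command above and picks the file with the largest step count.

```python
import re
from pathlib import Path

# Assumed naming scheme, inferred from the example checkpoint path above.
CHECKPOINT_PATTERN = re.compile(r"^world_model_env_steps_(\d+)\.pt$")


def pick_latest(filenames):
    """Return the checkpoint filename with the highest env-step count, or None."""
    best_name, best_steps = None, -1
    for name in filenames:
        match = CHECKPOINT_PATTERN.match(name)
        if match and int(match.group(1)) > best_steps:
            best_name, best_steps = name, int(match.group(1))
    return best_name


def latest_checkpoint(checkpoint_dir):
    """Return the Path of the newest checkpoint in checkpoint_dir, or None."""
    directory = Path(checkpoint_dir)
    name = pick_latest(p.name for p in directory.glob("*.pt"))
    return None if name is None else directory / name
```

The step count is compared numerically, so `world_model_env_steps_90021.pt` ranks above `world_model_env_steps_9000.pt`, which a plain lexicographic sort would get wrong.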

Planner evaluation from a checkpoint:

./scripts/eval_planner.sh /path/to/checkpoint.pt /path/to/replay

PPO baseline:

./scripts/train_ppo.sh

Notes

The latest completed run summary is in results.md.
The baseline run in artifacts/world_model/ is intentionally frozen as the reference artifact.
New world-model experiments should write to separate directories under artifacts/.
The trainer refuses to overwrite an existing run directory unless you resume with --resume-checkpoint or explicitly pass --allow-overwrite-existing-output.
Metrics, replay snapshots, and checkpoints are intentionally gitignored.
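
The overwrite protection described in the notes above could look roughly like the sketch below. It is not the trainer's actual code: the function name and arguments are hypothetical, `metrics.jsonl` is a placeholder filename, and the sketch assumes that an "existing run directory" means a directory that already contains files.

```python
import tempfile
from pathlib import Path


def prepare_output_dir(output_dir, resume_checkpoint=None, allow_overwrite=False):
    """Create the run directory, refusing to reuse a non-empty one unless
    the caller is resuming or has explicitly allowed overwriting."""
    path = Path(output_dir)
    occupied = path.is_dir() and any(path.iterdir())
    if occupied and resume_checkpoint is None and not allow_overwrite:
        raise FileExistsError(
            f"{path} already contains a run; resume from a checkpoint "
            "or explicitly allow overwriting"
        )
    path.mkdir(parents=True, exist_ok=True)
    return path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        run_dir = Path(tmp) / "world_model_experiment"
        prepare_output_dir(run_dir)                 # fresh directory: accepted
        (run_dir / "metrics.jsonl").write_text("")  # simulate an earlier run
        try:
            prepare_output_dir(run_dir)             # occupied: refused
        except FileExistsError as err:
            print("refused:", err)
```

Failing fast before any training starts means a mistyped output path cannot silently clobber the frozen baseline.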
