MiniDreamer
MiniDreamer is a PlaNet-style world-model project for MiniGrid-FourRooms-v0. It learns a recurrent latent dynamics model from partial RGB observations, predicts reward and episode termination, and plans in latent space with a discrete cross-entropy method (CEM).
The repository contains:
- MiniGrid RGB environment wrappers and bootstrap trajectory collection
- Episode-aware replay buffer with reproducible train/val/test splits
- CNN encoder, Gaussian RSSM, reward/done heads, optional decoder
- Discrete CEM planner with termination-aware return scoring
- PPO baseline entrypoint with a MiniGrid-compatible CNN feature extractor
- Evaluation code, configs, scripts, tests, and project documentation
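To make "discrete CEM planner with termination-aware return scoring" concrete, here is a minimal NumPy sketch. It is not the repository's implementation: the function names, the `rollout_fn` hook, and all hyperparameter defaults are hypothetical. It also assumes that `done_probs[:, t]` is the predicted probability that the episode ends right after step `t`, so a reward only counts to the extent the episode is still alive when it is received.

```python
import numpy as np


def termination_aware_return(rewards, done_probs, gamma=0.99):
    """Discounted return weighted by the probability of still being alive.

    rewards, done_probs: float arrays of shape (num_candidates, horizon).
    """
    # alive[:, t] = prod_{k < t} (1 - done_probs[:, k]); step 0 is always alive.
    survive = np.cumprod(1.0 - done_probs, axis=1)
    alive = np.concatenate([np.ones_like(survive[:, :1]), survive[:, :-1]], axis=1)
    discounts = gamma ** np.arange(rewards.shape[1])
    return (alive * discounts * rewards).sum(axis=1)


def discrete_cem(rollout_fn, num_actions, horizon=8, candidates=64,
                 elites=8, iterations=4, smoothing=0.1, seed=0):
    """Return the first action of the plan found by categorical CEM.

    rollout_fn is a hypothetical hook standing in for the latent model: it
    takes an int array of shape (candidates, horizon) and returns
    (rewards, done_probs), each of that same shape.
    """
    rng = np.random.default_rng(seed)
    # One categorical distribution over actions per planning step.
    probs = np.full((horizon, num_actions), 1.0 / num_actions)
    for _ in range(iterations):
        # Sample every time step independently from its own distribution.
        actions = np.stack(
            [rng.choice(num_actions, size=candidates, p=probs[t])
             for t in range(horizon)],
            axis=1,
        )
        rewards, done_probs = rollout_fn(actions)
        scores = termination_aware_return(rewards, done_probs)
        elite = actions[np.argsort(scores)[-elites:]]
        # Refit to elite action frequencies, mixed with a uniform
        # distribution so no action's probability collapses to zero.
        counts = np.stack(
            [np.bincount(elite[:, t], minlength=num_actions)
             for t in range(horizon)]
        )
        probs = (1 - smoothing) * counts / elites + smoothing / num_actions
    return int(probs[0].argmax())
```

In a real planner, `rollout_fn` would unroll the RSSM from the current latent state and query the reward and done heads; only the first action of the resulting plan is executed before replanning.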
A complete baseline training run has been executed. A summary is recorded in results.md, while the frozen baseline artifacts remain gitignored under artifacts/world_model/.
Layout
configs/
docs/
notebooks/
scripts/
src/
tests/
Core code lives under src/minidreamer/, with CLI entrypoints at src/train_world_model.py and src/evaluate.py.
Setup
Use Python 3.11 or 3.12. The project metadata is defined in pyproject.toml.
python3.11 -m venv .venv
source .venv/bin/activate
pip install -e ".[dev]"
Main Commands
Bootstrap replay collection:
./scripts/collect_random.sh
World-model pipeline:
./scripts/train_world_model.sh
By default, the script writes new experiments to artifacts/world_model_experiment/. To choose a different experiment directory without touching the frozen baseline, set MINIDREAMER_OUTPUT_DIR:
MINIDREAMER_OUTPUT_DIR=artifacts/world_model_restricted_actions ./scripts/train_world_model.sh
Resume an interrupted world-model run from a checkpoint:
python3.11 src/train_world_model.py \
  --config configs/fourrooms_world_model.yaml \
  --output-dir artifacts/world_model \
  --replay-dir artifacts/world_model/replay \
  --resume-checkpoint artifacts/world_model/checkpoints/world_model_env_steps_90021.pt
Planner evaluation from a checkpoint:
./scripts/eval_planner.sh /path/to/checkpoint.pt /path/to/replay
PPO baseline:
./scripts/train_ppo.sh
Notes
- The latest completed run summary is in results.md.
- The baseline run in artifacts/world_model/ is intentionally frozen as the reference artifact.
- New world-model experiments should write to separate directories under artifacts/.
- The trainer refuses to overwrite an existing run directory unless you resume with --resume-checkpoint or explicitly pass --allow-overwrite-existing-output.
- Metrics, replay snapshots, and checkpoints are intentionally gitignored.
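The overwrite rule in the notes can be pictured as a small guard that runs before training starts. The sketch below is illustrative only: `check_output_dir` and its parameters are hypothetical names, and it assumes that "an existing run directory" means a directory that already contains at least one file or subdirectory. The trainer's actual check may differ.

```python
from pathlib import Path


def check_output_dir(output_dir, resume_checkpoint=None, allow_overwrite=False):
    """Create output_dir, refusing to reuse a non-empty one by accident.

    Assumption: a directory counts as an existing run if it has any contents.
    """
    path = Path(output_dir)
    has_contents = path.is_dir() and any(path.iterdir())
    if has_contents and resume_checkpoint is None and not allow_overwrite:
        raise FileExistsError(
            f"{path} already contains a run; pass --resume-checkpoint "
            "or --allow-overwrite-existing-output"
        )
    path.mkdir(parents=True, exist_ok=True)
    return path
```

Failing early like this keeps a mistyped output path from silently mixing new checkpoints into the frozen baseline directory.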