MIMIC: Melee Imitation Model for Input Cloning

Behavior-cloned Super Smash Bros. Melee bots trained on human Slippi replays. Eight character-specific ~20M-parameter transformers that take a 180-frame window of game state and output controller inputs (main stick, c-stick, shoulder, buttons) at 60 Hz. Each model plays over Slippi Online Direct Connect through Dolphin + libmelee.

Repo: https://github.com/erickfm/MIMIC
Training data: erickfm/melee-ranked-replays — ranked Slippi replays (master/diamond/platinum tier) per character.
Base architecture: Shaw-relative-position causal transformer (d_model=512, 6 layers, 8 heads, seq_len=180). Bootstrapped from HAL (Eric Gu) and since diverged.
Defining MIMIC changes over HAL: (1) a 7-class button head with a distinct TRIG class for airdodge/wavedash (HAL's 5-class head can't represent airdodge and thus can't wavedash); (2) v2 shard alignment, which fixes a subtle post-frame-gamestate leak in the training targets (see research-notes-2026-04-11c); (3) the digital-L-press fix in decode_and_press (research notes 2026-04-13), without which no 7-class BC bot wavedashes.

Current checkpoints (retrained on 2026-04-20 baseline)

These checkpoints were retrained on the post-schema-drop feature set (13 numeric columns) with the new transforms (tanh_scale for velocities, linear_max for hitlag, log_max for hitstun). See research-notes-2026-04-20.md in the MIMIC repo for methodology and results analysis.

Character	Run	Train games	Val loss	Step
Fox	`fox-20260420-baseline`	31,030	0.7144	32768
Falco	`falco-20260420-baseline`	20,882	0.7487	31392
Marth	`marth-20260420-baseline`	11,759	0.6664	31065
Sheik	`sheik-20260420-baseline`	51,751	0.6566	26160
Captain Falcon	`cptfalcon-20260420-baseline`	17,557	0.7368	watchdog
Luigi	`luigi-20260420-baseline`	2,290	0.7460	watchdog

Peach, Jigglypuff, and Ice Climbers remain on pre-2026-04-20 schemas:

peach-20260420-baseline (val 0.6322) was trained on the 22-col schema before the schema drop — loadable via its pickled config.
puff and ice_climbers missed the 2026-04-20 retrain cycle due to a download-script bug, so their existing HF checkpoints are on the old schema. These two are incompatible with the current 13-col inference code path and will be retrained in a follow-on cycle.

Repo layout

MIMIC/
├── README.md                      # this file
├── fox/
│   ├── model.pt                   # raw PyTorch checkpoint
│   ├── config.json                # ModelConfig (copied from ckpt["config"])
│   ├── metadata.json              # provenance (step, val metrics, notes)
│   ├── mimic_norm.json            # per-feature transforms + params
│   ├── controller_combos.json     # 7-class button combo spec
│   ├── cat_maps.json
│   ├── stick_clusters.json
│   └── norm_stats.json            # per-column mean/std (z-score fallback)
├── falco/       (same layout)
├── marth/       (same layout)
├── sheik/       (same layout)
├── cptfalcon/   (same layout)
├── luigi/       (same layout)
├── puff/        (same layout)
├── ice_climbers/(same layout)
└── peach/       (same layout, pre-drop schema — retrain pending)

Each character directory is self-contained — the JSONs are the exact metadata used during training, copied verbatim from the data dir so any inference script can load them without touching the MIMIC repo.
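As a rough illustration of what "self-contained" means in practice, the sketch below reads every JSON file listed in the layout above from one character directory. The helper name `load_character_metadata` is hypothetical and not part of the MIMIC repo, and the sketch makes no assumption about the keys inside each file. It does not load model.pt, which needs PyTorch.

```python
import json
from pathlib import Path

# File names taken from the repo layout above.
METADATA_FILES = (
    "config.json",
    "metadata.json",
    "mimic_norm.json",
    "controller_combos.json",
    "cat_maps.json",
    "stick_clusters.json",
    "norm_stats.json",
)


def load_character_metadata(char_dir):
    """Hypothetical helper: load all JSON metadata from one character directory.

    Returns a dict keyed by file stem, e.g. result["mimic_norm"].
    """
    char_dir = Path(char_dir)
    loaded = {}
    for name in METADATA_FILES:
        path = char_dir / name
        if not path.is_file():
            raise FileNotFoundError(f"missing metadata file: {path}")
        with path.open(encoding="utf-8") as handle:
            loaded[path.stem] = json.load(handle)
    return loaded
```

A call such as `load_character_metadata("hf_checkpoints/marth")` would then give an inference script everything except the weights.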

Usage

git clone https://github.com/erickfm/MIMIC.git
cd MIMIC
bash setup.sh  # installs Dolphin, deps, ISO

# Download all characters
python3 -c "
from huggingface_hub import snapshot_download
snapshot_download('erickfm/MIMIC', local_dir='./hf_checkpoints')
"

Run a character against a level-9 CPU:

python3 tools/play_vs_cpu.py \
  --checkpoint hf_checkpoints/marth/model.pt \
  --dolphin-path ./emulator/squashfs-root/usr/bin/dolphin-emu \
  --iso-path ./melee.iso \
  --data-dir hf_checkpoints/marth \
  --character MARTH --cpu-character FOX --cpu-level 9 \
  --stage FINAL_DESTINATION

Or play a bot over Slippi Online Direct Connect:

python3 tools/play_netplay.py \
  --checkpoint hf_checkpoints/sheik/model.pt \
  --dolphin-path ./emulator/squashfs-root/usr/bin/dolphin-emu \
  --iso-path ./melee.iso \
  --data-dir hf_checkpoints/sheik \
  --character SHEIK \
  --connect-code YOUR#123

The MIMIC repo also includes a Discord bot frontend (tools/discord_bot.py) that queues direct-connect matches per user. See docs/discord-bot-setup.md.

Architecture

Slippi frame ──► MimicFlatEncoder (Linear 184→512) ──► 512-d per-frame vector
                                                            │
180-frame window ──► + Shaw Relative-Position attention ────┘
                             │
                      6× Pre-Norm Causal Transformer Blocks (512-d, 8 heads, d_ff=2048, GELU, LN)
                             │
                        Autoregressive Output Heads (with detach)
                             │
              ┌──────────────┼───────────────┬────────────┐
          shoulder(3)    c_stick(9)     main_stick(37)  buttons(7)
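
To make the "Shaw Relative-Position attention" step in the diagram more concrete, here is a minimal single-head sketch in pure Python. It follows the general Shaw-style formulation, where a learned embedding for the distance i - j is added on the key side before the softmax. It is an illustration under stated assumptions, not the MIMIC implementation: the function name, the one-embedding-per-distance table, and the absence of distance clipping are all choices made for this example.

```python
import math


def shaw_causal_attention_weights(q, k, rel_k):
    """Causal attention weights with Shaw-style relative key embeddings.

    q, k:   T x d lists of floats (queries and keys for one head).
    rel_k:  T x d list; rel_k[r] is the embedding for distance r = i - j >= 0
            (assumed layout for this sketch).
    Returns a T x T list where row i is a softmax over positions j <= i.
    """
    seq_len, dim = len(q), len(q[0])
    weights = []
    for i in range(seq_len):
        scores = []
        for j in range(i + 1):  # causal mask: frame i only sees frames j <= i
            content = sum(q[i][c] * k[j][c] for c in range(dim))
            position = sum(q[i][c] * rel_k[i - j][c] for c in range(dim))
            scores.append((content + position) / math.sqrt(dim))
        top = max(scores)  # subtract the max for a numerically stable softmax
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        # Masked (future) positions get exactly zero weight.
        weights.append([e / total for e in exps] + [0.0] * (seq_len - 1 - i))
    return weights
```

In the real model this runs over 180 frames with 8 heads; the nested loops here only serve readability.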

7-class button head

Class	Meaning
0	A
1	B
2	Z
3	JUMP (X or Y)
4	TRIG (digital L or R)
5	A_TRIG (shield grab)
6	NONE

HAL's original 5-class head (A / B / Jump / Z / None) has no TRIG class and structurally can't execute airdodge, which means HAL-lineage bots can't wavedash. MIMIC's 7-class encoding plus a fix for decode_and_press (which was silently dropping the digital L press until 2026-04-13) is what enables the wavedashing in the replays.
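
The class table above can be read as a lookup from predicted class to physical button presses. The sketch below is illustrative only: the function names are hypothetical, and the concrete choices (X for JUMP, digital L for TRIG, A plus L for A_TRIG) are assumptions picked from the options the table allows. The actual mapping lives in controller_combos.json and decode_and_press.

```python
# Class order taken from the 7-class table above.
BUTTON_CLASSES = ["A", "B", "Z", "JUMP", "TRIG", "A_TRIG", "NONE"]

# Assumed physical presses per class (illustrative choices, see lead-in).
_PRESSES = {
    "A": {"A"},
    "B": {"B"},
    "Z": {"Z"},
    "JUMP": {"X"},         # table allows X or Y; X chosen for this sketch
    "TRIG": {"L"},         # table allows digital L or R; L chosen here
    "A_TRIG": {"A", "L"},  # shield grab
    "NONE": set(),
}


def argmax_class(logits):
    """Index of the largest logit (greedy decoding)."""
    return max(range(len(logits)), key=logits.__getitem__)


def decode_button_class(class_id):
    """Return the set of physical buttons to hold for one predicted class."""
    return set(_PRESSES[BUTTON_CLASSES[class_id]])


# Example: a frame where the button head favors TRIG (class 4).
example_logits = [0.1, 0.0, -1.2, 0.3, 2.5, 0.2, 1.0]
print(decode_button_class(argmax_class(example_logits)))
```

A 5-class head has no entry that resolves to a trigger press, which is the structural gap the paragraph above describes.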

Input features (per frame, per player)

Numeric (13):

pos_x, pos_y, percent, stock, jumps_left,
speed_air_x_self, speed_ground_x_self,
speed_x_attack, speed_y_attack, speed_y_self,
hitlag_left, hitstun_left,
shield_strength

Flags (5):

on_ground, off_stage, facing, invulnerable, moonwalkwarning

Per-feature normalization is defined in each character's mimic_norm.json. The active transforms are:

transform	formula	used for
`normalize`	`2(x-min)/(max-min) - 1` → [-1, +1]	percent, stock, jumps_left, facing, invulnerable, on_ground
`standardize`	`(x - mean) / std`	pos_x, pos_y
`invert_normalize`	`2(max-x)/(max-min) - 1`	shield_strength (so "shield broken" is +1)
`tanh_scale`	`tanh(x / scale)`	5 velocities (scale=5 for self, scale=10 for attack)
`linear_max`	`x / max`	hitlag_left (max=20)
`log_max`	`log1p(clamp(x,0,max)) / log1p(max)`	hitstun_left (max=120)
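
The formulas in the table translate directly into a few small functions. This is a sketch using only the standard library. The function signatures are hypothetical, and the example parameter values other than the ones stated in the table (scale=5/10, max=20, max=120) are assumptions, since the real parameters come from each character's mimic_norm.json.

```python
import math


def normalize(x, lo, hi):
    """Map [lo, hi] linearly onto [-1, +1]."""
    return 2.0 * (x - lo) / (hi - lo) - 1.0


def invert_normalize(x, lo, hi):
    """Like normalize, but flipped so that x == lo maps to +1."""
    return 2.0 * (hi - x) / (hi - lo) - 1.0


def standardize(x, mean, std):
    """Z-score using per-column statistics."""
    return (x - mean) / std


def tanh_scale(x, scale):
    """Squash an unbounded value into (-1, +1)."""
    return math.tanh(x / scale)


def linear_max(x, max_value):
    """Scale by a fixed maximum."""
    return x / max_value


def log_max(x, max_value):
    """Log-compress after clamping to [0, max_value]; result lies in [0, 1]."""
    clamped = min(max(x, 0.0), max_value)
    return math.log1p(clamped) / math.log1p(max_value)


# Example values; the shield range 0..60 is an assumption for illustration.
print(invert_normalize(0.0, 0.0, 60.0))  # broken shield maps to +1.0
print(linear_max(10, 20))                # hitlag_left, max=20 from the table
print(log_max(200, 120))                 # hitstun_left is clamped at max=120
print(tanh_scale(2.5, 5.0))              # a self velocity with scale=5
```

The bounded transforms keep rare large values such as long hitstun from dominating the input scale.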

On top of these come the categorical embeddings: stage (4d), 2× character (12d), 2× action (32d). The previous-frame controller state is appended as a 56-dim block of concatenated one-hots (37 stick + 9 c-stick + 7 button + 3 shoulder).

Total input per frame: 184 dimensions → projected to 512.

Earlier builds (pre-2026-04-20) used a 22-col numeric schema that included invuln_left and 8 ECB corners. Those columns turned out to be structurally zero for our .slp parse path — libmelee never populates them — so they were dropped from the schema. See research notes 2026-04-20 for the audit. Checkpoints trained pre-drop (peach-20260420-baseline) still load via their own pickled config but use the 202-dim projection path.
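
The 184 and 202 totals above can be reproduced with a short piece of arithmetic. The sketch assumes the numeric and flag features are counted once per player for two players, which is how the "per frame, per player" heading reads and is the grouping under which both totals come out right. The constant and function names are made up for this example.

```python
FLAG_COLS = 5                # on_ground, off_stage, facing, invulnerable, moonwalkwarning
PLAYERS = 2                  # assumption: features are counted for both players
STAGE_EMB = 4
CHARACTER_EMB = 12           # one embedding per player
ACTION_EMB = 32              # one embedding per player
CONTROLLER = 37 + 9 + 7 + 3  # main stick + c-stick + buttons + shoulder


def input_dim(numeric_cols):
    """Per-frame input width for a given numeric schema size."""
    per_player = numeric_cols + FLAG_COLS
    categorical = STAGE_EMB + PLAYERS * CHARACTER_EMB + PLAYERS * ACTION_EMB
    return PLAYERS * per_player + categorical + CONTROLLER


print(input_dim(13))  # 184, current schema
print(input_dim(22))  # 202, pre-drop schema
```

Dropping 9 numeric columns per player removes 18 inputs, which accounts for the 202 to 184 difference.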

Training

Model preset: mimic (20M params)
Optimizer: AdamW, LR 3e-4, weight decay 0.01, no warmup
LR schedule: CosineAnnealingLR to eta_min=1e-6
Gradient clip: 1.0
Dropout: 0.2
Sequence length: 180 frames (~3 seconds)
Batch size: 256 per-GPU × 2 RTX 5090s × grad-accum 1 = eff-batch 512
Mixed precision: BF16 AMP with FP32 upcast for relpos attention (prevents BF16 overflow in the manual Q@Kᵀ + S_rel computation)
Max samples: 16.78M (≈ 32,768 steps at eff-batch 512)
Watchdog: patience=12 evals on val-plateau — some chars finish early
Reaction delay: 0. v2 shards have target[i] = buttons[i+1], so rd=0 matches inference — do NOT use --reaction-delay 1 or --controller-offset with v2 shards.
--self-inputs is required even on v2 shards. Runs without it drop the controller-history input entirely and land at val loss ~2.3.
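
The v2 alignment rule in the list above, target[i] = buttons[i+1], can be sketched as follows. This is not the repo's shard builder: the function names and the plain-list data layout are assumptions for illustration. Only the index relationship and the 180-frame window length come from the text.

```python
def make_v2_pairs(states, buttons):
    """Pair the game state at frame i with the buttons pressed at frame i + 1.

    The final frame has no successor, so it yields no training pair.
    """
    if len(states) != len(buttons):
        raise ValueError("states and buttons must cover the same frames")
    return [(states[i], buttons[i + 1]) for i in range(len(states) - 1)]


def sliding_windows(pairs, seq_len=180):
    """All contiguous windows of seq_len aligned pairs (stride 1)."""
    return [pairs[s:s + seq_len] for s in range(len(pairs) - seq_len + 1)]


# Tiny example with 5 frames and a window of 3.
states = ["s0", "s1", "s2", "s3", "s4"]
buttons = ["b0", "b1", "b2", "b3", "b4"]
pairs = make_v2_pairs(states, buttons)
print(pairs[0])                                # ('s0', 'b1')
print(len(sliding_windows(pairs, seq_len=3)))  # 2
```

Because the one-frame shift is already baked into the pairs, adding a further reaction delay at training time would shift the targets twice, which is why the list above warns against it.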

Typical wall-clock per char on 2×RTX 5090: 10-15 min download/extract + 20 min parallel norm_stats bootstrap + 45-120 min sharding (depending on char; cptfalcon and sheik are the longest) + ~50 min training = 2-4 hours.
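
The batch, step, and wall-clock figures above can be sanity-checked with a few lines of arithmetic. All numbers are taken from this section; the variable names are illustrative only.

```python
PER_GPU_BATCH = 256
NUM_GPUS = 2
GRAD_ACCUM = 1
MAX_STEPS = 32_768

effective_batch = PER_GPU_BATCH * NUM_GPUS * GRAD_ACCUM
max_samples = effective_batch * MAX_STEPS

# (low, high) minutes per pipeline stage, as quoted in the text.
STAGE_MINUTES = {
    "download/extract": (10, 15),
    "norm_stats bootstrap": (20, 20),
    "sharding": (45, 120),
    "training": (50, 50),
}
low_minutes = sum(lo for lo, _ in STAGE_MINUTES.values())
high_minutes = sum(hi for _, hi in STAGE_MINUTES.values())

print(effective_batch)  # 512
print(max_samples)      # 16777216, i.e. about 16.78M
print(f"{low_minutes / 60:.1f} to {high_minutes / 60:.1f} hours")
```

The per-stage sum lands between roughly 2.1 and 3.4 hours, consistent with the quoted 2-4 hour range.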

Known limitations

Character-locked. Each model only plays the character it was trained on. No matchup generalization. Multi-character training with a character embedding is a natural next step but not done.
Small-dataset overfitting on Luigi / Ice Climbers. Luigi has ~2K training games; IC around 5K. Their _bestloss.pt is early-stopped — either by the patience=12 watchdog during this cycle or by inspection in prior cycles. Play quality varies.
Edge guarding and recovery weaknesses. Bots don't consistently go for off-stage edge guards or execute high-skill recovery mixups. The training data has these in it, but BC bots under-sample long-tail strategic decisions.
No Matchmaking / Ranked. The Discord bot only joins explicit Direct Connect lobbies. Do NOT adapt it for Slippi Online Unranked or Ranked — libmelee's README explicitly forbids bots on those ladders, and Slippi has not yet opened a "bot account" opt-in system.

Acknowledgments

Eric Gu for HAL, the reference implementation MIMIC is based on. HAL's architecture, tokenization, and training pipeline are the foundation.
Vlad Firoiu and collaborators for libmelee, the Python interface to Dolphin + Slippi.
Project Slippi for the Slippi Dolphin fork, replay format, and Direct Connect rollback netplay. https://slippi.gg

License

MIT — see the MIMIC repo's LICENSE file.
