rl-workshop-2026 — Final Policy Checkpoints

Final MAPPO policy checkpoints for the workshop paper "Feedback Attribution Determines Representation Geometry in Multi-Agent RL." Logged alongside W&B project tashapais/rl_workshop_2026.

Each .pt is a dict with key "model" (PyTorch state_dict for the ActorCritic defined in the tribal-village repo's experiments/).
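
The checkpoint layout described above can be inspected with a short script. This is a minimal sketch, assuming only what is stated here: each file is a dict whose "model" entry is a PyTorch state_dict. The helper names `load_policy_state_dict` and `summarize_state_dict` are hypothetical and not part of the tribal-village repo.

```python
import torch


def load_policy_state_dict(checkpoint_path):
    """Return the state_dict stored under the "model" key of a checkpoint file."""
    # map_location="cpu" lets the file load on machines without a GPU.
    checkpoint = torch.load(checkpoint_path, map_location="cpu")
    if "model" not in checkpoint:
        raise KeyError(f'no "model" key in {checkpoint_path}')
    return checkpoint["model"]


def summarize_state_dict(state_dict):
    """Map each parameter name to its tensor shape, for a quick sanity check."""
    return {name: tuple(tensor.shape) for name, tensor in state_dict.items()}
```

To run a policy, the returned state_dict would still need to be passed to `load_state_dict` on an ActorCritic instance built from the repo's experiments/ code; the constructor arguments are not given in this card.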

Table 1 — Tribal Village (12 agents, 308 actions), 4M agent-steps

tribal_village/<run>/step_4002816.pt, reward attribution r_i^alpha = (1-alpha) r_i + alpha * mean_j r_j:

Condition	alpha	seeds
Individual	0.0	0,1,2
Mixed	0.8	0,1,2
Shared	1.0	0,1,2
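
The reward attribution rule given above Table 1 can be sketched in a few lines. This is an illustration, not the training code: it assumes the mean over j runs over all agents, including agent i itself, and the function name `attribute_rewards` is hypothetical.

```python
def attribute_rewards(rewards, alpha):
    """Blend each agent's own reward with the team mean.

    Implements r_i^alpha = (1 - alpha) * r_i + alpha * mean_j r_j,
    where the mean is assumed to include agent i.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if not rewards:
        raise ValueError("rewards must be non-empty")
    team_mean = sum(rewards) / len(rewards)
    return [(1.0 - alpha) * r + alpha * team_mean for r in rewards]
```

With alpha = 0.0 every agent keeps its own reward (Individual), with alpha = 1.0 every agent receives the team mean (Shared), and alpha = 0.8 gives the Mixed condition. Under this reading the total reward across agents is the same for every alpha; only its distribution changes.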

Table 2 — SMACv2 10gen_terran (6 terran units), 2M steps

smacv2/<run>/step_2001408.pt:

Condition	seeds
Individual (per-agent reward)	0,1,2
Shared (team-averaged)	0,1,2

Caveats (see repo `runs.md`)

Tribal Village runs fail the behavior gate (no-op/random baseline) at 4M steps under passive shaping; representation geometry from them is direction-consistent (probe declines 0.75→0.50) but not yet behavior-grounded.
SMAC individual agents are weak (~1.7% win) vs shared (~25%); SMAC D_act is mask-contaminated and not paper-quotable without a mask-aware recompute.
