rl-workshop-2026: Final Policy Checkpoints
Final MAPPO policy checkpoints for the workshop paper "Feedback Attribution
Determines Representation Geometry in Multi-Agent RL." Logged alongside W&B
project tashapais/rl_workshop_2026.
Each .pt is a dict with key "model" (PyTorch state_dict for the
ActorCritic defined in the tribal-village repo's experiments/).
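As a minimal sketch of reading one of these files, the snippet below loads a checkpoint on CPU and returns the state_dict stored under "model". The helper names `load_policy_state` and `summarize` are hypothetical and exist only for this illustration; the only thing taken from this card is the checkpoint layout (a dict with key "model").

```python
from pathlib import Path

import torch


def load_policy_state(checkpoint_path):
    """Return the state_dict stored under the "model" key of a checkpoint.

    Hypothetical helper: it assumes only the layout described in this card.
    """
    checkpoint = torch.load(Path(checkpoint_path), map_location="cpu")
    if "model" not in checkpoint:
        raise KeyError(f'no "model" key in {checkpoint_path}')
    return checkpoint["model"]


def summarize(state_dict):
    """Map each parameter name to its shape, for a quick sanity check."""
    return {name: tuple(tensor.shape) for name, tensor in state_dict.items()}
```

To run a policy, the returned state_dict would be passed to `load_state_dict` on an ActorCritic instance built from the tribal-village repo; the constructor arguments are not given in this card, so they are left out here.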
Table 1: Tribal Village (12 agents, 308 actions), 4M agent-steps
tribal_village/<run>/step_4002816.pt, reward attribution
r_i^alpha = (1-alpha) r_i + alpha * mean_j r_j:
| Condition | alpha | seeds |
|---|---|---|
| Individual | 0.0 | 0,1,2 |
| Mixed | 0.8 | 0,1,2 |
| Shared | 1.0 | 0,1,2 |
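The reward attribution formula above can be sketched in a few lines of Python. This is an illustration only, not the training code: `mix_rewards` is a hypothetical name, and it assumes that mean_j runs over all agents in the team, including agent i.

```python
def mix_rewards(rewards, alpha):
    """Apply r_i^alpha = (1 - alpha) * r_i + alpha * mean_j r_j to each agent.

    rewards: per-agent rewards for one step (non-empty list of floats).
    alpha:   0.0 gives Individual, 1.0 gives Shared, 0.8 gives Mixed.
    Assumption: the mean is taken over every agent, including agent i.
    """
    if not rewards:
        raise ValueError("rewards must be non-empty")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    team_mean = sum(rewards) / len(rewards)
    return [(1.0 - alpha) * r + alpha * team_mean for r in rewards]
```

Under this definition the team total is the same for every alpha; only how it is split across agents changes, which is what separates the three conditions in the table.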
Table 2: SMACv2 10gen_terran (6 terran units), 2M steps
smacv2/<run>/step_2001408.pt:
| Condition | seeds |
|---|---|
| Individual (per-agent reward) | 0,1,2 |
| Shared (team-averaged) | 0,1,2 |
Caveats (see repo runs.md)
- Tribal Village runs fail the behavior gate (no-op/random baseline) at 4M steps under passive shaping; representation geometry from them is direction-consistent (probe declines from 0.75 to 0.50) but not yet behavior-grounded.
- SMAC individual agents are weak (1.7% win) vs shared (25%); SMAC D_act is mask-contaminated and not paper-quotable without a mask-aware recompute.