rl-workshop-2026: Final Policy Checkpoints

Final MAPPO policy checkpoints for the workshop paper "Feedback Attribution Determines Representation Geometry in Multi-Agent RL." The training runs are logged in the W&B project tashapais/rl_workshop_2026.

Each .pt file is a dict whose "model" key holds the PyTorch state_dict for the ActorCritic defined under experiments/ in the tribal-village repo.
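A minimal loading sketch, assuming a reasonably recent PyTorch (the weights_only argument needs 1.13 or later). The helper name load_policy_state is hypothetical, and building the ActorCritic itself is left to the tribal-village code, since its constructor arguments are not documented here. If a checkpoint stores extra non-tensor entries that the restricted unpickler rejects, weights_only=False works for files you trust.

```python
import torch


def load_policy_state(path: str) -> dict:
    """Return the state_dict stored under the "model" key of a checkpoint.

    Hypothetical helper: it only unpacks the file; it does not build the model.
    """
    # map_location="cpu" lets GPU-trained checkpoints load on CPU-only machines;
    # weights_only=True restricts unpickling to tensors and plain containers.
    ckpt = torch.load(path, map_location="cpu", weights_only=True)
    if not isinstance(ckpt, dict) or "model" not in ckpt:
        raise KeyError(f"{path}: expected a dict with a 'model' key")
    return ckpt["model"]


# Usage (ActorCritic and its arguments come from the tribal-village repo):
#   state_dict = load_policy_state("tribal_village/<run>/step_4002816.pt")
#   policy = ActorCritic(...)
#   policy.load_state_dict(state_dict)
```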

Table 1: Tribal Village (12 agents, 308 actions), 4M agent-steps

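Checkpoints: tribal_village/<run>/step_4002816.pt. Each agent's reward is attributed as r_i^alpha = (1 - alpha) r_i + alpha * mean_j r_j, so alpha = 0 keeps the purely individual reward and alpha = 1 gives every agent the team mean: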

Condition     alpha   seeds
Individual    0.0     0, 1, 2
Mixed         0.8     0, 1, 2
Shared        1.0     0, 1, 2
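The attribution rule can be sketched in a few lines of Python. This is an illustrative implementation of the formula above, not code taken from the training repo, and the function name attribute_rewards is hypothetical. The three conditions in Table 1 correspond to alpha = 0.0, 0.8 and 1.0.

```python
from statistics import fmean
from typing import Sequence


def attribute_rewards(rewards: Sequence[float], alpha: float) -> list[float]:
    """Blend each agent's reward with the team mean (hypothetical helper).

    Implements r_i^alpha = (1 - alpha) * r_i + alpha * mean_j r_j.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    team_mean = fmean(rewards)  # mean_j r_j over all agents
    return [(1.0 - alpha) * r + alpha * team_mean for r in rewards]


# One agent scores, three do not; under Shared (alpha = 1.0) all get the mean.
print(attribute_rewards([1.0, 0.0, 0.0, 0.0], 1.0))  # → [0.25, 0.25, 0.25, 0.25]
```

Because the blend is affine and every agent receives the same mean term, the team's total reward is unchanged for any alpha; only its split across agents moves.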

Table 2: SMACv2 10gen_terran (6 Terran units), 2M steps

smacv2/<run>/step_2001408.pt:

Condition                        seeds
Individual (per-agent reward)    0, 1, 2
Shared (team-averaged)           0, 1, 2

Caveats (see repo runs.md)

  • Tribal Village runs fail the behavior gate (comparison against no-op/random baselines) at 4M steps under passive shaping. Representation-geometry results from these runs point in the expected direction (probe declines 0.75 → 0.50) but are not yet grounded in behavior.
  • SMAC Individual agents are weak (1.7% win rate vs 25% for Shared). SMAC D_act is contaminated by the action mask and should not be quoted in the paper until it is recomputed in a mask-aware way.