Emergent Compositional Communication for Latent World Properties
Paper: arXiv:2604.03266
Tomasz Kaszyński, 2026
Neural agents with different vision backbones develop shared compositional languages about physical properties through a discrete Gumbel-Softmax bottleneck. Each message position self-organizes to encode a specific physical property (elasticity, friction). The protocol achieves 91.5% accuracy on unseen collision outcomes and 85.6% on real camera footage (Physics 101 dataset).
CompositionalSender — the core trainable module:
```
TemporalEncoder:
    Conv1d(384 → 256, k=3) → ReLU
    Conv1d(256 → 128, k=3) → ReLU
    AdaptiveAvgPool1d(1)
    Linear(128 → 128) → ReLU

Message heads (×2):
    Linear(128 → 8) → Gumbel-Softmax(τ=1.0)

Output: 2 discrete tokens per agent, each ∈ {0, ..., 7}
```
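The discrete bottleneck relies on the straight-through Gumbel-Softmax estimator: the forward pass emits a one-hot sample, while the backward pass uses the soft relaxation, so gradients still reach the logits. A minimal illustration with toy logits (not the trained model):

```python
import torch
import torch.nn.functional as F

logits = torch.randn(1, 8, requires_grad=True)  # one message head, vocab of 8

# hard=True: one-hot in the forward pass, differentiable in the backward pass
sample = F.gumbel_softmax(logits, tau=1.0, hard=True)
print(sample)        # one-hot vector over the 8 symbols
print(sample.sum())  # tensor(1., grad_fn=...)

# Gradients flow through the relaxation back to the logits
sample.max().backward()
print(logits.grad is not None)  # True
```

Lowering `tau` sharpens the relaxation toward a categorical sample; τ = 1.0 matches the value used in the message heads above.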
| File | Description | Size |
|---|---|---|
| phase54b_model.pt | Main result model (DINOv2 features, 2-agent, 2×8 bottleneck) | 3.5 MB |
| phase54c_model.pt | Best multi-seed variant | 3.5 MB |
| phase54c_seed42_model.pt | Seed 42 | 3.3 MB |
| phase54c_seed123_model.pt | Seed 123 | 3.3 MB |
| phase54c_seed456_model.pt | Seed 456 | 3.3 MB |
| phase54c_seed789_model.pt | Seed 789 | 3.3 MB |
| phase54c_seed1337_model.pt | Seed 1337 | 3.3 MB |
| phase87_phys101_spring_features.pt | Pre-extracted DINOv2 features for Physics 101 spring (206 clips) | 3.2 MB |
Each .pt file is a dictionary with keys:
```python
{
    "sender_2x8":    <state_dict>,  # CompositionalSender weights
    "receiver_2x8":  <state_dict>,  # CompositionalReceiver weights
    "sender_1x64":   <state_dict>,  # Alternative 1×64 bottleneck sender
    "receiver_1x64": <state_dict>,  # Alternative 1×64 receiver
}
```
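Loading one sub-module then amounts to indexing this dictionary before calling `load_state_dict`. A hedged sketch of the pattern, using stand-in `nn.Linear` modules and a temporary file rather than the released checkpoints:

```python
import torch
import torch.nn as nn

# Stand-ins for the real sender/receiver modules
sender = nn.Linear(128, 8)
receiver = nn.Linear(8, 128)

# Save both state_dicts under their respective keys, as in the released files
ckpt = {"sender_2x8": sender.state_dict(), "receiver_2x8": receiver.state_dict()}
torch.save(ckpt, "demo_ckpt.pt")

# Restore only the sender into a freshly initialized module
loaded = torch.load("demo_ckpt.pt", map_location="cpu")
fresh = nn.Linear(128, 8)
fresh.load_state_dict(loaded["sender_2x8"])
print(sorted(loaded.keys()))  # ['receiver_2x8', 'sender_2x8']
```

Keying multiple `state_dict`s in one file keeps the paired sender/receiver weights for each bottleneck variant together in a single artifact.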
```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TemporalEncoder(nn.Module):
    def __init__(self, hidden_dim=128, input_dim=384, n_frames=4):
        super().__init__()
        ks = min(3, n_frames)
        self.temporal = nn.Sequential(
            nn.Conv1d(input_dim, 256, kernel_size=ks, padding=ks // 2), nn.ReLU(),
            nn.Conv1d(256, 128, kernel_size=ks, padding=ks // 2), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1))
        self.fc = nn.Sequential(nn.Linear(128, hidden_dim), nn.ReLU())

    def forward(self, x):
        # x: [batch, n_frames, input_dim] → [batch, hidden_dim]
        return self.fc(self.temporal(x.permute(0, 2, 1)).squeeze(-1))


class CompositionalSender(nn.Module):
    def __init__(self, hidden_dim=128, input_dim=384, vocab_size=8, n_heads=2):
        super().__init__()
        self.encoder = TemporalEncoder(hidden_dim, input_dim)
        self.vocab_size = vocab_size
        self.heads = nn.ModuleList(
            [nn.Linear(hidden_dim, vocab_size) for _ in range(n_heads)])

    def forward(self, x, tau=1.0):
        h = self.encoder(x)
        if self.training:
            # Straight-through Gumbel-Softmax: one-hot forward pass, soft
            # backward pass, keeping the discrete bottleneck differentiable.
            onehots = [F.gumbel_softmax(head(h), tau=tau, hard=True)
                       for head in self.heads]
            return torch.stack(onehots, dim=1)  # [batch, n_heads, vocab_size]
        # At inference, take the deterministic argmax token per head
        tokens = [head(h).argmax(dim=-1) for head in self.heads]
        return torch.stack(tokens, dim=-1)  # [batch, n_heads]


# Load the checkpoint and restore the 2×8-bottleneck sender
ckpt = torch.load("phase54c_model.pt", map_location="cpu")
sender = CompositionalSender(hidden_dim=128, input_dim=384, vocab_size=8, n_heads=2)
sender.load_state_dict(ckpt["sender_2x8"])
sender.eval()

# Run on DINOv2 features: [batch, n_frames, 384]
features = torch.randn(1, 4, 384)  # Replace with real DINOv2 features
tokens = sender(features)
print(f"Discrete physics code: {tokens}")  # e.g., tensor([[3, 7]])
```
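With 2 heads over an 8-symbol vocabulary, the protocol can express at most 8² = 64 distinct messages. A small sketch that flattens a token pair into a single message id, a convention for analysis assumed here rather than part of the released code:

```python
import torch

vocab_size = 8
tokens = torch.tensor([[3, 7]])  # [batch, n_heads] output of the sender

# Flatten (t0, t1) into a single id in {0, ..., 63}: id = t0 * 8 + t1
message_id = tokens[:, 0] * vocab_size + tokens[:, 1]
print(message_id)  # tensor([31])
```

Since each position self-organizes to encode one property, tabulating these ids against ground-truth elasticity and friction labels is a quick way to inspect the protocol's compositional structure.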
```bibtex
@article{kaszynski2026emergent,
  title={Emergent Compositional Communication for Latent World Properties},
  author={Kaszy{\'n}ski, Tomasz},
  journal={arXiv preprint arXiv:2604.03266},
  year={2026}
}
```