BEAT: Behavioral Encoder for Action Trajectories
A foundation transformer model that encodes sequences of human behavioral events into dense, reusable embeddings.
What is BEAT?
Every company predicting churn, recommending products, or segmenting users starts by manually engineering features from behavioral data (RFM scores, click counts, session metrics). This feature engineering is where most prediction quality is lost.
BEAT eliminates that step. Feed it raw event sequences (page views, purchases, searches, support tickets) and get a rich 768-dimensional embedding that captures the user's behavioral state.
Key Innovation
Unlike text transformers (BERT, GPT) that encode language, BEAT is designed specifically for action sequences with temporal dynamics:
- Temporal encoding: Learns from time gaps between events (a purchase 1 day after browsing signals something different from a purchase 30 days after)
- Action vocabulary: Encodes event types, not words
- Behavioral context: Understands that the same action means different things in different sequences
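To make the temporal-encoding idea concrete, here is a minimal sketch of a sinusoidal encoding of time gaps. It is not BEAT's actual implementation: the clipping to a 90-day window, the feature size `dim`, and the geometric frequency schedule (borrowed from Transformer positional encodings) are all assumptions made for illustration, and the function name is hypothetical.

```python
import math

def sinusoidal_time_encoding(gap_days, dim=8, max_days=90.0):
    """Map a time gap in days to a fixed sinusoidal feature vector.

    Assumptions (not taken from the BEAT paper): gaps are clipped to
    [0, max_days], and frequencies follow the geometric schedule used
    for positional encodings in the original Transformer.
    """
    if dim % 2 != 0:
        raise ValueError("dim must be even")
    gap = min(max(gap_days, 0.0), max_days)
    features = []
    for i in range(dim // 2):
        # Frequencies decay geometrically from 1 toward 1/10000.
        freq = 1.0 / (10000.0 ** (2 * i / dim))
        features.append(math.sin(gap * freq))
        features.append(math.cos(gap * freq))
    return features

# A 1-day gap and a 30-day gap produce different feature vectors,
# so a model can tell the two situations apart.
one_day = sinusoidal_time_encoding(1.0)
thirty_days = sinusoidal_time_encoding(30.0)
```

In this sketch, gaps beyond the window collapse to the same vector as a 90-day gap, which is one simple way to bound the encoding's range.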
Usage
from transformers import AutoModel
import torch
# Load model
model = AutoModel.from_pretrained("your-org/beat-encoder")
# Encode a behavioral sequence
action_ids = torch.tensor([[1, 2, 3, 5, 1, 6, 2, 5]]) # page_view, product_view, cart, purchase...
property_ids = torch.tensor([[12, 45, 45, 45, 8, 3, 22, 22]]) # category/property context
time_gaps = torch.tensor([[0.0, 0.1, 0.5, 1.2, 3.0, 3.1, 7.0, 7.5]]) # days between events
outputs = model(action_ids, property_ids, time_gaps)
embedding = outputs["embedding"] # [1, 768]: user behavioral state
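The snippet above starts from ready-made ID lists. A rough sketch of how raw, timestamped events might be converted into those three inputs follows. The vocabulary mirrors the inline comment on `action_ids`, but the real ID mapping is not documented here, so `ACTION_VOCAB` and `build_inputs` are hypothetical. The sketch also assumes `time_gaps` holds days elapsed since the first event, which is what the monotonically increasing example values suggest; if the model expects gaps between consecutive events, take successive differences instead.

```python
from datetime import datetime

# Hypothetical action vocabulary; the real ID mapping ships with the model.
ACTION_VOCAB = {"page_view": 1, "product_view": 2, "cart": 3, "purchase": 5}

def build_inputs(events, vocab=ACTION_VOCAB):
    """Turn (timestamp, action, property_id) tuples into three aligned lists."""
    if not events:
        raise ValueError("events must not be empty")
    # Sort chronologically so positions match the order of behavior.
    events = sorted(events, key=lambda e: e[0])
    start = events[0][0]
    # Unknown action names raise KeyError rather than being silently mapped.
    action_ids = [vocab[action] for _, action, _ in events]
    property_ids = [prop for _, _, prop in events]
    # Days elapsed since the first event (assumed convention, see lead-in).
    time_gaps = [(ts - start).total_seconds() / 86400.0 for ts, _, _ in events]
    return action_ids, property_ids, time_gaps

events = [
    (datetime(2024, 1, 1, 0, 0), "page_view", 12),
    (datetime(2024, 1, 1, 12, 0), "product_view", 45),
    (datetime(2024, 1, 2, 0, 0), "cart", 45),
    (datetime(2024, 1, 4, 0, 0), "purchase", 45),
]
action_ids, property_ids, time_gaps = build_inputs(events)
```

Wrapping each returned list as `torch.tensor([values])` gives the batched `[1, sequence_length]` shape used in the usage snippet.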
Pre-training Objectives
- Masked Event Prediction: Randomly mask 15% of events, predict the action type (like MLM in BERT)
- Next Event Prediction: Given a sequence, predict what action comes next
- Contrastive Learning: Different time windows of the same user should produce similar embeddings
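As an illustration of the first objective, the following sketch masks roughly 15% of the events in a sequence and builds the matching prediction targets. It is a simplification under stated assumptions: `MASK_ID` is a hypothetical reserved token ID, the label value `-100` follows the common PyTorch convention for positions ignored by the loss, and BERT's extra random-replacement and keep-unchanged cases are omitted.

```python
import random

MASK_ID = 0          # hypothetical ID reserved for the mask token
IGNORE_LABEL = -100  # common convention for positions excluded from the loss

def mask_events(action_ids, mask_prob=0.15, seed=None):
    """Return (masked_inputs, labels) for masked event prediction."""
    n = len(action_ids)
    if n == 0:
        return [], []
    rng = random.Random(seed)
    # Mask at least one event so every sequence yields a training signal.
    n_mask = max(1, round(n * mask_prob))
    masked_positions = set(rng.sample(range(n), n_mask))
    inputs, labels = [], []
    for i, action in enumerate(action_ids):
        if i in masked_positions:
            inputs.append(MASK_ID)       # hide the event from the model
            labels.append(action)        # the model must recover it
        else:
            inputs.append(action)
            labels.append(IGNORE_LABEL)  # no loss at unmasked positions
    return inputs, labels

masked_inputs, labels = mask_events([1, 2, 3, 5, 1, 6, 2, 5], seed=0)
```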
Downstream Tasks
BEAT embeddings can be used for:
| Task | Method | Expected Improvement |
|---|---|---|
| Churn prediction | Linear probe on embedding | +8-15% AUC vs. manual features |
| User segmentation | Cluster embeddings | More stable, interpretable clusters |
| Next-best-action | Fine-tune prediction head | Captures temporal patterns manual features miss |
| Personalization | Nearest-neighbor in embedding space | Real behavioral similarity, not just demographics |
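The nearest-neighbor row can be sketched with plain cosine similarity. The example below uses toy 3-dimensional vectors in place of real 768-dimensional BEAT embeddings, and the user IDs and function names are hypothetical; a production setup would more likely use a vector index than a linear scan.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat zero vectors as dissimilar to everything
    return dot / (norm_a * norm_b)

def nearest_users(query, embeddings, k=2):
    """Return the k user IDs whose embeddings are most similar to query."""
    scored = [(cosine_similarity(query, vec), uid) for uid, vec in embeddings.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [uid for _, uid in scored[:k]]

# Toy 3-dimensional vectors standing in for 768-dimensional embeddings.
embeddings = {
    "user_a": [1.0, 0.0, 0.0],
    "user_b": [0.9, 0.1, 0.0],
    "user_c": [0.0, 1.0, 0.0],
}
neighbors = nearest_users([1.0, 0.05, 0.0], embeddings, k=2)
```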
Training Data
Pre-trained on the REES46 e-commerce behavioral dataset (20M+ events from a multi-category online store):
- 50,000 users, 18,401 behavioral sequences
- 10,350 training steps across 10 epochs
- Training loss converged from 0.83 to 0.42
- Hardware: 2× NVIDIA T4 GPUs (~27 minutes)
The model can be adapted to other behavioral domains through fine-tuning.
Architecture
| Parameter | Value |
|---|---|
| Hidden size | 768 |
| Layers | 12 |
| Attention heads | 12 |
| Parameters | 86.4M |
| Embedding output | 768-dim |
| Max sequence length | 256 events |
| Temporal encoding | Learned + sinusoidal (90-day window) |
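As a sanity check on the table, the arithmetic below estimates how many parameters the 12 encoder layers would account for. It assumes a standard BERT-style layer (four attention projections with biases, a feed-forward block 4x the hidden size, two layer norms); the feed-forward size is not stated in this card, so `ffn_mult` is an assumption, and the reported 86.4M is a rounded figure.

```python
def encoder_layer_params(hidden, ffn_mult=4):
    """Parameter count of one BERT-style encoder layer (assumed layout)."""
    # Query, key, value, and output projections, each with a bias.
    attention = 4 * (hidden * hidden + hidden)
    # Two linear maps: hidden -> ffn_size -> hidden, each with a bias.
    ffn_size = ffn_mult * hidden
    feed_forward = (hidden * ffn_size + ffn_size) + (ffn_size * hidden + hidden)
    # Two layer norms, each with a scale and a shift vector.
    layer_norms = 2 * (2 * hidden)
    return attention + feed_forward + layer_norms

per_layer = encoder_layer_params(768)  # 7,087,872
encoder_total = 12 * per_layer         # 85,054,464
remainder = 86_400_000 - encoder_total # about 1.35M
```

Under these assumptions, the encoder stack accounts for about 85.1M of the reported 86.4M parameters, leaving roughly 1.3M for the action, property, and temporal embeddings and any output heads.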
Paper
BEAT: A Foundation Model for Human Behavioral Sequences
Published on Zenodo. DOI: 10.5281/zenodo.20774886
Citation
@article{dhanani2026beat,
title = {BEAT: A Foundation Model for Human Behavioral Sequences},
author = {Dhanani, Brijesh},
year = {2026},
doi = {10.5281/zenodo.20774886},
url = {https://doi.org/10.5281/zenodo.20774886},
publisher = {Zenodo}
}
License
Apache 2.0