Voidly Atlas Classifier v3.3
Version: v3.3 | Trained: 2026-05-21T03:01:46.793987+00:00 | License: CC BY 4.0
Country-day censorship classifier with regime-similarity-weighted geographic contagion features. Promoted 2026-05-21.
Intended use
Given a (country, day) feature vector (measurement volume, anomaly rate,
probe agreement, and three contagion neighbour aggregates), return a probability
score P(censorship event); the scores are not isotonic-calibrated (see Honest caveats). Surfaced live at
GET https://api.voidly.ai/v1/classifier/score/{cc}.
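A minimal client sketch for the live endpoint, using only the Python standard library. It assumes that {cc} takes an ISO 3166-1 alpha-2 code (the caveats use codes such as EG and OM), that no authentication header is required, and that the body is JSON. The response fields are not documented in this card, so the sketch returns the decoded body untouched. score_url and fetch_score are hypothetical helper names.

```python
import json
import re
import urllib.request

# Endpoint template copied from the model card. {cc} is ASSUMED to be an
# ISO 3166-1 alpha-2 country code such as "EG".
SCORE_URL = "https://api.voidly.ai/v1/classifier/score/{cc}"


def score_url(cc: str) -> str:
    """Build the scoring URL for one country (hypothetical helper)."""
    code = cc.strip().upper()
    if not re.fullmatch(r"[A-Z]{2}", code):
        raise ValueError(f"expected a two-letter country code, got {cc!r}")
    return SCORE_URL.format(cc=code)


def fetch_score(cc: str, timeout: float = 10.0):
    """GET the live score and return the decoded JSON body unchanged.

    ASSUMPTION: no auth header is needed. The response schema is not
    documented in the card, so no fields are unpacked here.
    """
    request = urllib.request.Request(
        score_url(cc), headers={"Accept": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

For example, fetch_score("EG") returns whatever JSON the endpoint sends for Egypt; validating the code locally avoids a round trip for malformed input.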
Evaluation
| Metric | Value |
|---|---|
| stratified_f1 | 0.7289 |
| stratified_auc | 0.8991 |
| loco_median_f1 | 0.8696 |
| loco_mean_f1 | 0.7109 |
| loco_n_countries | 127 |
| n_features | 16 |
| n_samples | 4237 |
| n_positive | 1116 |
| n_countries | 131 |
Stratified: a single random 80/20 train/test split. LOCO: leave-one-country-out evaluation across 127 countries, which is the more honest measure of cross-country generalization.
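The difference between the two protocols is easiest to see in code. Below is a hypothetical, dependency-free LOCO harness: each country is held out in turn, a model is fit on all remaining countries, and F1 is scored on the held-out rows. The fit callable stands in for the real GradientBoostingClassifier, and skipping countries without positives is an assumption; the card does not say why only 127 of its 131 countries receive a LOCO score.

```python
from statistics import mean, median
from typing import Callable, Dict, List, Sequence, Tuple

# One row per country-day: (country_code, feature_vector, label in {0, 1}).
Row = Tuple[str, Sequence[float], int]
Predictor = Callable[[Sequence[float]], int]


def f1(y_true: List[int], y_pred: List[int]) -> float:
    """Binary F1 = 2TP / (2TP + FP + FN); 0.0 when undefined."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def loco_f1(rows: List[Row], fit: Callable[[List[Row]], Predictor]) -> Dict[str, float]:
    """Leave-one-country-out F1 for each held-out country.

    ASSUMPTION: countries with no positive rows are skipped because F1 on
    them is degenerate; the card does not state its exclusion rule.
    """
    scores: Dict[str, float] = {}
    for held_out in sorted({cc for cc, _, _ in rows}):
        test = [r for r in rows if r[0] == held_out]
        if not any(y for _, _, y in test):
            continue
        # Train on every other country, then score the held-out one.
        predict = fit([r for r in rows if r[0] != held_out])
        y_true = [y for _, _, y in test]
        y_pred = [predict(x) for _, x, _ in test]
        scores[held_out] = f1(y_true, y_pred)
    return scores


def summarize(scores: Dict[str, float]) -> Tuple[float, float]:
    """Return (mean, median) of per-country F1, the two LOCO summaries."""
    values = list(scores.values())
    return mean(values), median(values)
```

Reporting both summaries matters because per-country F1 is skewed: here the LOCO mean (0.7109) sits well below the median (0.8696), so a minority of low-F1 countries pulls the mean down.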
Training data
Total samples: 4,237 country-day rows
Positives: 1,116 (26.3%)
Unique countries: 131
Provenance: OONI + CensoredPlanet + IODA + Voidly probe network (84K evidence rows)
Labels exclude IODA disruption incidents (fix 2026-05-21): those are real network outages, but not all of them are censorship.
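The class balance gives a useful reference point for the F1 scores in the Evaluation section. A degenerate classifier that flags every country-day has precision equal to the positive rate and recall 1, so its F1 is 2·n_pos / (n_pos + n). The arithmetic below uses only the counts stated in this card.

```python
# Counts copied from the Training data section.
N_SAMPLES = 4237
N_POSITIVE = 1116


def positive_rate(n_pos: int, n: int) -> float:
    """Fraction of country-day rows labeled as censorship events."""
    return n_pos / n


def all_positive_f1(n_pos: int, n: int) -> float:
    """F1 of a classifier that flags every row.

    precision = n_pos / n and recall = 1, so
    F1 = 2PR / (P + R) = 2 * n_pos / (n_pos + n).
    """
    return 2 * n_pos / (n_pos + n)


print(f"positive rate: {positive_rate(N_POSITIVE, N_SAMPLES):.1%}")  # 26.3%
print(f"all-positive F1: {all_positive_f1(N_POSITIVE, N_SAMPLES):.4f}")  # 0.4170
```

With 1,116 positives out of 4,237 rows this trivial baseline reaches about 0.417, so that figure, rather than 0, is roughly the floor the stratified F1 of 0.7289 should be read against.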
Features
16 inputs: 13 base + 3 regime-similarity-weighted contagion neighbours.
- Base (13): anomaly_rate, measurement_count, spike_magnitude, day_of_week, month, is_weekend, rate_count_interaction, probe_block_rate, probe_node_count, probe_avg_confidence, probe_agreement, rate_spike_interaction, high_evidence
- Contagion (3): neighbor_block_rate_7d, neighbor_incident_count_7d, neighbor_max_anomaly_7d
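The card does not spell out how the three contagion features are computed. The sketch below is one plausible construction and is hypothetical throughout: each neighbor carries a regime-similarity weight in [0, 1], the window covers the target day plus the six days before it, block rates are averaged with those weights, incident counts are summed with them, and the anomaly feature keeps the largest weighted neighbor anomaly rate. Missing neighbor-days are skipped, which is where sparse neighbor-pair overlap would show up. DayStats and contagion_features are illustrative names.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Dict, Tuple


@dataclass(frozen=True)
class DayStats:
    """Per-country daily inputs (hypothetical schema)."""
    block_rate: float
    incident_count: int
    anomaly_rate: float


def contagion_features(
    neighbors: Dict[str, float],
    stats: Dict[Tuple[str, date], DayStats],
    day: date,
    window: int = 7,
) -> Dict[str, float]:
    """Hypothetical regime-similarity-weighted neighbor aggregates.

    neighbors maps neighbor country code -> similarity weight in [0, 1].
    ASSUMPTION: the window is the target day plus the previous window-1 days.
    """
    days = [day - timedelta(days=k) for k in range(window)]
    weight_total = block_sum = incident_sum = max_anomaly = 0.0
    for cc, weight in neighbors.items():
        for d in days:
            s = stats.get((cc, d))
            if s is None:  # no data for this neighbor-day (sparse overlap)
                continue
            weight_total += weight
            block_sum += weight * s.block_rate
            incident_sum += weight * s.incident_count
            max_anomaly = max(max_anomaly, weight * s.anomaly_rate)
    return {
        "neighbor_block_rate_7d": block_sum / weight_total if weight_total else 0.0,
        "neighbor_incident_count_7d": incident_sum,
        "neighbor_max_anomaly_7d": max_anomaly,
    }
```

A country whose neighbors have little overlapping data gets aggregates near zero under this construction, which is one way the contagion signal could go silent for the regions flagged in the caveats.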
Top features by gain
| Feature | Importance |
|---|---|
| anomaly_rate | 0.2210 |
| month | 0.2022 |
| measurement_count | 0.1696 |
| neighbor_max_anomaly_7d | 0.0887 |
| neighbor_incident_count_7d | 0.0770 |
| neighbor_block_rate_7d | 0.0764 |
| rate_count_interaction | 0.0561 |
| day_of_week | 0.0389 |
| spike_magnitude | 0.0354 |
| rate_spike_interaction | 0.0319 |
| is_weekend | 0.0013 |
| probe_avg_confidence | 0.0009 |
| high_evidence | 0.0003 |
| probe_block_rate | 0.0002 |
| probe_node_count | 0.0001 |
| probe_agreement | 0.0000 |
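For a fitted sklearn GradientBoostingClassifier these values come from the feature_importances_ attribute (impurity-based importances, normalized to sum to 1). Summing the published values by name prefix makes the pattern clearer. The numbers are copied from this section; the prefix grouping is an editorial choice, not one the card defines.

```python
# Gain importances copied from this section of the card.
IMPORTANCES = {
    "anomaly_rate": 0.2210,
    "month": 0.2022,
    "measurement_count": 0.1696,
    "neighbor_max_anomaly_7d": 0.0887,
    "neighbor_incident_count_7d": 0.0770,
    "neighbor_block_rate_7d": 0.0764,
    "rate_count_interaction": 0.0561,
    "day_of_week": 0.0389,
    "spike_magnitude": 0.0354,
    "rate_spike_interaction": 0.0319,
    "is_weekend": 0.0013,
    "probe_avg_confidence": 0.0009,
    "high_evidence": 0.0003,
    "probe_block_rate": 0.0002,
    "probe_node_count": 0.0001,
    "probe_agreement": 0.0000,
}


def group_share(prefix: str) -> float:
    """Total importance of all features whose name starts with prefix."""
    return sum(v for k, v in IMPORTANCES.items() if k.startswith(prefix))


print(f"total:     {sum(IMPORTANCES.values()):.4f}")
print(f"neighbor_: {group_share('neighbor_'):.4f}")
print(f"probe_:    {group_share('probe_'):.4f}")
```

On these numbers the three contagion features together carry about 0.24 of the total importance, while the four probe_* features carry about 0.001, even though probe agreement is named as an input under Intended use.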
Honest caveats
- LOCO mean F1 0.71 is the honest generalization number; the median of 0.87 hides a long tail of low-F1 countries.
- 16 MENA and post-Soviet countries regress by 5–29 pp (OM, UZ, TN, LY, YE, JO, MA, …) due to sparse neighbor-pair overlap.
- The v3.4 regime-cluster fine-tune was a negative result: coefficient analysis showed that the stack head ignored the cluster heads. The root cause is noise-bounded F1 in countries with only 5–15 positive samples; the real fix is targeted labeling, not architecture.
- EG (Egypt) recovered from a v3.2 regression (F1 0.55 → 0.73).
- Predicted probabilities are NOT isotonic-calibrated; instead, the API surfaces per-country decision thresholds.
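Because the scores are uncalibrated, a decision compares each country's probability with that country's own threshold rather than a global cutoff. A minimal sketch, assuming the thresholds file is a flat JSON object mapping country code to threshold; the real layout of classifier_v3.3_per_country_thresholds.json is not documented here, and both the 0.5 fallback and the helper names are hypothetical.

```python
import json
from typing import Dict

# ASSUMED fallback for countries missing from the thresholds file.
DEFAULT_THRESHOLD = 0.5


def load_thresholds(text: str) -> Dict[str, float]:
    """Parse a flat JSON object {country_code: threshold} (assumed layout)."""
    return {str(cc).upper(): float(t) for cc, t in json.loads(text).items()}


def is_censorship_event(
    cc: str,
    probability: float,
    thresholds: Dict[str, float],
    default: float = DEFAULT_THRESHOLD,
) -> bool:
    """Flag a country-day when its score reaches that country's own threshold."""
    return probability >= thresholds.get(cc.upper(), default)
```

With placeholder thresholds such as {"AA": 0.35, "BB": 0.7}, the same score of 0.6 is flagged for AA but not for BB, which is the point of per-country cutoffs when probabilities are not comparable across countries.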
Reproducibility
# Build script (training data → fitted pickle + per-country thresholds)
python3 scripts/build-classifier-v3.3-regime-weighted.py
- Algorithm: GradientBoostingClassifier (sklearn)
- Promoted artifact: /opt/voidly-ai/models/censorship_classifier_v3_promoted.pkl
- Pre-promotion backup: .pkl.bak.v3.1-2026-05-21
- Per-country thresholds: ml-deploy/classifier_v3.3_per_country_thresholds.json
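A sketch for scoring one country-day offline with the promoted artifact. It assumes the pickle holds a fitted sklearn estimator exposing predict_proba whose second column is the positive class. Column order comes from feature_names_in_ when present (sklearn records it only when a model is fit on a DataFrame with string column names); otherwise it falls back to the order listed under Features, which is an assumption. load_model, feature_order and score_row are hypothetical names. Unpickling can execute arbitrary code, so only load artifacts you trust.

```python
import pickle
from pathlib import Path
from typing import Dict, List

# Promoted artifact path from the card.
MODEL_PATH = Path("/opt/voidly-ai/models/censorship_classifier_v3_promoted.pkl")

# Fallback column order: ASSUMED to match the order listed under Features.
FEATURE_ORDER = [
    "anomaly_rate", "measurement_count", "spike_magnitude", "day_of_week",
    "month", "is_weekend", "rate_count_interaction", "probe_block_rate",
    "probe_node_count", "probe_avg_confidence", "probe_agreement",
    "rate_spike_interaction", "high_evidence", "neighbor_block_rate_7d",
    "neighbor_incident_count_7d", "neighbor_max_anomaly_7d",
]


def load_model(path: Path = MODEL_PATH):
    """Unpickle the artifact. Pickle can run arbitrary code: trusted files only."""
    with open(path, "rb") as fh:
        return pickle.load(fh)


def feature_order(model) -> List[str]:
    """Prefer column names recorded at fit time, else the assumed order."""
    names = getattr(model, "feature_names_in_", None)
    return [str(n) for n in names] if names is not None else list(FEATURE_ORDER)


def score_row(model, features: Dict[str, float]) -> float:
    """Return P(censorship event) for one country-day feature dict."""
    order = feature_order(model)
    missing = [name for name in order if name not in features]
    if missing:
        raise KeyError(f"missing features: {missing}")
    row = [[float(features[name]) for name in order]]
    # ASSUMPTION: classes_ == [0, 1], so column 1 is the positive class.
    return float(model.predict_proba(row)[0][1])
```

The returned probability is meant to be compared with the country's entry in the per-country thresholds file rather than used as a calibrated likelihood.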
Citation
@misc{voidly_voidly_classifier_v3.3,
title = {Voidly Atlas: voidly-classifier-v3.3 (v3.3)},
author = {Voidly},
year = {2026},
url = {https://huggingface.co/emperor-mew/voidly-classifier-v3.3},
note = {Open censorship-research ML stack. CC BY 4.0.}
}