Toponym Context Classifier
DeBERTa-v3-large, fine-tuned to classify the discourse context of Ukrainian toponym mentions.
Part of #KyivNotKiev.
Performance
| Model | Macro F1 | Accuracy |
|---|---|---|
| DeBERTa-v3-large (this) | 0.857 ± 0.013 | 0.901 |
| XLM-RoBERTa-large | 0.846 ± 0.011 | 0.892 |
| mDeBERTa-v3-base | 0.807 ± 0.007 | 0.864 |
11 context classes: war_conflict, academic_science, history, politics, sports, culture_arts, food_cuisine, travel_tourism, religion, business_economy, general_news.
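To illustrate why macro F1 and accuracy can diverge on an imbalanced label set like this one, here is a minimal pure-Python sketch of both metrics. The function names and the toy predictions are hypothetical and are not taken from the benchmark; only the 11 label names come from this card. Classes that never appear in the gold labels or the predictions score 0 here, which is one possible convention.

```python
from collections import Counter

# The 11 context classes listed in this card.
LABELS = [
    "war_conflict", "academic_science", "history", "politics", "sports",
    "culture_arts", "food_cuisine", "travel_tourism", "religion",
    "business_economy", "general_news",
]


def macro_f1(y_true, y_pred, labels=LABELS):
    """Unweighted mean of per-class F1 over `labels`."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # gold class gains a false negative
    scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


def accuracy(y_true, y_pred):
    """Fraction of exact matches."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy two-class example (made up): one rare-class error barely moves
# accuracy but pulls macro F1 down noticeably.
toy_labels = ["war_conflict", "sports"]
y_true = ["war_conflict"] * 8 + ["sports"] * 2
y_pred = ["war_conflict"] * 8 + ["war_conflict", "sports"]

toy_accuracy = accuracy(y_true, y_pred)
toy_macro_f1 = macro_f1(y_true, y_pred, labels=toy_labels)
```

In this toy case accuracy is 0.9 while macro F1 is about 0.80, because the minority class contributes an F1 of 2/3 with equal weight.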
Training
- Corpus: 36,791 texts, 59 toponym pairs, 5 sources
- Annotation: Claude Haiku 4.5 (validated: kappa=0.56-0.69 vs human consensus 86.2%)
- Loss: class-weighted cross-entropy
- Config: LR=1e-5, epochs=3, batch=16, fp16, seed=456
- Benchmark: 21 runs (12 hyperparameter + 9 architecture comparison)
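The class-weighted cross-entropy named in the list above can be sketched in pure Python as follows. This is an illustration under assumptions, not the training code: the card does not say how the class weights were derived, so the inverse-frequency scheme in `inverse_frequency_weights` is a common heuristic chosen for the example, and both function names are hypothetical. The loss is normalized by the sum of the target weights, which is how PyTorch's `torch.nn.CrossEntropyLoss` behaves with `weight` set and the default mean reduction.

```python
import math


def inverse_frequency_weights(class_counts):
    """Assumed scheme: w_c = N / (K * n_c), so rarer classes get larger weights."""
    total = sum(class_counts)
    num_classes = len(class_counts)
    return [total / (num_classes * n) for n in class_counts]


def weighted_cross_entropy(logits, targets, weights):
    """Weighted mean of -log softmax(logits)[target] over a batch."""
    weighted_sum = 0.0
    weight_total = 0.0
    for row, target in zip(logits, targets):
        # Numerically stable log-sum-exp for the softmax normalizer.
        row_max = max(row)
        log_z = row_max + math.log(sum(math.exp(x - row_max) for x in row))
        nll = log_z - row[target]
        weighted_sum += weights[target] * nll
        weight_total += weights[target]
    return weighted_sum / weight_total


# Toy two-class setup with made-up counts: 900 vs 100 examples.
example_weights = inverse_frequency_weights([900, 100])

# Uniform logits give a loss of log(2) regardless of the weights.
uniform_loss = weighted_cross_entropy([[0.0, 0.0], [0.0, 0.0]], [0, 1], example_weights)

# The same confident mistake on each class, one example per batch.
logits = [[2.0, 0.0]]
loss_common_target = weighted_cross_entropy(logits, [0], example_weights)
loss_rare_target = weighted_cross_entropy(logits, [1], example_weights)
```

With per-example batches the weight cancels in the normalization, so the effect of weighting shows up only in mixed batches, where errors on rare classes dominate the averaged loss and its gradient.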
License
CC-BY-4.0