Toponym Context Classifier

A DeBERTa-v3-large model fine-tuned to classify the discourse context of Ukrainian toponym mentions.

Performance

Model	Macro F1	Accuracy
DeBERTa-v3-large (this)	0.857 ± 0.013	0.901
XLM-RoBERTa-large	0.846 ± 0.011	0.892
mDeBERTa-v3-base	0.807 ± 0.007	0.864
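
For readers comparing the two metric columns: macro F1 is the unweighted mean of per-class F1 scores, so a rare class counts as much as a frequent one, while accuracy is dominated by the frequent classes. The sketch below illustrates the difference in plain Python. The labels are made-up toy data, not outputs of this model, and the function names are hypothetical.

```python
def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of per-class F1 scores."""
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the reference label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy data: a classifier that always predicts the majority class.
y_true = ["sports", "sports", "sports", "sports", "history"]
y_pred = ["sports", "sports", "sports", "sports", "sports"]

toy_accuracy = accuracy(y_true, y_pred)
toy_macro_f1 = macro_f1(y_true, y_pred, ["sports", "history"])
```

On this toy input accuracy looks high while macro F1 is pulled down by the missed minority class, which is why both numbers are worth reading together.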

11 context classes: war_conflict, academic_science, history, politics, sports, culture_arts, food_cuisine, travel_tourism, religion, business_economy, general_news.
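
As an illustration of how an 11-way output could be turned into a class name, here is a minimal sketch in plain Python. The index order of `LABELS` simply follows the list above and is an assumption; the authoritative mapping would be the `id2label` entry in the checkpoint's configuration. The logits are hypothetical values, not real model output.

```python
import math

# Class names as listed in this card. The index order is an ASSUMPTION;
# check the checkpoint's id2label mapping before relying on it.
LABELS = [
    "war_conflict", "academic_science", "history", "politics", "sports",
    "culture_arts", "food_cuisine", "travel_tourism", "religion",
    "business_economy", "general_news",
]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode(logits, labels=LABELS):
    """Return (label, probability) for the highest-scoring class."""
    if len(logits) != len(labels):
        raise ValueError("expected one logit per class")
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]


# Hypothetical logits for one toponym mention.
example_logits = [0.1, -1.2, 2.3, 0.0, -0.5, 0.4, -2.0, 0.9, -0.3, 0.2, 1.1]
label, confidence = decode(example_logits)
```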

Training

Corpus: 36,791 texts, 59 toponym pairs, 5 sources
Annotation: Claude Haiku 4.5 (validated: kappa=0.56-0.69 vs human consensus 86.2%)
Loss: class-weighted cross-entropy
Config: LR=1e-5, epochs=3, batch=16, fp16, seed=456
Benchmark: 21 runs (12 hyperparameter + 9 architecture comparison)
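
The card names class-weighted cross-entropy as the loss but does not say how the weights were derived. The sketch below assumes inverse-frequency ("balanced") weights, n_samples / (n_classes * count_c), and a weighted-mean reduction, which is how `torch.nn.CrossEntropyLoss` behaves when given a `weight` tensor. It is written in plain Python for readability; the label counts and logits are made-up toy values, and the function names are hypothetical.

```python
import math
from collections import Counter


def balanced_class_weights(labels, num_classes):
    """Inverse-frequency weights (ASSUMED scheme, not confirmed by the card)."""
    counts = Counter(labels)
    n = len(labels)
    return [
        n / (num_classes * counts[c]) if counts[c] else 0.0
        for c in range(num_classes)
    ]


def weighted_cross_entropy(logits_batch, targets, weights):
    """Weighted mean of per-example negative log-likelihoods."""
    weighted_sum = 0.0
    weight_total = 0.0
    for logits, y in zip(logits_batch, targets):
        m = max(logits)
        # log-sum-exp for a numerically stable log partition function
        log_z = m + math.log(sum(math.exp(x - m) for x in logits))
        nll = log_z - logits[y]
        weighted_sum += weights[y] * nll
        weight_total += weights[y]
    return weighted_sum / weight_total


# Toy training labels: class 0 is frequent, class 2 is rare.
train_labels = [0] * 6 + [1] * 3 + [2] * 1
weights = balanced_class_weights(train_labels, num_classes=3)

# Two toy examples: one frequent-class target, one rare-class target.
logits_batch = [[2.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
targets = [0, 2]
loss = weighted_cross_entropy(logits_batch, targets, weights)
```

With these weights an error on the rare class contributes more to the loss than an error on the frequent class, which is the usual motivation for weighting when class sizes are uneven.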

License

CC-BY-4.0

Model size: 0.4B params (Safetensors, tensor type F32)
