How to use daelba/biography2wikidata with Transformers:

```python
# Use a pipeline as a high-level helper.
# Warning: the "translation" pipeline type is no longer supported in transformers v5.
# Load the model directly (see below) or downgrade to v4.x with:
#   pip install "transformers<5.0.0"
from transformers import pipeline

pipe = pipeline("translation", model="daelba/biography2wikidata")
```

```python
# Load the model directly
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("daelba/biography2wikidata")
model = AutoModelForSeq2SeqLM.from_pretrained("daelba/biography2wikidata")
```
A model for annotating entries in biographical dictionaries using Wikidata entities. Based on Google's mT5.
Example input text:
Anschiringer, Anton, Publizist, * 1812 Wien, † 17. 12. 1873 Reichenberg (Liberec). Erzieher im Hause des Großindustriellen...
Example output text:
{{WD|label|Anschiringer, Anton}}, {{WD|P106|Q6051619|Publizist}}, * {{WD|P569|1812}} {{WD|P19|Q1741|Wien}}, † {{WD|P570|1873-12-17|17. 12. 1873}} {{WD|P20|Q146351|Reichenberg (Liberec)}}. Erzieher im Hause des Großindustriellen...
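The annotated output can be post-processed mechanically. The sketch below assumes, based only on this one example, that every annotation has the shape `{{WD|<property>|<value>}}` or `{{WD|<property>|<value>|<surface text>}}`, where `<property>` is `label` or a Wikidata property ID and `<value>` is a Q-item ID or a normalized date. The names `Annotation`, `parse_annotations` and `strip_annotations` are hypothetical helpers, not part of the model or of any library. Replacing each template with its surface text should give back the original entry, which is a cheap sanity check on generated text.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Assumed template shape (inferred from the example, not a documented spec):
#   {{WD|<property>|<value>}} or {{WD|<property>|<value>|<surface text>}}
WD_TEMPLATE = re.compile(r"\{\{WD\|([^{}]*)\}\}")


@dataclass
class Annotation:
    prop: str                  # "label" or a Wikidata property ID such as "P106"
    value: str                 # Q-item ID, normalized date, or label text
    surface: Optional[str]     # wording from the entry, when given separately
    extra: tuple[str, ...] = ()  # further fields, if any (format unknown, kept as-is)


def parse_annotations(text: str) -> list[Annotation]:
    """Return every {{WD|...}} annotation found in a model output string."""
    annotations = []
    for match in WD_TEMPLATE.finditer(text):
        fields = match.group(1).split("|")
        if len(fields) < 2:
            continue  # malformed template without a value: skip it
        surface = fields[2] if len(fields) > 2 else None
        annotations.append(Annotation(fields[0], fields[1], surface, tuple(fields[3:])))
    return annotations


def strip_annotations(text: str) -> str:
    """Replace each template with its surface text, or with its value if none is given."""
    def surface_of(match: re.Match) -> str:
        fields = match.group(1).split("|")
        return fields[2] if len(fields) > 2 else fields[-1]

    return WD_TEMPLATE.sub(surface_of, text)
```

For the example output above, `parse_annotations` should yield six annotations, for instance `Annotation(prop='P106', value='Q6051619', surface='Publizist', extra=())`, and `strip_annotations` should reproduce the example input text exactly.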
Evaluation
After training on BLGBL, vol. I, the model reached a loss value of 0.3878.
A more informative measure is how many valid statements the model extracts from the input. This was evaluated on 100 unseen entries from BLGBL, vol. II.
| | Basic statements | Qualifier statements | Total |
|---|---|---|---|
| Ground truth | 1,209 | 572 | 1,781 |
| Valid statements by the model | 714 | 120 | 834 |
| Accuracy | 0.5906 | 0.2098 | 0.4683 |
| Loss | 0.4094 | 0.7902 | 0.5317 |
In other words, the model correctly retrieves about 60% of the basic statements and 20% of the qualifier statements, for a total of about 47% of all basic and qualifier statements.
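The derived rows of the table can be recomputed from the counts. The figures are consistent with accuracy being the number of valid statements divided by the ground-truth count, and with the row labelled "Loss" being 1 - accuracy (so it is not the training loss mentioned above). A minimal check using only the numbers from the table; the variable names are illustrative:

```python
# Counts copied from the evaluation table (100 entries from BLGBL, vol. II).
counts = {
    "Basic statements": (1209, 714),      # (ground truth, valid by model)
    "Qualifier statements": (572, 120),
}
# The "Total" column is the element-wise sum of the two categories.
counts["Total"] = tuple(sum(pair) for pair in zip(*counts.values()))

results = {}
for name, (truth, valid) in counts.items():
    accuracy = valid / truth              # share of ground-truth statements recovered
    results[name] = (accuracy, 1 - accuracy)  # (accuracy, the table's "Loss" row)
    print(f"{name:22} accuracy={accuracy:.4f} loss={1 - accuracy:.4f}")
```

It prints the same accuracy and loss values as the table, to four decimal places.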
Acknowledgement
The model is the result of the project "Wikimedia versus traditional biographical encyclopedias. Overlaps, gaps, quality and future possibilities", funded by the Wikimedia Research Fund.
Computational resources were provided by the e-INFRA CZ project (ID:90254), supported by the Ministry of Education, Youth and Sports of the Czech Republic.
Model tree
Base model: google/mt5-small