| --- |
| license: cc-by-4.0 |
| language: |
| - hu |
| metrics: |
| - accuracy |
| model-index: |
- name: HunEmBERT3
| results: |
| - task: |
| type: text-classification |
| metrics: |
| - type: f1 |
| value: 0.91 |
| widget: |
- text: "A vegetációs időben az országban rendszeresen jelentkező jégesők ellen is van mód védekezni lokálisan, ki-ki a saját nagy értékű ültetvényén."
| example_title: Positive |
- text: "Magyarország több évtizede küzd demográfiai válsággal, és egyre több gyermekre vágyó pár meddőségi problémákkal néz szembe."
  example_title: Negative
- text: "Tisztelt fideszes, KDNP-s Képviselőtársaim!"
| example_title: Neutral |
| extra_gated_fields: |
| Country: country |
| Institution: text |
| Institution Email: text |
| Full Name: text |
Please specify the academic project/use case you want to use the models for: text
| extra_gated_prompt: Our models are intended for academic projects and academic research |
| only. If you are not affiliated with an academic institution, please reach out to |
| us at huggingface [at] poltextlab [dot] com for further inquiry. If we cannot clearly |
| determine your academic affiliation and use case based on your form data, your request |
| may be rejected. Please allow us a few business days to manually review subscriptions. |
| --- |
| |
| ## Model description |
|
|
A cased, fine-tuned BERT model for Hungarian, trained on manually annotated parliamentary pre-agenda speeches scraped from `parlament.hu`.
|
|
| ## Intended uses & limitations |
|
|
The model can be used like any other (cased) BERT model. It has been tested on recognizing positive, negative, and neutral sentences in parliamentary pre-agenda speeches, where the label ids map to sentiments as follows (see the sketch after this list):
* `Label_0`: Neutral
* `Label_1`: Positive
* `Label_2`: Negative
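
For a quick end-to-end check, the label mapping above can be exercised through the `transformers` pipeline API. A minimal sketch, assuming access to the gated repository; the exact label strings in the output depend on the model's `id2label` config:

```py
from transformers import pipeline

# Load the fine-tuned model behind a text-classification pipeline.
clf = pipeline("text-classification", model="poltextlab/HunEmBERT3")

# One of the widget sentences above; expected class: Neutral (Label_0).
print(clf("Tisztelt fideszes, KDNP-s Képviselőtársaim!"))
```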
| |
| ## Training |
| |
The model is a fine-tuned version of the original huBERT model (`SZTAKI-HLT/hubert-base-cc`), trained on the HunEmPoli corpus, whose composition is summarized below (an illustrative fine-tuning sketch follows the table).
| |
| Category | Count | Ratio  | Sentiment | Count | Ratio  |
| -------- | ----- | ------ | --------- | ----- | ------ |
| Neutral  | 351   | 1.85%  | Neutral   | 351   | 1.85%  |
| Fear     | 162   | 0.85%  | Negative  | 11180 | 58.84% |
| Sadness  | 4258  | 22.41% |           |       |        |
| Anger    | 643   | 3.38%  |           |       |        |
| Disgust  | 6117  | 32.19% |           |       |        |
| Success  | 6602  | 34.74% | Positive  | 7471  | 39.32% |
| Joy      | 441   | 2.32%  |           |       |        |
| Trust    | 428   | 2.25%  |           |       |        |
| Sum      | 19002 |        |           | 19002 |        |
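
For orientation, fine-tuning huBERT for this task follows the standard `transformers` sequence-classification recipe. A minimal sketch under assumed hyperparameters; the paper's actual settings may differ, and the one-sentence dataset below is a toy stand-in for HunEmPoli:

```py
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

# Start from the original cased huBERT checkpoint with a 3-class head.
tokenizer = AutoTokenizer.from_pretrained("SZTAKI-HLT/hubert-base-cc")
model = AutoModelForSequenceClassification.from_pretrained(
    "SZTAKI-HLT/hubert-base-cc", num_labels=3)

# Toy stand-in for the HunEmPoli corpus (text plus a 0/1/2 sentiment label).
train_data = Dataset.from_dict({
    "text": ["Tisztelt Képviselőtársaim!"],
    "label": [0],
}).map(lambda batch: tokenizer(batch["text"], truncation=True), batched=True)

# Assumed hyperparameters, not the paper's reported settings.
args = TrainingArguments(output_dir="hunembert3", num_train_epochs=3,
                         per_device_train_batch_size=16, learning_rate=2e-5)
Trainer(model=model, args=args, train_dataset=train_data,
        tokenizer=tokenizer).train()
```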
| |
| ## Eval results |
| |
| | Class | Precision | Recall | F-Score | |
| |-----|------------|------------|------| |
| |Neutral|0.83|0.71|0.76| |
|Positive|0.87|0.91|0.90|
| |Negative|0.94|0.91|0.93| |
| |Macro AVG|0.88|0.85|0.86| |
|Weighted AVG|0.91|0.91|0.91|
| |
| |
| ## Usage |
| |
| ```py |
| from transformers import AutoTokenizer, AutoModelForSequenceClassification |
| |
| tokenizer = AutoTokenizer.from_pretrained("poltextlab/HunEmBERT3") |
| model = AutoModelForSequenceClassification.from_pretrained("poltextlab/HunEmBERT3") |
| ``` |
| |
| ### BibTeX entry and citation info |
| |
| If you use the model, please cite the following paper: |
| |
| ```bibtex |
| @ARTICLE{10149341, |
author={{\"U}veges, Istv{\'a}n and Ring, Orsolya},
| journal={IEEE Access}, |
| title={HunEmBERT: a fine-tuned BERT-model for classifying sentiment and emotion in political communication}, |
| year={2023}, |
| volume={11}, |
| number={}, |
| pages={60267-60278}, |
| doi={10.1109/ACCESS.2023.3285536} |
| } |
| ``` |