| --- |
| license: cc-by-4.0 |
| language: |
| - hu |
| metrics: |
| - accuracy |
| model-index: |
- name: HunEmBERT3
| results: |
| - task: |
| type: text-classification |
| metrics: |
| - type: f1 |
| value: 0.91 |
| widget: |
- text: "A vegetációs időben az országban rendszeresen jelentkező jégesők ellen is van mód védekezni lokálisan, ki-ki a saját nagy értékű ültetvényén."
| example_title: Positive |
- text: "Magyarország több évtizede küzd demográfiai válsággal, és egyre több gyermekre vágyó pár meddőségi problémákkal néz szembe."
  example_title: Negative
- text: "Tisztelt fideszes, KDNP-s Képviselőtársaim!"
| example_title: Neutral |
| extra_gated_fields: |
| Country: country |
| Institution: text |
| Institution Email: text |
| Full Name: text |
Please specify the academic project/use case you want to use the models for: text
| extra_gated_prompt: Our models are intended for academic projects and academic research |
| only. If you are not affiliated with an academic institution, please reach out to |
| us at huggingface [at] poltextlab [dot] com for further inquiry. If we cannot clearly |
| determine your academic affiliation and use case based on your form data, your request |
| may be rejected. Please allow us a few business days to manually review subscriptions. |
| --- |
| |
| ## Model description |
|
|
A cased, fine-tuned BERT model for Hungarian, trained on manually annotated parliamentary pre-agenda speeches scraped from `parlament.hu`.
|
|
| ## Intended uses & limitations |
|
|
The model can be used like any other (cased) BERT model. It has been tested on recognizing positive, negative, and neutral sentences in parliamentary pre-agenda speeches, where the label ids map to sentiments as follows (see the sketch after this list):
* `Label_0`: Neutral
* `Label_1`: Positive
* `Label_2`: Negative
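
For a quick end-to-end check, the label mapping above can be exercised through the `transformers` pipeline API. A minimal sketch, assuming access to the gated repository; the exact label strings in the output depend on the model's `id2label` config:

```py
from transformers import pipeline

# Load the fine-tuned model behind a text-classification pipeline.
clf = pipeline("text-classification", model="poltextlab/HunEmBERT3")

# One of the widget sentences above; expected class: Neutral (Label_0).
print(clf("Tisztelt fideszes, KDNP-s Képviselőtársaim!"))
```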
| |
| ## Training |
| |
The model is a fine-tuned version of the original huBERT model (`SZTAKI-HLT/hubert-base-cc`), trained on the HunEmPoli corpus, whose composition is summarized below (an illustrative fine-tuning sketch follows the table).
| |
| Category | Count | Ratio  | Sentiment | Count | Ratio  |
| -------- | ----- | ------ | --------- | ----- | ------ |
| Neutral  | 351   | 1.85%  | Neutral   | 351   | 1.85%  |
| Fear     | 162   | 0.85%  | Negative  | 11180 | 58.84% |
| Sadness  | 4258  | 22.41% |           |       |        |
| Anger    | 643   | 3.38%  |           |       |        |
| Disgust  | 6117  | 32.19% |           |       |        |
| Success  | 6602  | 34.74% | Positive  | 7471  | 39.32% |
| Joy      | 441   | 2.32%  |           |       |        |
| Trust    | 428   | 2.25%  |           |       |        |
| Sum      | 19002 |        |           | 19002 |        |
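
For orientation, fine-tuning huBERT for this task follows the standard `transformers` sequence-classification recipe. A minimal sketch under assumed hyperparameters; the paper's actual settings may differ, and the one-sentence dataset below is a toy stand-in for HunEmPoli:

```py
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

# Start from the original cased huBERT checkpoint with a 3-class head.
tokenizer = AutoTokenizer.from_pretrained("SZTAKI-HLT/hubert-base-cc")
model = AutoModelForSequenceClassification.from_pretrained(
    "SZTAKI-HLT/hubert-base-cc", num_labels=3)

# Toy stand-in for the HunEmPoli corpus (text plus a 0/1/2 sentiment label).
train_data = Dataset.from_dict({
    "text": ["Tisztelt Képviselőtársaim!"],
    "label": [0],
}).map(lambda batch: tokenizer(batch["text"], truncation=True), batched=True)

# Assumed hyperparameters, not the paper's reported settings.
args = TrainingArguments(output_dir="hunembert3", num_train_epochs=3,
                         per_device_train_batch_size=16, learning_rate=2e-5)
Trainer(model=model, args=args, train_dataset=train_data,
        tokenizer=tokenizer).train()
```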
| |
| ## Eval results |
| |
| | Class | Precision | Recall | F-Score | |
| |-----|------------|------------|------| |
| |Neutral|0.83|0.71|0.76| |
|Positive|0.87|0.91|0.90|
| |Negative|0.94|0.91|0.93| |
| |Macro AVG|0.88|0.85|0.86| |
|Weighted AVG|0.91|0.91|0.91|
| |
| |
| ## Usage |
| |
| ```py |
| from transformers import AutoTokenizer, AutoModelForSequenceClassification |
| |
| tokenizer = AutoTokenizer.from_pretrained("poltextlab/HunEmBERT3") |
| model = AutoModelForSequenceClassification.from_pretrained("poltextlab/HunEmBERT3") |
| ``` |
| |
| ### BibTeX entry and citation info |
| |
| If you use the model, please cite the following paper: |
| |
| ```bibtex |
| @ARTICLE{10149341, |
author={{\"U}veges, Istv{\'a}n and Ring, Orsolya},
| journal={IEEE Access}, |
| title={HunEmBERT: a fine-tuned BERT-model for classifying sentiment and emotion in political communication}, |
| year={2023}, |
| volume={11}, |
| number={}, |
| pages={60267-60278}, |
| doi={10.1109/ACCESS.2023.3285536} |
| } |
| ``` |