---
license: mit
tags:
- generated_from_trainer
base_model: prajjwal1/bert-small
model-index:
- name: bert-small-aze
  results: []
---
# bert-small-aze
This model uses the [prajjwal1/bert-small](https://huggingface.co/prajjwal1/bert-small) architecture and was trained from scratch (not fine-tuned) on the [allmalab/DOLLMA](https://huggingface.co/datasets/allmalab/DOLLMA) dataset.
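### How to use
The model can be loaded with the Transformers library, either through the `fill-mask` pipeline or directly via the Auto classes. The Azerbaijani sentence below is only an illustrative example, not taken from this card; the mask token is read from the tokenizer rather than hard-coded.
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("fill-mask", model="allmalab/bert-small-aze")

# Illustrative example: predict the masked word in an Azerbaijani sentence.
text = f"Bakı Azərbaycanın {pipe.tokenizer.mask_token} şəhəridir."
print(pipe(text))

# Load model directly
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("allmalab/bert-small-aze")
model = AutoModelForMaskedLM.from_pretrained("allmalab/bert-small-aze")
```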
### Citation
If you use this model or the dataset, please cite the following paper:
```bibtex
@inproceedings{isbarov-etal-2024-open,
    title = "Open foundation models for {A}zerbaijani language",
    author = "Isbarov, Jafar and
      Huseynova, Kavsar and
      Mammadov, Elvin and
      Hajili, Mammad and
      Ataman, Duygu",
    editor = {Ataman, Duygu and
      Derin, Mehmet Oguz and
      Ivanova, Sardana and
      K{\"o}ksal, Abdullatif and
      S{\"a}lev{\"a}, Jonne and
      Zeyrek, Deniz},
    booktitle = "Proceedings of the First Workshop on Natural Language Processing for Turkic Languages (SIGTURK 2024)",
    month = aug,
    year = "2024",
    address = "Bangkok, Thailand and Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2024.sigturk-1.2",
    pages = "18--28",
    abstract = "The emergence of multilingual large language models has enabled the development of language understanding and generation systems in Azerbaijani. However, most of the production-grade systems rely on cloud solutions, such as GPT-4. While there have been several attempts to develop open foundation models for Azerbaijani, these works have not found their way into common use due to a lack of systemic benchmarking. This paper encompasses several lines of work that promote open-source foundation models for Azerbaijani. We introduce (1) a large text corpus for Azerbaijani, (2) a family of encoder-only language models trained on this dataset, (3) labeled datasets for evaluating these models, and (4) extensive evaluation that covers all major open-source models with Azerbaijani support.",
}
```
The paper is also available on arXiv: https://arxiv.org/abs/2407.02337
### Training hyperparameters
The following hyperparameters were used during training (a configuration sketch follows the list):
- learning_rate: 5e-05
- train_batch_size: 32
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 10000
- num_epochs: 10
- mixed_precision_training: Native AMP
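Interpreted as a `transformers.TrainingArguments` object, these settings correspond roughly to the sketch below. The `output_dir` is an illustrative placeholder, and the single-device assumption follows from 32 per-device samples times 4 accumulation steps giving the listed effective batch size of 128.
```python
from transformers import TrainingArguments

# Sketch of TrainingArguments mirroring the hyperparameters listed above.
training_args = TrainingArguments(
    output_dir="bert-small-aze",     # illustrative placeholder, not from the card
    learning_rate=5e-5,
    per_device_train_batch_size=32,  # 32 x 4 accumulation steps = 128 effective
    gradient_accumulation_steps=4,
    num_train_epochs=10,
    lr_scheduler_type="cosine",
    warmup_steps=10_000,
    seed=42,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    fp16=True,                       # "Native AMP" mixed precision
)
```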
### Framework versions
- Transformers 4.37.1
- PyTorch 2.1.2+cu121
- Datasets 2.16.1
- Tokenizers 0.15.1
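To check that a local environment matches these versions, a minimal sketch using the standard-library `importlib.metadata` (the keys below are the PyPI distribution names) could look like:
```python
from importlib.metadata import version

# Versions reported in this card; newer releases may also work but are untested here.
expected = {
    "transformers": "4.37.1",
    "torch": "2.1.2+cu121",
    "datasets": "2.16.1",
    "tokenizers": "0.15.1",
}

for package, pinned in expected.items():
    installed = version(package)
    status = "OK" if installed == pinned else "differs"
    print(f"{package}: installed {installed}, card reports {pinned} ({status})")
```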