---
license: mit
tags:
- generated_from_trainer
base_model: prajjwal1/bert-small
model-index:
- name: bert-small-aze
  results: []
---
# bert-small-aze
This model uses the [prajjwal1/bert-small](https://huggingface.co/prajjwal1/bert-small) architecture and was trained from scratch (not fine-tuned) on the [allmalab/DOLLMA](https://huggingface.co/datasets/allmalab/DOLLMA) dataset.
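### How to use
The model can be loaded with the Transformers library, either through the `fill-mask` pipeline or directly via the Auto classes. The Azerbaijani sentence below is only an illustrative example, not taken from this card; the mask token is read from the tokenizer rather than hard-coded.
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("fill-mask", model="allmalab/bert-small-aze")

# Illustrative example: predict the masked word in an Azerbaijani sentence.
text = f"Bakı Azərbaycanın {pipe.tokenizer.mask_token} şəhəridir."
print(pipe(text))

# Load model directly
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("allmalab/bert-small-aze")
model = AutoModelForMaskedLM.from_pretrained("allmalab/bert-small-aze")
```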
### Citation
If you use this model or the dataset, please cite the following paper:
```bibtex
@inproceedings{isbarov-etal-2024-open,
    title = "Open foundation models for {A}zerbaijani language",
    author = "Isbarov, Jafar and
      Huseynova, Kavsar and
      Mammadov, Elvin and
      Hajili, Mammad and
      Ataman, Duygu",
    editor = {Ataman, Duygu and
      Derin, Mehmet Oguz and
      Ivanova, Sardana and
      K{\"o}ksal, Abdullatif and
      S{\"a}lev{\"a}, Jonne and
      Zeyrek, Deniz},
    booktitle = "Proceedings of the First Workshop on Natural Language Processing for Turkic Languages (SIGTURK 2024)",
    month = aug,
    year = "2024",
    address = "Bangkok, Thailand and Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2024.sigturk-1.2",
    pages = "18--28",
    abstract = "The emergence of multilingual large language models has enabled the development of language understanding and generation systems in Azerbaijani. However, most of the production-grade systems rely on cloud solutions, such as GPT-4. While there have been several attempts to develop open foundation models for Azerbaijani, these works have not found their way into common use due to a lack of systemic benchmarking. This paper encompasses several lines of work that promote open-source foundation models for Azerbaijani. We introduce (1) a large text corpus for Azerbaijani, (2) a family of encoder-only language models trained on this dataset, (3) labeled datasets for evaluating these models, and (4) extensive evaluation that covers all major open-source models with Azerbaijani support.",
}
```
The paper is also available on arXiv: https://arxiv.org/abs/2407.02337
### Training hyperparameters
The following hyperparameters were used during training (a configuration sketch follows the list):
- learning_rate: 5e-05
- train_batch_size: 32
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 10000
- num_epochs: 10
- mixed_precision_training: Native AMP
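Interpreted as a `transformers.TrainingArguments` object, these settings correspond roughly to the sketch below. The `output_dir` is an illustrative placeholder, and the single-device assumption follows from 32 per-device samples times 4 accumulation steps giving the listed effective batch size of 128.
```python
from transformers import TrainingArguments

# Sketch of TrainingArguments mirroring the hyperparameters listed above.
training_args = TrainingArguments(
    output_dir="bert-small-aze",     # illustrative placeholder, not from the card
    learning_rate=5e-5,
    per_device_train_batch_size=32,  # 32 x 4 accumulation steps = 128 effective
    gradient_accumulation_steps=4,
    num_train_epochs=10,
    lr_scheduler_type="cosine",
    warmup_steps=10_000,
    seed=42,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    fp16=True,                       # "Native AMP" mixed precision
)
```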
### Framework versions
- Transformers 4.37.1
- PyTorch 2.1.2+cu121
- Datasets 2.16.1
- Tokenizers 0.15.1
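To check that a local environment matches these versions, a minimal sketch using the standard-library `importlib.metadata` (the keys below are the PyPI distribution names) could look like:
```python
from importlib.metadata import version

# Versions reported in this card; newer releases may also work but are untested here.
expected = {
    "transformers": "4.37.1",
    "torch": "2.1.2+cu121",
    "datasets": "2.16.1",
    "tokenizers": "0.15.1",
}

for package, pinned in expected.items():
    installed = version(package)
    status = "OK" if installed == pinned else "differs"
    print(f"{package}: installed {installed}, card reports {pinned} ({status})")
```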