---
license: apache-2.0
language:
- ru
- en
library_name: transformers
pipeline_tag: feature-extraction
---
# BERT-base
Pretrained bidirectional encoder for the Russian language.
The model was trained using the standard MLM objective on large text corpora, including open social data.
See the `Training Details` section for more information.

⚠️ This model contains only the encoder part without any pretrained head.
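
Because no pretrained head is included, downstream use typically attaches and fine-tunes a task-specific head. A minimal sketch using `AutoModelForSequenceClassification`, which stacks a randomly initialized classification head on top of this encoder (the two-label setup is an arbitrary illustration, not part of the model card):

```python
from transformers import AutoModelForSequenceClassification

# Illustrative only: num_labels=2 is an arbitrary choice.
# The classification head is randomly initialized and must be fine-tuned.
model = AutoModelForSequenceClassification.from_pretrained(
    "deepvk/bert-base-uncased",
    num_labels=2,
)
```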
- **Developed by:** [deepvk](https://vk.com/deepvk)
- **Model type:** BERT
- **Languages:** Mostly Russian and a small fraction of other languages
- **License:** Apache 2.0
## How to Get Started with the Model
```python
from transformers import AutoTokenizer, AutoModel

# Load the tokenizer and the bare encoder (no task head).
tokenizer = AutoTokenizer.from_pretrained("deepvk/bert-base-uncased")
model = AutoModel.from_pretrained("deepvk/bert-base-uncased")

text = "Привет, мир!"  # "Hello, world!"
inputs = tokenizer(text, return_tensors="pt")
outputs = model(**inputs)  # contains last_hidden_state and pooler_output
```
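
Since the pipeline tag is `feature-extraction`, a common use of the encoder is producing fixed-size sentence embeddings. A minimal sketch, assuming mean pooling over non-padding tokens (one common choice; the model card does not prescribe a pooling strategy):

```python
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("deepvk/bert-base-uncased")
model = AutoModel.from_pretrained("deepvk/bert-base-uncased")

sentences = ["Привет, мир!", "Как дела?"]  # "Hello, world!", "How are you?"
inputs = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# Mean-pool the token embeddings, ignoring padding positions.
mask = inputs["attention_mask"].unsqueeze(-1).float()
embeddings = (outputs.last_hidden_state * mask).sum(dim=1) / mask.sum(dim=1)
print(embeddings.shape)  # torch.Size([2, 768])
```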
## Training Details

The model was trained using the NVIDIA source code. See the [pretraining documentation](https://github.com/NVIDIA/DeepLearningExamples/blob/master/PyTorch/LanguageModeling/BERT/README.md#training-process) for details.
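
The objective is standard masked language modeling. A minimal, illustrative sketch of how MLM batches are built with Transformers (this is not the NVIDIA training recipe referenced above; the 15% masking probability is the usual BERT default, assumed here):

```python
from transformers import AutoTokenizer, DataCollatorForLanguageModeling

tokenizer = AutoTokenizer.from_pretrained("deepvk/bert-base-uncased")

# Randomly mask 15% of tokens; unmasked positions get label -100 (ignored by the loss).
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=True, mlm_probability=0.15)

# Hypothetical training texts: "An example training text.", "One more sentence."
examples = [tokenizer("Пример обучающего текста."), tokenizer("Ещё одно предложение.")]
batch = collator(examples)
print(batch["input_ids"].shape, batch["labels"].shape)
```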
### Training Data

250 GB of filtered texts in total: a mix of Wikipedia, Books, and a Social corpus.
### Architecture details

| Argument                | Value         |
|-------------------------|---------------|
| Encoder layers          | 12            |
| Encoder attention heads | 12            |
| Encoder embed dim       | 768           |
| Encoder ffn embed dim   | 3,072         |
| Activation function     | GeLU          |
| Attention dropout       | 0.1           |
| Dropout                 | 0.1           |
| Max positions           | 512           |
| Vocab size              | 36,000        |
| Tokenizer type          | BertTokenizer |
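
For illustration, the table above maps onto a `BertConfig` as sketched below (the released checkpoint already ships its own configuration file, so there is normally no need to build one by hand):

```python
from transformers import BertConfig, BertModel

# Illustrative config mirroring the architecture table; the real checkpoint
# already includes an equivalent config.json.
config = BertConfig(
    vocab_size=36_000,                 # Vocab size
    hidden_size=768,                   # Encoder embed dim
    num_hidden_layers=12,              # Encoder layers
    num_attention_heads=12,            # Encoder attention heads
    intermediate_size=3_072,           # Encoder ffn embed dim
    hidden_act="gelu",                 # Activation function
    hidden_dropout_prob=0.1,           # Dropout
    attention_probs_dropout_prob=0.1,  # Attention dropout
    max_position_embeddings=512,       # Max positions
)
model = BertModel(config)  # randomly initialized model with this architecture
```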
## Evaluation

We evaluated the model on the [Russian SuperGLUE](https://russiansuperglue.com/) dev set.
The best result for each task is marked in bold.
All models are the same size except the distilled version of DeBERTa.
| Model                                                                   | RCB       | PARus    | MuSeRC    | TERRa | RUSSE     | RWSD      | DaNetQA  | Score     |
|-------------------------------------------------------------------------|-----------|----------|-----------|-------|-----------|-----------|----------|-----------|
| [vk-deberta-distill](https://huggingface.co/deepvk/deberta-v1-distill)  | 0.433     | 0.56     | 0.625     | 0.59  | 0.943     | 0.569     | 0.726    | 0.635     |
| [vk-roberta-base](https://huggingface.co/deepvk/roberta-base)           | 0.46      | 0.56     | 0.679     | 0.769 | 0.960     | 0.569     | 0.658    | 0.665     |
| [vk-deberta-base](https://huggingface.co/deepvk/deberta-v1-base)        | 0.450     | **0.61** | **0.722** | 0.704 | 0.948     | 0.578     | **0.76** | **0.682** |
| [vk-bert-base](https://huggingface.co/deepvk/bert-base-uncased)         | 0.467     | 0.57     | 0.587     | 0.704 | 0.953     | **0.583** | 0.737    | 0.657     |
| [sber-bert-base](https://huggingface.co/ai-forever/ruBert-base)         | **0.491** | **0.61** | 0.663     | 0.769 | **0.962** | 0.574     | 0.678    | 0.678     |