How to use lightonai/modernbert-embed-large-unsupervised with sentence-transformers:
```python
from sentence_transformers import SentenceTransformer

# Load the checkpoint from the Hugging Face Hub
model = SentenceTransformer("lightonai/modernbert-embed-large-unsupervised")

sentences = [
    "That is a happy person",
    "That is a happy dog",
    "That is a very happy person",
    "Today is a sunny day",
]

# Encode each sentence into one embedding vector
embeddings = model.encode(sentences)

# Pairwise similarity between all sentence embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [4, 4]
```
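To make the `[4, 4]` similarity matrix more concrete, here is a minimal sketch of a pairwise similarity computation in plain Python. It assumes cosine similarity, which is the default in recent sentence-transformers releases unless the model configuration says otherwise. The helper name `cosine_similarity_matrix` and the toy 3-dimensional vectors are hypothetical stand-ins for real embeddings.

```python
import math

def cosine_similarity_matrix(a, b):
    """Return a len(a) x len(b) matrix of cosine similarities between row vectors."""
    def norm(v):
        return math.sqrt(sum(x * x for x in v))

    result = []
    for u in a:
        row = []
        for v in b:
            dot = sum(x * y for x, y in zip(u, v))
            row.append(dot / (norm(u) * norm(v)))
        result.append(row)
    return result

# Toy vectors standing in for real sentence embeddings (hypothetical values).
toy_embeddings = [
    [1.0, 0.0, 0.0],
    [0.8, 0.6, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]

sims = cosine_similarity_matrix(toy_embeddings, toy_embeddings)
print(len(sims), len(sims[0]))  # one row and one column per input vector
```

Each entry `sims[i][j]` compares vector `i` with vector `j`, so the diagonal holds each vector's similarity with itself.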
ModernBERT-embed-large-unsupervised
modernbert-embed-large-unsupervised is the unsupervised checkpoint trained with the contrastors library
for 1 epoch over the 235M weakly-supervised contrastive pairs curated in Nomic Embed.
We suggest using modernbert-embed-large for embedding tasks.
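For readers unfamiliar with training on contrastive pairs, the sketch below shows an InfoNCE-style loss with in-batch negatives in plain Python. This is an illustration of the general technique only: the exact objective, the temperature value, and the function name `info_nce_loss` are assumptions and are not taken from the contrastors library or from this model's training configuration.

```python
import math

def info_nce_loss(query_vecs, doc_vecs, temperature=0.05):
    """Mean InfoNCE loss where doc_vecs[i] is the positive for query_vecs[i]
    and every other document in the batch acts as a negative.
    The temperature default is an assumed, illustrative value."""
    def normalize(v):
        n = math.sqrt(sum(x * x for x in v))
        return [x / n for x in v]

    queries = [normalize(q) for q in query_vecs]
    docs = [normalize(d) for d in doc_vecs]

    total = 0.0
    for i, q in enumerate(queries):
        # Scaled cosine similarity of this query against every document in the batch
        logits = [sum(x * y for x, y in zip(q, d)) / temperature for d in docs]
        # Numerically stable log-sum-exp over the batch
        m = max(logits)
        log_sum_exp = m + math.log(sum(math.exp(l - m) for l in logits))
        # Cross-entropy with the matching document as the target
        total += log_sum_exp - logits[i]
    return total / len(queries)

# Toy batch: pairs that line up give a lower loss than mismatched pairs.
aligned_loss = info_nce_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
mismatched_loss = info_nce_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
print(aligned_loss < mismatched_loss)
```

Minimizing this loss pulls each query toward its paired text and pushes it away from the other texts in the batch.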
Performance
| Model | Average (56) | Classification (12) | Clustering (11) | Pair Classification (3) | Reranking (4) | Retrieval (15) | STS (10) | Summarization (1) |
|---|---|---|---|---|---|---|---|---|
| nomic-embed-text-v1_unsup | 59.9 | 71.2 | 42.5 | 83.7 | 55.0 | 48.0 | 80.8 | 30.7 |
| modernbert-embed-base-unsupervised | 60.03 | 72.11 | 44.34 | 82.78 | 55.0 | 47.05 | 80.33 | 31.2 |
| modernbert-embed-large-unsupervised | 60.71 | 72.90 | 44.96 | 83.44 | 55.54 | 47.90 | 80.95 | 29.86 |
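The Average (56) column appears consistent with a task-count-weighted mean of the per-category scores, using the counts in the header and treating the final column as a single task. This is an observation from the table's own numbers, not a statement from the authors. The helper name `weighted_average` is hypothetical.

```python
# Tasks per category, in table order; the final column is assumed to count as one task.
TASK_COUNTS = [12, 11, 3, 4, 15, 10, 1]

def weighted_average(category_scores, task_counts=TASK_COUNTS):
    """Average per-category scores, weighting each by its number of tasks."""
    total_tasks = sum(task_counts)
    weighted_sum = sum(s * n for s, n in zip(category_scores, task_counts))
    return weighted_sum / total_tasks

# Per-category scores copied from the modernbert-embed-large-unsupervised row.
large_unsup = [72.90, 44.96, 83.44, 55.54, 47.90, 80.95, 29.86]
print(round(weighted_average(large_unsup), 2))  # 60.71, matching the table
```

The same calculation on the other two rows also reproduces their reported averages after rounding.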
Acknowledgment
We want to thank Zach Nussbaum from Nomic AI for building and sharing the Nomic Embed recipe and tools, and for his support during the training of this model!
Training was run on Orange Business Cloud Avenue infrastructure.
Citation
If you find the model, dataset, or training code useful, please consider citing ModernBERT as well as Nomic Embed:
@misc{modernbert,
title={Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference},
author={Benjamin Warner and Antoine Chaffin and Benjamin Clavié and Orion Weller and Oskar Hallström and Said Taghadouini and Alexis Gallagher and Raja Biswas and Faisal Ladhak and Tom Aarsen and Nathan Cooper and Griffin Adams and Jeremy Howard and Iacopo Poli},
year={2024},
eprint={2412.13663},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2412.13663},
}
@misc{nussbaum2024nomic,
title={Nomic Embed: Training a Reproducible Long Context Text Embedder},
author={Zach Nussbaum and John X. Morris and Brandon Duderstadt and Andriy Mulyar},
year={2024},
eprint={2402.01613},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
And if you want to cite this fine-tuning in particular, please use:
@misc{ModernBERT-embed-large,
title={ModernBERT-embed-large},
author={Chaffin, Antoine},
url={https://huggingface.co/lightonai/modernbert-embed-large},
year={2025}
}
Base model: answerdotai/ModernBERT-large
Related paper: Nomic Embed: Training a Reproducible Long Context Text Embedder
Evaluation results (self-reported, test set)
- MTEB AmazonCounterfactualClassification (en): accuracy 76.642, ap 39.438, f1 70.473
- MTEB AmazonPolarityClassification: accuracy 91.830, ap 88.836, f1 91.825
- MTEB AmazonReviewsClassification (en): accuracy 47.864, f1 47.281