ModernBERT-embed-large-unsupervised

modernbert-embed-large-unsupervised is the unsupervised checkpoint trained with the contrastors library for 1 epoch over the 235M weakly-supervised contrastive pairs curated in Nomic Embed.

We suggest using modernbert-embed-large for embedding tasks.
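As a minimal sketch of what using the recommended model could look like, the snippet below loads it through the sentence-transformers library. Two things here are assumptions rather than statements from this card: the `search_query: ` and `search_document: ` prefixes (carried over from the Nomic Embed convention) and the helper names `add_prefix` and `embed`, which are hypothetical and introduced only for illustration. The model id is taken from the citation URL further down.

```python
from typing import List

# Assumed task prefixes, following the Nomic Embed convention.
# They are not stated in this card; check the model's own usage notes.
PREFIXES = {"query": "search_query: ", "document": "search_document: "}


def add_prefix(texts: List[str], kind: str) -> List[str]:
    """Prepend the assumed task prefix ('query' or 'document') to each text."""
    if kind not in PREFIXES:
        raise ValueError(f"kind must be one of {sorted(PREFIXES)}, got {kind!r}")
    return [PREFIXES[kind] + text for text in texts]


def embed(texts: List[str], kind: str,
          model_name: str = "lightonai/modernbert-embed-large"):
    """Return L2-normalized embeddings for the given texts.

    The import is done lazily so the helper above stays usable without
    sentence-transformers installed. The first call downloads the model.
    """
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(model_name)
    return model.encode(add_prefix(texts, kind), normalize_embeddings=True)
```

With normalized embeddings, the dot product of a query vector and a document vector is their cosine similarity, so `embed(queries, "query") @ embed(docs, "document").T` would give a query-by-document score matrix.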

Performance

Model	Average (56)	Classification (12)	Clustering (11)	Pair Classification (3)	Reranking (4)	Retrieval (15)	STS (10)	Summarization (1)
nomic-embed-text-v1_unsup	59.9	71.2	42.5	83.7	55.0	48.0	80.8	30.7
modernbert-embed-base-unsupervised	60.03	72.11	44.34	82.78	55.0	47.05	80.33	31.2
modernbert-embed-large-unsupervised	60.71	72.90	44.96	83.44	55.54	47.90	80.95	29.86
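
The Average (56) column appears to be a dataset-weighted mean of the per-category scores, with each category weighted by the dataset count shown in its header. The sketch below checks that reading against the table. It assumes the final column is a single-dataset category, which is what makes the counts add up to 56; that weight is an assumption, not something the table states.

```python
# Datasets per category, taken from the table header. The final weight of 1
# is an assumption: the last column is treated as a single-dataset category.
WEIGHTS = [12, 11, 3, 4, 15, 10, 1]

# Per-category scores copied from the table rows, in header order
# (everything after the Average column).
ROWS = {
    "nomic-embed-text-v1_unsup": [71.2, 42.5, 83.7, 55.0, 48.0, 80.8, 30.7],
    "modernbert-embed-base-unsupervised": [72.11, 44.34, 82.78, 55.0, 47.05, 80.33, 31.2],
    "modernbert-embed-large-unsupervised": [72.90, 44.96, 83.44, 55.54, 47.90, 80.95, 29.86],
}


def weighted_average(scores, weights=WEIGHTS):
    """Dataset-weighted mean of per-category scores."""
    return sum(s * w for s, w in zip(scores, weights)) / sum(weights)


for name, scores in ROWS.items():
    print(f"{name}: {weighted_average(scores):.2f}")
```

Under that assumption the recomputed values agree with the reported averages to the precision given in the table.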

Acknowledgment

We want to thank Zach Nussbaum from Nomic AI for building and sharing the Nomic Embed recipe and tools, and for his support during the training of this model!

Training was run on Orange Business Cloud Avenue infrastructure.

Citation

If you find the model, dataset, or training code useful, please consider citing ModernBERT as well as Nomic Embed:

@misc{modernbert,
      title={Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference}, 
      author={Benjamin Warner and Antoine Chaffin and Benjamin Clavié and Orion Weller and Oskar Hallström and Said Taghadouini and Alexis Gallagher and Raja Biswas and Faisal Ladhak and Tom Aarsen and Nathan Cooper and Griffin Adams and Jeremy Howard and Iacopo Poli},
      year={2024},
      eprint={2412.13663},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2412.13663}, 
}

@misc{nussbaum2024nomic,
      title={Nomic Embed: Training a Reproducible Long Context Text Embedder}, 
      author={Zach Nussbaum and John X. Morris and Brandon Duderstadt and Andriy Mulyar},
      year={2024},
      eprint={2402.01613},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

And if you want to cite this fine-tuning in particular, please use:

@misc{ModernBERT-embed-large,
  title={ModernBERT-embed-large},
  author={Chaffin, Antoine},
  url={https://huggingface.co/lightonai/modernbert-embed-large},
  year={2025}
}

Model size: 0.4B params (Safetensors, tensor type F32)

Model tree for lightonai/modernbert-embed-large-unsupervised

Base model: answerdotai/ModernBERT-large
Finetuned: this model

Evaluation results

Self-reported scores on the MTEB test sets:

Dataset	Metric	Score
MTEB AmazonCounterfactualClassification (en)	accuracy	76.642
MTEB AmazonCounterfactualClassification (en)	ap	39.438
MTEB AmazonCounterfactualClassification (en)	f1	70.473
MTEB AmazonPolarityClassification	accuracy	91.830
MTEB AmazonPolarityClassification	ap	88.836
MTEB AmazonPolarityClassification	f1	91.825
MTEB AmazonReviewsClassification (en)	accuracy	47.864
MTEB AmazonReviewsClassification (en)	f1	47.281