MPNet INT8 – ONNX Quantized

ONNX INT8 quantized version of sentence-transformers/all-mpnet-base-v2 for efficient general-purpose sentence embeddings.

Model Details

| Property | Value |
|---|---|
| Base Model | sentence-transformers/all-mpnet-base-v2 |
| Format | ONNX |
| Quantization | INT8 (dynamic quantization) |
| Embedding Dimension | 768 |
| Quantized by | JustEmbed |

What is this?

This is a quantized ONNX export of all-mpnet-base-v2, a widely used general-purpose sentence embedding model from the sentence-transformers library. It maps sentences and paragraphs to a 768-dimensional dense vector space. INT8 quantization reduces model size and speeds up inference while keeping accuracy close to the FP32 original.

Use Cases

  • Semantic text search
  • Sentence similarity
  • Clustering and topic modeling
  • Paraphrase detection
  • General-purpose text embeddings
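
All of these use cases reduce to comparing embedding vectors, and with sentence-transformers models the standard metric is cosine similarity. A minimal sketch in NumPy, with placeholder vectors standing in for real model output:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two vectors: 1 = same direction, -1 = opposite."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Placeholder 768-dim vectors; real ones would come from the model
v1 = np.ones(768)
v2 = np.ones(768)
v3 = -np.ones(768)

print(cosine_similarity(v1, v2))  # ~1.0: same direction
print(cosine_similarity(v1, v3))  # ~-1.0: opposite direction
```

For search or clustering over many sentences, the same computation is done batch-wise as a normalized matrix product rather than pairwise calls.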

Files

  • model_quantized.onnx – INT8 quantized ONNX model
  • tokenizer.json – Fast tokenizer
  • vocab.txt – Vocabulary file
  • config.json – Model configuration

Usage with JustEmbed

```python
from justembed import Embedder

# Load the INT8-quantized MPNet model and embed a batch of sentences
embedder = Embedder("mpnet-int8")
vectors = embedder.embed(["This is a sentence", "This is another sentence"])
```

Usage with ONNX Runtime

```python
import onnxruntime as ort
from transformers import AutoTokenizer

# Load the tokenizer from the model directory and the quantized ONNX graph
tokenizer = AutoTokenizer.from_pretrained(".")
session = ort.InferenceSession("model_quantized.onnx")

inputs = tokenizer("This is a sentence", return_tensors="np")
# outputs[0] holds token-level hidden states of shape (batch, seq_len, 768);
# pool them to obtain one vector per sentence.
outputs = session.run(None, dict(inputs))
```
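
The raw session output is token-level; all-mpnet-base-v2 produces sentence embeddings via attention-mask-weighted mean pooling followed by L2 normalization. A self-contained pooling sketch in NumPy (the random `emb` array stands in for the model's token-level output, with illustrative shapes):

```python
import numpy as np

def mean_pool(token_embeddings: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Average token embeddings over non-padding positions, then L2-normalize.

    token_embeddings: (batch, seq_len, hidden) float array
    attention_mask:   (batch, seq_len) array of 0/1
    """
    mask = attention_mask[..., None].astype(token_embeddings.dtype)  # (batch, seq_len, 1)
    summed = (token_embeddings * mask).sum(axis=1)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)  # avoid division by zero
    pooled = summed / counts
    # Unit-norm rows: dot products between embeddings become cosine similarities
    return pooled / np.linalg.norm(pooled, axis=1, keepdims=True)

# Illustrative shapes: batch of 2, sequence length 4, hidden size 768
emb = np.random.rand(2, 4, 768).astype(np.float32)
mask = np.array([[1, 1, 1, 0], [1, 1, 0, 0]])
sentence_vecs = mean_pool(emb, mask)  # shape (2, 768)
```

With the ONNX Runtime example above, `outputs[0]` and `inputs["attention_mask"]` would take the place of `emb` and `mask`.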

Quantization Details

  • Method: Dynamic INT8 quantization via ONNX Runtime
  • Source: Original PyTorch weights converted to ONNX, then quantized
  • Speed: ~2-3x faster inference than FP32
  • Size: ~4x smaller than FP32
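
As a rough illustration of where those numbers come from: dynamic INT8 quantization stores each weight tensor as 8-bit integers plus a floating-point scale, so weights take 1 byte instead of 4, and activations are quantized on the fly at inference. A conceptual NumPy sketch of symmetric per-tensor weight quantization (a simplification, not the exact ONNX Runtime scheme):

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.standard_normal((768, 768)).astype(np.float32)  # a float32 weight matrix

# Symmetric quantization: map [-max|w|, +max|w|] onto the int8 range [-127, 127]
scale = np.abs(w).max() / 127.0
q = np.round(w / scale).astype(np.int8)    # stored weights: 1 byte each
w_dequant = q.astype(np.float32) * scale   # reconstructed at inference time

print(w.nbytes / q.nbytes)                 # 4.0 – INT8 weights are 4x smaller
# Rounding error is bounded by half a quantization step (scale / 2)
```

The ~2-3x speedup comes from integer matrix kernels and reduced memory traffic, so the exact factor depends on hardware and sequence length.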

License

This model is a derivative work of sentence-transformers/all-mpnet-base-v2.

The original model is licensed under Apache License 2.0. This quantized version is distributed under the same license. See the LICENSE file for the full text.

Citation

```bibtex
@inproceedings{song2020mpnet,
  title={MPNet: Masked and Permuted Pre-training for Language Understanding},
  author={Song, Kaitao and Tan, Xu and Qin, Tao and Lu, Jianfeng and Liu, Tie-Yan},
  booktitle={NeurIPS},
  year={2020}
}
```
