# MPNet INT8 – ONNX Quantized
ONNX INT8 quantized version of sentence-transformers/all-mpnet-base-v2 for efficient general-purpose sentence embeddings.
## Model Details
| Property | Value |
|---|---|
| Base Model | sentence-transformers/all-mpnet-base-v2 |
| Format | ONNX |
| Quantization | INT8 (dynamic quantization) |
| Embedding Dimension | 768 |
| Quantized by | JustEmbed |
## What is this?
This is a quantized ONNX export of all-mpnet-base-v2, one of the best general-purpose sentence embedding models from the sentence-transformers library. It maps sentences and paragraphs to a 768-dimensional dense vector space. The INT8 quantization reduces model size and improves inference speed while maintaining high accuracy.
## Use Cases
- Semantic text search
- Sentence similarity
- Clustering and topic modeling
- Paraphrase detection
- General-purpose text embeddings
## Files

- `model_quantized.onnx` – INT8 quantized ONNX model
- `tokenizer.json` – fast tokenizer
- `vocab.txt` – vocabulary file
- `config.json` – model configuration
## Usage with JustEmbed

```python
from justembed import Embedder

embedder = Embedder("mpnet-int8")
vectors = embedder.embed(["This is a sentence", "This is another sentence"])
```
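The returned vectors are typically compared with cosine similarity. A minimal, self-contained sketch using NumPy (the dummy 768-dimensional vectors below stand in for real embeddings; whether JustEmbed pre-normalizes its output is not specified here, so the full cosine is computed):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Dummy 768-dimensional vectors standing in for real embeddings.
v1 = np.random.default_rng(0).normal(size=768)
v2 = v1.copy()
print(cosine_similarity(v1, v2))  # ~1.0 for identical vectors
```

Scores close to 1.0 indicate near-identical meaning; scores near 0 indicate unrelated sentences.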
## Usage with ONNX Runtime

```python
import onnxruntime as ort
from transformers import AutoTokenizer

# Load the tokenizer shipped with this repository and the quantized model.
tokenizer = AutoTokenizer.from_pretrained(".")
session = ort.InferenceSession("model_quantized.onnx")

inputs = tokenizer("This is a sentence", return_tensors="np")
# outputs[0] holds per-token hidden states of shape (batch, seq_len, 768).
outputs = session.run(None, dict(inputs))
```
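The raw ONNX outputs are per-token hidden states, not sentence embeddings. Like the original all-mpnet-base-v2, sentence vectors are obtained by attention-mask-aware mean pooling followed by L2 normalization. A self-contained sketch of that pooling step with NumPy (the dummy `hidden` and `mask` arrays below stand in for `outputs[0]` and `inputs["attention_mask"]`):

```python
import numpy as np

def mean_pool(token_embeddings: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Mask-aware mean pooling over the token axis, then L2 normalization."""
    mask = attention_mask[..., None].astype(token_embeddings.dtype)  # (batch, seq, 1)
    summed = (token_embeddings * mask).sum(axis=1)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)  # avoid division by zero
    pooled = summed / counts
    return pooled / np.linalg.norm(pooled, axis=1, keepdims=True)

# Dummy data standing in for the model outputs and attention mask.
hidden = np.random.default_rng(0).normal(size=(2, 8, 768))
mask = np.ones((2, 8), dtype=np.int64)
mask[1, 5:] = 0  # second sentence is padded after 5 tokens
sentence_embeddings = mean_pool(hidden, mask)
print(sentence_embeddings.shape)  # (2, 768)
```

Masking before averaging ensures padding tokens do not dilute the sentence vector, and the final normalization makes cosine similarity equivalent to a plain dot product.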
## Quantization Details
- Method: Dynamic INT8 quantization via ONNX Runtime
- Source: Original PyTorch weights converted to ONNX, then quantized
- Speed: ~2-3x faster inference than FP32
- Size: ~4x smaller than FP32
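To make the size claim concrete, here is a self-contained NumPy illustration of symmetric per-tensor INT8 quantization of a single weight matrix. This shows the arithmetic behind the scheme, not ONNX Runtime's actual quantization pass (which computes scales per tensor internally and handles activations dynamically at inference time):

```python
import numpy as np

# Symmetric per-tensor INT8 quantization of one FP32 weight matrix.
rng = np.random.default_rng(0)
w = rng.normal(scale=0.1, size=(768, 768)).astype(np.float32)

scale = np.abs(w).max() / 127.0            # one FP32 scale for the tensor
w_int8 = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
w_dequant = w_int8.astype(np.float32) * scale

print(w.nbytes / w_int8.nbytes)            # 4.0: INT8 storage is 4x smaller
print(float(np.abs(w - w_dequant).max()))  # worst-case error is at most scale / 2
```

The 4x size reduction follows directly from storing 1 byte per weight instead of 4, with each tensor's FP32 scale as negligible overhead.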
## License
This model is a derivative work of sentence-transformers/all-mpnet-base-v2.
The original model is licensed under Apache License 2.0. This quantized version is distributed under the same license. See the LICENSE file for the full text.
## Citation

```bibtex
@inproceedings{song2020mpnet,
  title={MPNet: Masked and Permuted Pre-training for Language Understanding},
  author={Song, Kaitao and Tan, Xu and Qin, Tao and Lu, Jianfeng and Liu, Tie-Yan},
  booktitle={NeurIPS},
  year={2020}
}
```
## Acknowledgments
- Original model by UKP Lab / sentence-transformers
- Quantization and packaging by JustEmbed