Feature Extraction
Transformers
Safetensors
sentence-transformers
minicpm
image-feature-extraction
mteb
custom_code
Eval Results (legacy)
Instructions to use openbmb/MiniCPM-Embedding-Light with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use openbmb/MiniCPM-Embedding-Light with Transformers:
    # Use a pipeline as a high-level helper
    from transformers import pipeline

    pipe = pipeline("feature-extraction", model="openbmb/MiniCPM-Embedding-Light", trust_remote_code=True)

    # Load model directly
    from transformers import AutoModel

    model = AutoModel.from_pretrained("openbmb/MiniCPM-Embedding-Light", trust_remote_code=True, dtype="auto")
- sentence-transformers
How to use openbmb/MiniCPM-Embedding-Light with sentence-transformers:
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("openbmb/MiniCPM-Embedding-Light", trust_remote_code=True)

    sentences = [
        "The weather is lovely today.",
        "It's so sunny outside!",
        "He drove to the stadium."
    ]
    embeddings = model.encode(sentences)

    similarities = model.similarity(embeddings, embeddings)
    print(similarities.shape)
    # [3, 3]
- Notebooks
- Google Colab
- Kaggle
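The sentence-transformers snippet above ends with a pairwise similarity matrix of shape [3, 3]. As a rough illustration of what such a matrix contains, here is a minimal NumPy sketch that assumes the similarity in question is cosine similarity (the usual default in sentence-transformers, though a model's configuration can override it). The function name `cosine_similarity_matrix` and the toy vectors are hypothetical stand-ins, not real model output.

```python
import numpy as np


def cosine_similarity_matrix(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Return the pairwise cosine similarity between rows of a and rows of b."""
    # Scale every row to unit length so the dot product becomes a cosine.
    a_unit = a / np.linalg.norm(a, axis=1, keepdims=True)
    b_unit = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a_unit @ b_unit.T


# Hypothetical stand-in embeddings: 3 "sentences", 4 dimensions each.
toy_embeddings = np.array([
    [1.0, 0.0, 0.0, 0.0],
    [1.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
])

toy_similarities = cosine_similarity_matrix(toy_embeddings, toy_embeddings)
print(toy_similarities.shape)  # (3, 3)
```

Each diagonal entry is 1 because every vector is compared with itself; off-diagonal entries show how closely two different inputs align.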
import asyncio

import numpy as np
from infinity_emb import AsyncEngineArray, EngineArgs, AsyncEmbeddingEngine

# Build an engine array that holds a single engine for this model.
array = AsyncEngineArray.from_args([
    EngineArgs(model_name_or_path="openbmb/MiniCPM-Embedding-Light", engine="torch", dtype="float16", bettertransformer=False, pooling_method="mean", trust_remote_code=True),
])

queries = ["中国的首都是哪里?"]  # "What is the capital of China?"
passages = ["beijing", "shanghai"]  # "北京", "上海"

# Prefix each query with the instruction; passages are embedded unchanged.
INSTRUCTION = "Query:"
queries = [f"{INSTRUCTION} {query}" for query in queries]

async def embed_text(engine: AsyncEmbeddingEngine, sentences):
    # Keep the engine running only for the duration of this call.
    async with engine:
        embeddings, usage = await engine.embed(sentences=sentences)
    return embeddings

queries_embedding = asyncio.run(embed_text(array[0], queries))
passages_embedding = asyncio.run(embed_text(array[0], passages))

# Dot-product score matrix of shape (num_queries, num_passages).
scores = (np.array(queries_embedding) @ np.array(passages_embedding).T)
print(scores.tolist())  # [[0.40356746315956116, 0.36183443665504456]]
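The score matrix printed above has one row per query and one column per passage, so turning it into a ranking only requires sorting each row. The following is a small sketch of that step, using the score values reported in the example output; the helper name `rank_passages` is hypothetical and not part of infinity_emb. Note that a raw dot product equals cosine similarity only when the embeddings are unit-normalized, which the example does not state.

```python
import numpy as np


def rank_passages(scores, passages):
    """For each query row, return (passage, score) pairs sorted best-first."""
    scores = np.asarray(scores, dtype=float)
    ranked = []
    for row in scores:
        # argsort on the negated row yields indices in descending score order.
        order = np.argsort(-row)
        ranked.append([(passages[i], float(row[i])) for i in order])
    return ranked


# Scores and passages copied from the example output above.
example_scores = [[0.40356746315956116, 0.36183443665504456]]
example_passages = ["beijing", "shanghai"]

ranking = rank_passages(example_scores, example_passages)
print(ranking[0][0][0])  # beijing
```

With more passages, the same helper could be followed by slicing each ranked list to keep only the top few results.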