# Rank-Embed-1B
Rank-Embed-1B is a specialized 1B-parameter bi-encoder model from GorankLabs, fine-tuned from google/gemma-3-1b-pt. It is designed to convert text into dense vector representations so systems can reason about semantic meaning rather than relying solely on keyword overlap.
Built for retrieval-first workloads, Rank-Embed-1B is intended for complex search, semantic retrieval, ranking, retrieval-augmented generation, clustering, and duplicate detection. It combines efficient inference with strong language understanding, making it well suited for production retrieval pipelines on practical hardware.
The model uses mean pooling over the final hidden states to produce embeddings and introduces three task-specific special tokens: `<|query_token|>`, `<|document_token|>`, and `<|passage_token|>`. These tokens help distinguish input types and improve alignment between queries and candidate texts.

For best results, prepend the appropriate token to every input before encoding.
## Model Summary
| Property | Value |
|---|---|
| Architecture | Custom Gemma-based embedding model |
| Base model | google/gemma-3-1b-pt |
| Parameters | ~1.24B |
| Embedding dimension | 2048 |
| Maximum sequence length | 131,072 tokens |
| Pooling | Mean pooling over final hidden states |
| Precision | bfloat16 |
| Framework | PyTorch / Transformers |
| License | Apache 2.0 |
## Key Capabilities
- Dense embedding generation for queries, documents, and passages
- Retrieval and semantic similarity support with task-specific token prefixes
- Strong performance for complex, multi-hop, and semantically rich search queries
- Long-context support up to 131,072 tokens
- Compatibility with Hugging Face Transformers through `trust_remote_code=True`
## Quick Start

### Installation

```bash
pip install transformers torch sentencepiece
```
### Embedding Queries and Documents

```python
import torch
from transformers import AutoModel, AutoTokenizer

model_id = "GorankLabs/Rank-Embed-1B"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModel.from_pretrained(
    model_id,
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
)
model.eval()


def mean_pool(token_embeddings, attention_mask):
    # Average the token embeddings, ignoring padding positions.
    input_mask_expanded = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
    return torch.sum(token_embeddings * input_mask_expanded, 1) / torch.clamp(
        input_mask_expanded.sum(1), min=1e-9
    )


def embed(texts: list[str], prefix_token: str) -> torch.Tensor:
    # Prepend the task-specific token, encode, mean-pool, and L2-normalize.
    prefixed = [f"{prefix_token} {text}" for text in texts]
    encoded = tokenizer(
        prefixed,
        padding=True,
        truncation=True,
        max_length=512,
        return_tensors="pt",
    )
    with torch.no_grad():
        outputs = model(**encoded)
    embeddings = mean_pool(outputs.last_hidden_state, encoded["attention_mask"])
    return torch.nn.functional.normalize(embeddings, p=2, dim=-1)


queries = ["What is dense retrieval?"]
documents = [
    "Dense retrieval uses learned embeddings to match queries and documents.",
    "Sparse retrieval relies on exact keyword overlap such as BM25.",
]

query_embeddings = embed(queries, tokenizer.query_token)
document_embeddings = embed(documents, tokenizer.document_token)

# Embeddings are unit-normalized, so the dot product is cosine similarity.
scores = (query_embeddings @ document_embeddings.T) * 100
print(scores.tolist())
```
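The `mean_pool` helper above is plain PyTorch and can be sanity-checked without downloading the model. A minimal toy check (dummy tensors with arbitrary shapes, chosen only for illustration):

```python
import torch


def mean_pool(token_embeddings, attention_mask):
    # Same masked-average pooling as in the snippet above.
    mask = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
    return torch.sum(token_embeddings * mask, 1) / torch.clamp(mask.sum(1), min=1e-9)


# One "sequence" of 3 tokens with hidden size 2; the last token is padding.
token_embeddings = torch.tensor([[[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]])
attention_mask = torch.tensor([[1, 1, 0]])

pooled = mean_pool(token_embeddings, attention_mask)
print(pooled)  # tensor([[2., 3.]]): the padded position is ignored
```

Because the attention mask zeroes out padding before averaging, the large values in the padded slot do not contaminate the embedding.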
### Pipeline Usage

```python
from transformers import pipeline

pipe = pipeline(
    "feature-extraction",
    model="GorankLabs/Rank-Embed-1B",
    trust_remote_code=True,
    torch_dtype="bfloat16",
)

embeddings = pipe("<|query_token|> What causes aurora borealis?", return_tensors=True)
print(embeddings[0].shape)
```
## Special Tokens
The model extends the base Gemma vocabulary with three additional special tokens:
| Token | Purpose |
|---|---|
| `<\|query_token\|>` | Prepended to search queries |
| `<\|document_token\|>` | Prepended to documents |
| `<\|passage_token\|>` | Prepended to passages |
These tokens should always be prepended to the corresponding input type.
## Architecture Details

Rank-Embed-1B is derived from the Gemma architecture and adapted for embedding-focused workloads. The repository includes custom configuration, tokenizer, and model classes exposed through `trust_remote_code=True`.
Key implementation details:

- Custom model registration through `trust_remote_code=True`
- Mean pooling configured via `pooling_type: mean`
- Custom `AutoConfig`, `AutoModel`, and `AutoTokenizer` registration
- Long-context support up to 131,072 tokens
## What This Model Is
Rank-Embed-1B is designed to transform text into mathematical vectors, or embeddings, that capture semantic meaning. Instead of depending purely on lexical overlap, it enables systems to compare inputs based on intent, topic, and contextual similarity.
As a compact 1B-parameter model built on Gemma 3 1B PT, it is optimized for efficient deployment while retaining the capacity needed for nuanced retrieval tasks. This makes it a strong fit for teams that need practical inference performance without sacrificing retrieval quality.
Unlike a generative chatbot, Rank-Embed-1B is purpose-built for information retrieval. Its role is not to generate responses, but to identify, compare, and surface the most relevant pieces of information from a corpus.
## What It Can Do
- Semantic search: retrieves relevant content even when queries and documents use different wording.
- Complex search: handles nuanced, intent-heavy queries where the right result depends on context, relationships, and meaning rather than exact phrasing.
- Retrieval-augmented generation: serves as the retrieval layer for RAG systems by selecting relevant context for downstream language models.
- Clustering and organization: groups large collections of documents, tickets, or records by semantic similarity.
- Duplicate detection: identifies differently phrased inputs that express the same or highly similar meaning.
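As a sketch of the duplicate-detection use case, pairwise cosine similarity between normalized embeddings can be thresholded. The vectors below are toy stand-ins for the output of the `embed()` function from the Quick Start, and the 0.9 cutoff is an assumption to tune on real data:

```python
import torch
import torch.nn.functional as F

# Toy stand-ins for model embeddings (in practice, produced by embed()).
emb = F.normalize(torch.tensor([
    [0.90, 0.10, 0.00],  # "How do I reset my password?"
    [0.88, 0.12, 0.05],  # "Password reset instructions"
    [0.00, 0.20, 0.95],  # "Quarterly revenue report"
]), p=2, dim=-1)

sim = emb @ emb.T  # pairwise cosine similarity for unit vectors
threshold = 0.9    # assumed cutoff; tune per corpus

# Collect index pairs whose similarity exceeds the threshold.
pairs = [
    (i, j)
    for i in range(sim.size(0))
    for j in range(i + 1, sim.size(0))
    if sim[i, j] > threshold
]
print(pairs)  # [(0, 1)]: the two password texts are flagged as near-duplicates
```

The same thresholding pattern underlies clustering and semantic search; only what you do with the similarity matrix changes.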
## Loading the Model Safely

This repository provides custom Python modules, so the model should be loaded with `trust_remote_code=True`:
```python
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    "GorankLabs/Rank-Embed-1B",
    trust_remote_code=True,
)

model = AutoModel.from_pretrained(
    "GorankLabs/Rank-Embed-1B",
    trust_remote_code=True,
    torch_dtype="bfloat16",
    device_map="auto",
)
```
Only enable `trust_remote_code=True` for repositories you trust, and review the custom code before deploying in production environments.
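When encoding a large corpus, it is usually worth batching calls to the model rather than embedding everything in one pass. A minimal batching wrapper is sketched below; `encode_batch` is a hypothetical stand-in for a function like `embed(batch, tokenizer.document_token)` from the Quick Start, replaced here with a dummy encoder so the sketch runs without downloading the model:

```python
import torch


def embed_corpus(texts, encode_batch, batch_size=32):
    # Encode texts in fixed-size chunks and concatenate the results,
    # keeping peak memory proportional to batch_size rather than corpus size.
    chunks = [
        encode_batch(texts[i:i + batch_size])
        for i in range(0, len(texts), batch_size)
    ]
    return torch.cat(chunks, dim=0)


# Dummy encoder standing in for the real embed() function (2048-dim output).
fake_encoder = lambda batch: torch.zeros(len(batch), 2048)

vectors = embed_corpus([f"doc {i}" for i in range(70)], fake_encoder, batch_size=32)
print(vectors.shape)  # torch.Size([70, 2048])
```

The batch size is a throughput/memory trade-off; with long inputs (the model accepts up to 131,072 tokens), smaller batches may be necessary.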
## License
This model is released under the Apache License 2.0.
The base model weights are derived from google/gemma-3-1b-pt. Use of this repository must comply with the applicable Gemma license terms in addition to the license for this repository where required.
## Contact and Citation

Maintained by GorankLabs. For questions, issues, or collaboration inquiries, please use the repository's issue tracker.