# Rank-Embed-1B
Rank-Embed-1B is a specialized 1B-parameter bi-encoder model from GorankLabs, fine-tuned from google/gemma-3-1b-pt. It is designed to convert text into dense vector representations so systems can reason about semantic meaning rather than relying solely on keyword overlap.
Built for retrieval-first workloads, Rank-Embed-1B is intended for complex search, semantic retrieval, ranking, retrieval-augmented generation, clustering, and duplicate detection. It combines efficient inference with strong language understanding, making it well suited for production retrieval pipelines on practical hardware.
The model uses mean pooling over the final hidden states to produce embeddings and introduces three task-specific special tokens: `<|query_token|>`, `<|document_token|>`, and `<|passage_token|>`. These tokens help distinguish input types and improve alignment between queries and candidate texts.

For best results, prepend the appropriate token to every input before encoding.
## Model Summary
| Property | Value |
|---|---|
| Architecture | Custom Gemma-based embedding model |
| Base model | google/gemma-3-1b-pt |
| Parameters | ~1.24B |
| Embedding dimension | 2048 |
| Maximum sequence length | 131,072 tokens |
| Pooling | Mean pooling over final hidden states |
| Precision | bfloat16 |
| Framework | PyTorch / Transformers |
| License | Apache 2.0 |
## Key Capabilities
- Dense embedding generation for queries, documents, and passages
- Retrieval and semantic similarity support with task-specific token prefixes
- Strong performance for complex, multi-hop, and semantically rich search queries
- Long-context support up to 131,072 tokens
- Compatibility with Hugging Face Transformers through `trust_remote_code=True`
## Quick Start

### Installation

```bash
pip install transformers torch sentencepiece
```
### Embedding Queries and Documents

```python
import torch
from transformers import AutoModel, AutoTokenizer

model_id = "GorankLabs/Rank-Embed-1B"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModel.from_pretrained(
    model_id,
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
)
model.eval()


def mean_pool(token_embeddings, attention_mask):
    # Average the token embeddings, ignoring padding positions.
    input_mask_expanded = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
    return torch.sum(token_embeddings * input_mask_expanded, 1) / torch.clamp(
        input_mask_expanded.sum(1), min=1e-9
    )


def embed(texts: list[str], prefix_token: str) -> torch.Tensor:
    # Prepend the task-specific token, encode, mean-pool, and L2-normalize.
    prefixed = [f"{prefix_token} {text}" for text in texts]
    encoded = tokenizer(
        prefixed,
        padding=True,
        truncation=True,
        max_length=512,
        return_tensors="pt",
    )
    with torch.no_grad():
        outputs = model(**encoded)
    embeddings = mean_pool(outputs.last_hidden_state, encoded["attention_mask"])
    return torch.nn.functional.normalize(embeddings, p=2, dim=-1)


queries = ["What is dense retrieval?"]
documents = [
    "Dense retrieval uses learned embeddings to match queries and documents.",
    "Sparse retrieval relies on exact keyword overlap such as BM25.",
]

query_embeddings = embed(queries, tokenizer.query_token)
document_embeddings = embed(documents, tokenizer.document_token)

# Embeddings are unit-normalized, so the dot product is cosine similarity.
scores = (query_embeddings @ document_embeddings.T) * 100
print(scores.tolist())
```
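The `mean_pool` helper above is plain PyTorch and can be sanity-checked without downloading the model. A minimal toy check (dummy tensors with arbitrary shapes, chosen only for illustration):

```python
import torch


def mean_pool(token_embeddings, attention_mask):
    # Same masked-average pooling as in the snippet above.
    mask = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
    return torch.sum(token_embeddings * mask, 1) / torch.clamp(mask.sum(1), min=1e-9)


# One "sequence" of 3 tokens with hidden size 2; the last token is padding.
token_embeddings = torch.tensor([[[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]])
attention_mask = torch.tensor([[1, 1, 0]])

pooled = mean_pool(token_embeddings, attention_mask)
print(pooled)  # tensor([[2., 3.]]): the padded position is ignored
```

Because the attention mask zeroes out padding before averaging, the large values in the padded slot do not contaminate the embedding.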
### Pipeline Usage

```python
from transformers import pipeline

pipe = pipeline(
    "feature-extraction",
    model="GorankLabs/Rank-Embed-1B",
    trust_remote_code=True,
    torch_dtype="bfloat16",
)

embeddings = pipe("<|query_token|> What causes aurora borealis?", return_tensors=True)
print(embeddings[0].shape)
```
## Special Tokens
The model extends the base Gemma vocabulary with three additional special tokens:
| Token | Purpose |
|---|---|
| `<\|query_token\|>` | Prepended to search queries |
| `<\|document_token\|>` | Prepended to documents |
| `<\|passage_token\|>` | Prepended to passages |
These tokens should always be prepended to the corresponding input type.
## Architecture Details

Rank-Embed-1B is derived from the Gemma architecture and adapted for embedding-focused workloads. The repository includes custom configuration, tokenizer, and model classes exposed through `trust_remote_code=True`.
Key implementation details:

- Custom model registration through `trust_remote_code=True`
- Mean pooling configured via `pooling_type: mean`
- Custom `AutoConfig`, `AutoModel`, and `AutoTokenizer` registration
- Long-context support up to 131,072 tokens
## What This Model Is
Rank-Embed-1B is designed to transform text into mathematical vectors, or embeddings, that capture semantic meaning. Instead of depending purely on lexical overlap, it enables systems to compare inputs based on intent, topic, and contextual similarity.
As a compact 1B-parameter model built on Gemma 3 1B PT, it is optimized for efficient deployment while retaining the capacity needed for nuanced retrieval tasks. This makes it a strong fit for teams that need practical inference performance without sacrificing retrieval quality.
Unlike a generative chatbot, Rank-Embed-1B is purpose-built for information retrieval. Its role is not to generate responses, but to identify, compare, and surface the most relevant pieces of information from a corpus.
## What It Can Do
- Semantic search: retrieves relevant content even when queries and documents use different wording.
- Complex search: handles nuanced, intent-heavy queries where the right result depends on context, relationships, and meaning rather than exact phrasing.
- Retrieval-augmented generation: serves as the retrieval layer for RAG systems by selecting relevant context for downstream language models.
- Clustering and organization: groups large collections of documents, tickets, or records by semantic similarity.
- Duplicate detection: identifies differently phrased inputs that express the same or highly similar meaning.
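As a sketch of the duplicate-detection use case, pairwise cosine similarity between normalized embeddings can be thresholded. The vectors below are toy stand-ins for the output of the `embed()` function from the Quick Start, and the 0.9 cutoff is an assumption to tune on real data:

```python
import torch
import torch.nn.functional as F

# Toy stand-ins for model embeddings (in practice, produced by embed()).
emb = F.normalize(torch.tensor([
    [0.90, 0.10, 0.00],  # "How do I reset my password?"
    [0.88, 0.12, 0.05],  # "Password reset instructions"
    [0.00, 0.20, 0.95],  # "Quarterly revenue report"
]), p=2, dim=-1)

sim = emb @ emb.T  # pairwise cosine similarity for unit vectors
threshold = 0.9    # assumed cutoff; tune per corpus

# Collect index pairs whose similarity exceeds the threshold.
pairs = [
    (i, j)
    for i in range(sim.size(0))
    for j in range(i + 1, sim.size(0))
    if sim[i, j] > threshold
]
print(pairs)  # [(0, 1)]: the two password texts are flagged as near-duplicates
```

The same thresholding pattern underlies clustering and semantic search; only what you do with the similarity matrix changes.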
## Loading the Model Safely

This repository provides custom Python modules, so the model should be loaded with `trust_remote_code=True`:
```python
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    "GorankLabs/Rank-Embed-1B",
    trust_remote_code=True,
)

model = AutoModel.from_pretrained(
    "GorankLabs/Rank-Embed-1B",
    trust_remote_code=True,
    torch_dtype="bfloat16",
    device_map="auto",
)
```
Only enable `trust_remote_code=True` for repositories you trust, and review the custom code before deploying in production environments.
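When encoding a large corpus, it is usually worth batching calls to the model rather than embedding everything in one pass. A minimal batching wrapper is sketched below; `encode_batch` is a hypothetical stand-in for a function like `embed(batch, tokenizer.document_token)` from the Quick Start, replaced here with a dummy encoder so the sketch runs without downloading the model:

```python
import torch


def embed_corpus(texts, encode_batch, batch_size=32):
    # Encode texts in fixed-size chunks and concatenate the results,
    # keeping peak memory proportional to batch_size rather than corpus size.
    chunks = [
        encode_batch(texts[i:i + batch_size])
        for i in range(0, len(texts), batch_size)
    ]
    return torch.cat(chunks, dim=0)


# Dummy encoder standing in for the real embed() function (2048-dim output).
fake_encoder = lambda batch: torch.zeros(len(batch), 2048)

vectors = embed_corpus([f"doc {i}" for i in range(70)], fake_encoder, batch_size=32)
print(vectors.shape)  # torch.Size([70, 2048])
```

The batch size is a throughput/memory trade-off; with long inputs (the model accepts up to 131,072 tokens), smaller batches may be necessary.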
## License
This model is released under the Apache License 2.0.
The base model weights are derived from google/gemma-3-1b-pt. Use of this repository must comply with the applicable Gemma license terms in addition to the license for this repository where required.
## Contact and Citation

Maintained by GorankLabs. For questions, issues, or collaboration inquiries, please use the repository's issue tracker.