Hyper3-CLIP v0.5
Hyper3-CLIP v0.5 is an open-weight hyperbolic vision-language checkpoint from hyper³labs. It places image and text representations in a Lorentz space and was trained with compositional entailment constraints for hierarchy-sensitive image-text retrieval.
This v0.5 release is intended as an open baseline and research artifact.
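For readers new to the Lorentz (hyperboloid) model, the following is a minimal sketch of how a Euclidean vector can be lifted onto the hyperboloid and how distances are measured there. It uses the standard exponential map at the origin and the Lorentzian inner product. The curvature value of 1.0 and the function names are illustrative assumptions, not values or APIs taken from this checkpoint.

```python
import math


def exp_map_origin(v, curvature=1.0):
    """Lift a tangent vector at the origin onto the hyperboloid.

    Returns (time, space) coordinates of a point x with <x, x>_L = -1 / curvature.
    The curvature default is an assumption for illustration only.
    """
    norm = math.sqrt(sum(a * a for a in v))
    if norm == 0.0:
        space = [0.0 for _ in v]
    else:
        root_c = math.sqrt(curvature)
        scale = math.sinh(root_c * norm) / (root_c * norm)
        space = [scale * a for a in v]
    time = math.sqrt(1.0 / curvature + sum(a * a for a in space))
    return time, space


def lorentz_inner(x, y):
    """Lorentzian inner product: space part minus product of time parts."""
    (xt, xs), (yt, ys) = x, y
    return sum(a * b for a, b in zip(xs, ys)) - xt * yt


def lorentz_distance(x, y, curvature=1.0):
    """Geodesic distance between two points on the hyperboloid."""
    # Clamp to 1.0 to guard against rounding slightly below the acosh domain.
    arg = max(1.0, -curvature * lorentz_inner(x, y))
    return math.acosh(arg) / math.sqrt(curvature)


origin = exp_map_origin([0.0, 0.0])
near = exp_map_origin([0.3, 0.0])
far = exp_map_origin([1.5, 0.0])

print(lorentz_distance(origin, near))
print(lorentz_distance(near, far))
```

With unit curvature, the distance from the origin to a lifted point equals the Euclidean norm of the tangent vector it came from, which is one reason tangent-space vectors are a convenient export format.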
Model
- Architecture: ViT-B scale vision-language model
- Vision backbone: vit_base_patch16_224
- Text backbone: openai/clip-vit-base-patch32
- Embedding dimension: 512
- Training steps: 500,000
- Global batch size: 768
- Weights artifact: model.safetensors
The original full training checkpoint included optimizer, scheduler, AMP scaler,
RNG state, config, and step metadata. This repository publishes the weights-only
model.safetensors artifact for inference and downstream research.
Quick Start: Sentence Transformers
The default way to use this checkpoint is through Sentence Transformers. The adapter in this repository returns 512-dimensional, L2-normalized tangent-space embeddings, so they can be stored in standard vector stores that use cosine or dot-product similarity.
Install the runtime dependencies:
pip install "sentence-transformers>=5.5.1" timm safetensors pyyaml Pillow
If you are accessing the gated Hugging Face repository from a fresh machine, first
accept the access terms on the model page, then set the HF_TOKEN environment variable.
from PIL import Image
from sentence_transformers import SentenceTransformer
model = SentenceTransformer("hyper3labs/hyper3-clip-v0.5", trust_remote_code=True)
image_embedding = model.encode([Image.open("/path/to/image.jpg")], normalize_embeddings=True)
text_embedding = model.encode(["machined metal part"], normalize_embeddings=True)
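Because the returned embeddings are L2-normalized, a plain dot product equals cosine similarity, so ranking candidates needs nothing more than a sort over dot products. The sketch below illustrates this with small placeholder vectors standing in for the 512-dimensional model outputs. The helper names are hypothetical and the vectors are made up for illustration.

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def dot(a, b):
    """Dot product; equals cosine similarity for unit-length inputs."""
    return sum(x * y for x, y in zip(a, b))


def rank_by_dot_product(query, candidates):
    """Return candidate indices sorted from most to least similar."""
    scores = [dot(query, c) for c in candidates]
    return sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)


# Placeholder 3-dimensional vectors, not real model outputs.
text_query = l2_normalize([1.0, 0.2, 0.0])
image_candidates = [
    l2_normalize([0.0, 1.0, 0.0]),
    l2_normalize([0.9, 0.3, 0.1]),
    l2_normalize([-1.0, 0.0, 0.0]),
]

ranking = rank_by_dot_product(text_query, image_candidates)
print(ranking)
```

In practice the same ranking is usually delegated to a vector store or to a matrix multiplication over the full embedding arrays.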
Transformers
from PIL import Image
import torch
from transformers import AutoModel, AutoTokenizer
model = AutoModel.from_pretrained("hyper3labs/hyper3-clip-v0.5", trust_remote_code=True).eval()
tokenizer = AutoTokenizer.from_pretrained("openai/clip-vit-base-patch32")
image = model.preprocess_image(Image.open("/path/to/image.jpg")).unsqueeze(0)
text = tokenizer(
    ["machined metal part"],
    padding=True,
    truncation=True,
    max_length=model.config.max_text_length,
    return_tensors="pt",
)
with torch.no_grad():
    outputs = model(
        pixel_values=image,
        input_ids=text["input_ids"],
        attention_mask=text["attention_mask"],
    )
image_embedding = outputs.image_embeds
text_embedding = outputs.text_embeds
Haystack image retrieval pipeline
For indexing images in a Haystack retrieval pipeline, use
SentenceTransformersDocumentImageEmbedder with image paths in
Document.meta["file_path"], paired with SentenceTransformersTextEmbedder for
text queries.
pip install "haystack-ai>=2.30.1" "sentence-transformers>=5.5.1" timm safetensors pyyaml Pillow
from haystack import Document
from haystack.components.embedders import SentenceTransformersTextEmbedder
from haystack.components.embedders.image import SentenceTransformersDocumentImageEmbedder
model_id = "hyper3labs/hyper3-clip-v0.5"
documents = [
    Document(
        content="front view of a machined metal part",
        meta={"file_path": "/path/to/image.jpg"},
    )
]
image_embedder = SentenceTransformersDocumentImageEmbedder(
    model=model_id,
    trust_remote_code=True,
    batch_size=8,
    normalize_embeddings=True,
)
documents = image_embedder.run(documents=documents)["documents"]
text_embedder = SentenceTransformersTextEmbedder(
    model=model_id,
    trust_remote_code=True,
    normalize_embeddings=True,
)
query_embedding = text_embedder.run("machined metal part")["embedding"]
Evaluation
The numbers below use the official evaluator convention for R@10. Higher is better for every column except TIE and LCA, where lower is better.
| Model | Comparable setting | ImageNet top-1 | COCO text R@10 | COCO image R@10 | Flickr text R@10 | Flickr image R@10 | TIE | LCA | Jaccard | H-Prec | H-Rec |
|---|---|---|---|---|---|---|---|---|---|---|---|
| MERU-B/16 | same-family baseline | 40.1 | 82.0 | 68.6 | 96.2 | 90.0 | 3.630 | 2.220 | 0.780 | 0.850 | 0.850 |
| HyCoCLIP-B/16 | official checkpoint | 45.8 | 82.0 | 69.3 | 95.4 | 90.3 | 3.172 | 2.047 | 0.814 | 0.874 | 0.874 |
| UNCHA-B/16 | official checkpoint | 48.8 | 82.6 | 71.0 | 95.9 | 91.2 | 2.945 | 1.961 | 0.828 | 0.883 | 0.884 |
| PHyCLIP-B/16 | related reported result | 44.4 | 80.4 | 68.7 | 95.6 | 89.9 | 3.285 | 2.088 | 0.807 | 0.868 | 0.868 |
| Hyper3-CLIP v0.5 | this release | 48.5 | 84.0 | 72.8 | 97.5 | 92.4 | 2.972 | 1.986 | 0.828 | 0.882 | 0.883 |
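
As a reading aid for the R@10 columns, here is a minimal sketch of one common definition of Recall@K: a query counts as a hit if at least one of its ground-truth items appears among its K highest-scoring candidates. This is an assumption about the general metric, not a reproduction of the official evaluator, whose exact convention may differ. The function name and the toy scores are hypothetical.

```python
def recall_at_k(similarity, relevant, k=10):
    """Fraction of queries with at least one relevant item in the top k.

    similarity[q][i] is the score of candidate i for query q.
    relevant[q] is the set of candidate indices that are correct for query q.
    """
    hits = 0
    for scores, targets in zip(similarity, relevant):
        # Indices of the k highest-scoring candidates for this query.
        top_k = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
        if targets.intersection(top_k):
            hits += 1
    return hits / len(similarity)


# Toy scores for 3 queries over 3 candidates, made up for illustration.
similarity = [
    [0.9, 0.1, 0.3],
    [0.2, 0.4, 0.8],
    [0.5, 0.6, 0.1],
]
relevant = [{0}, {1}, {0}]

r_at_1 = recall_at_k(similarity, relevant, k=1)
r_at_2 = recall_at_k(similarity, relevant, k=2)
print(r_at_1, r_at_2)
```

Using a set of relevant indices per query covers datasets where one image has several valid captions, in which case text retrieval counts a hit when any of them is retrieved.
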
Raw evaluation files are included:
- eval_coco_karpathy_final.json
- eval_flickr30k_final.json
- eval_imagenet_final.json
- eval_hycoclip_uncha_intersection_final.json
License And Attribution
The model materials in this repository are released under OpenMDW-1.0. See
LICENSE.
Redistributions should preserve NOTICE, LICENSE, and the original model card
when practical. Modified or derived checkpoints should use a distinct name and
must not imply endorsement by hyper³labs.
Please cite and link to the original hyper³labs model repository when publishing benchmarks, papers, derivative checkpoints, or public demos based on this model.
Intended Use
This release is intended for:
- hierarchy-sensitive image-text retrieval research
- zero-shot and retrieval evaluation
- multimodal embedding baselines
- downstream experiments with hyperbolic representation learning
This model has not been validated for safety-critical use.
Citation
If you use Hyper3-CLIP v0.5, cite the original model repository and hyper³labs.