Vortex-Embed v3: Sentence-Similarity for RAG

Retrieval-optimized 4-bit static embeddings for sentence-similarity and RAG.

Built on VTXAI/Vortex-Embed-4.7M (29528 vocab × 256 dim, 4-bit LF4 packed = 4.7 MB on disk) with a set of training-free retrieval upgrades that lift STS-B Spearman from 0.7462 (baseline LF4) to 0.7560 (v3 with SIF+PC=1).
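
To make the size arithmetic concrete, here is a minimal sketch of generic 4-bit (nibble) packing for a table of the quoted shape. It is an illustration only: the actual LF4 layout is not described here, and the helper names `pack_nibbles` and `unpack_nibbles` are hypothetical.

```python
import numpy as np

VOCAB, DIM = 29528, 256  # shape quoted above

def pack_nibbles(codes: np.ndarray) -> np.ndarray:
    """Pack 4-bit codes (values 0..15) two per byte, low nibble first.

    Generic nibble packing, NOT necessarily the LF4 on-disk layout.
    """
    flat = codes.astype(np.uint8).reshape(-1)
    if flat.size % 2 != 0:
        raise ValueError("need an even number of codes")
    return (flat[0::2] | (flat[1::2] << 4)).astype(np.uint8)

def unpack_nibbles(packed: np.ndarray) -> np.ndarray:
    """Inverse of pack_nibbles: returns a flat uint8 array of 4-bit codes."""
    out = np.empty(packed.size * 2, dtype=np.uint8)
    out[0::2] = packed & 0x0F
    out[1::2] = packed >> 4
    return out

# Random stand-in codes; real codes would come from the quantizer.
rng = np.random.default_rng(0)
codes = rng.integers(0, 16, size=(VOCAB, DIM), dtype=np.uint8)
packed = pack_nibbles(codes)
packed_bytes = packed.nbytes  # VOCAB * DIM / 2 bytes
print(f"{packed_bytes / 1e6:.2f} MB of raw 4-bit codes")
```

The raw codes alone come to 29528 × 256 / 2 = 3,779,584 bytes. Anything above that in the on-disk figure would have to come from other stored data, which is not broken down here.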

What changed vs the v1 baseline

All four upgrades are inference-time only; the underlying 4-bit weights are bit-identical to the v1 artifact. They are:

  1. SIF IDF weighting with sif_a=0.01 (sweep-optimized for STS-B).
  2. Top-1 PC removal (sweep-optimized β€” 1 PC is enough for STS-B).
  3. Pure-numpy bucket-boundary segment-sum for fast mean-pool.
  4. CPU-torch scatter (index_add_) for the hot path.
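
The first three upgrades can be sketched in plain NumPy as follows. This is a hedged illustration, not the code shipped in lf4_v3_sentence.py. It assumes the classic SIF weight a / (a + p(token)), a flat token-id array split into sentences by a list of lengths, and a principal component estimated from the batch itself. The shipped model may instead derive weights from IDF statistics and store a precomputed component. All function names and the toy data are hypothetical.

```python
import numpy as np

def sif_weights(token_prob, a=0.01):
    """Classic SIF weight a / (a + p(token)); rarer tokens get weights nearer 1."""
    return a / (a + np.asarray(token_prob, dtype=np.float64))

def segment_mean_pool(table, weights, token_ids, lengths):
    """Weighted mean-pool per sentence using a segment sum over bucket boundaries.

    token_ids is one flat array holding all sentences back to back;
    lengths[i] is the token count of sentence i (must be >= 1, because
    np.add.reduceat does not return zero for empty segments).
    """
    lengths = np.asarray(lengths)
    if (lengths < 1).any():
        raise ValueError("every sentence needs at least one token")
    starts = np.concatenate(([0], np.cumsum(lengths)[:-1]))  # segment starts
    w = weights[token_ids]                                   # (T,)
    rows = table[token_ids] * w[:, None]                     # (T, D)
    sums = np.add.reduceat(rows, starts, axis=0)             # (N, D)
    wsum = np.add.reduceat(w, starts)                        # (N,)
    return sums / wsum[:, None]

def remove_top_pcs(emb, n_pc=1):
    """Project out the top n_pc right singular vectors of emb (no centering)."""
    _, _, vt = np.linalg.svd(emb, full_matrices=False)
    pcs = vt[:n_pc]                                          # (n_pc, D)
    return emb - (emb @ pcs.T) @ pcs

# Toy data standing in for the dequantized table and token statistics.
rng = np.random.default_rng(0)
table = rng.normal(size=(50, 8))
token_prob = rng.dirichlet(np.ones(50))
weights = sif_weights(token_prob, a=0.01)
token_ids = np.array([3, 7, 7, 1, 42, 0, 9, 9, 13])
lengths = [3, 2, 4]                                          # three sentences

pooled = segment_mean_pool(table, weights, token_ids, lengths)
cleaned = remove_top_pcs(pooled, n_pc=1)
```

The fourth upgrade presumably performs the same accumulation with `torch.Tensor.index_add_`, keyed by a per-token sentence index instead of segment boundaries.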

Benchmark

Model                   Spearman ρ (STS-B)  Encode (ms/text)  Dequant (cold)  RAM    On-disk
LF4 baseline (v1)       0.7462              0.87              231 ms          30 MB  4.7 MB
Vortex-Embed v3 (this)  0.7560              0.08              51 ms           30 MB  4.7 MB

+1.0 pp Spearman, 11× faster encode.
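
For readers who want to reproduce the metric, here is a small NumPy-only sketch of how a Spearman score of this kind is typically computed: rank the predicted cosine similarities and the gold scores, then take the Pearson correlation of the ranks. The helper names and toy numbers are hypothetical. The evaluation script behind the table above is not shown here and may differ, for example by using `scipy.stats.spearmanr`.

```python
import numpy as np

def average_ranks(x):
    """1-based ranks with ties replaced by their average rank."""
    x = np.asarray(x)
    order = np.argsort(x, kind="stable")
    ranks = np.empty(len(x), dtype=np.float64)
    ranks[order] = np.arange(1, len(x) + 1)
    # Average the ranks inside each group of equal values.
    _, inv, counts = np.unique(x, return_inverse=True, return_counts=True)
    sums = np.bincount(inv, weights=ranks)
    return (sums / counts)[inv]

def spearman(pred, gold):
    """Spearman rho = Pearson correlation of the (tie-averaged) ranks."""
    return float(np.corrcoef(average_ranks(pred), average_ranks(gold))[0, 1])

# Toy example: cosine similarities for five sentence pairs vs gold scores.
cos_sims = np.array([0.91, 0.15, 0.55, 0.40, 0.78])
gold = np.array([4.8, 0.5, 3.0, 2.0, 4.1])
rho = spearman(cos_sims, gold)  # the two orderings agree exactly here

# The headline deltas follow directly from the table.
delta_pp = (0.7560 - 0.7462) * 100   # about 1.0 percentage point
speedup = 0.87 / 0.08                # about 11x
```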

Usage

from huggingface_hub import snapshot_download
from lf4_v3_sentence import VortexEmbedV3

path = snapshot_download("VTXAI/Vortex-Embed-v3-sentence")
model = VortexEmbedV3.from_pretrained(path)
print(f"vocab={model.vocab_size}, dim={model.dim}, size={model.model_size_mb:.1f} MB")

# Single-text encode
vec = model.encode("find python json parser", normalize=True)  # (256,)

# Batch encode
docs = ["def parse_json(s): return json.loads(s)",
        "class WeatherAPI: pass",
        "import requests"]
doc_embs = model.encode(docs, normalize=True)  # (3, 256)

# RAG retrieval
import numpy as np
# ... chunk corpus, build doc_embs as (n, 256) ...
query = "where do we parse JSON requests"
q_emb = model.encode(query, normalize=True)
scores, indices = model.search(q_emb, doc_embs, top_k=10)
for rank, (s, i) in enumerate(zip(scores[0], indices[0]), 1):
    print(f"#{rank} ({s:.3f}) doc #{i}")
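
As a rough picture of what a call like `model.search` might do, here is a NumPy sketch of cosine top-k retrieval over L2-normalized embeddings. It is an assumption about the behaviour, not the packaged implementation. The function name `cosine_top_k` is hypothetical, and the 2-D return shape is chosen to match the `scores[0]` / `indices[0]` indexing used above.

```python
import numpy as np

def cosine_top_k(q_emb, doc_embs, top_k=10):
    """Return (scores, indices), each of shape (n_queries, k), best match first.

    Assumes q_emb and doc_embs are already L2-normalized, so the dot
    product equals cosine similarity.
    """
    q = np.atleast_2d(q_emb)                       # (Q, D)
    sims = q @ doc_embs.T                          # (Q, N)
    k = min(top_k, doc_embs.shape[0])
    # argpartition finds the k best in O(N); only those k are then sorted.
    idx = np.argpartition(-sims, k - 1, axis=1)[:, :k]
    part = np.take_along_axis(sims, idx, axis=1)
    order = np.argsort(-part, axis=1)
    indices = np.take_along_axis(idx, order, axis=1)
    scores = np.take_along_axis(part, order, axis=1)
    return scores, indices

# Toy corpus of random unit vectors; the query is an exact copy of doc 2.
rng = np.random.default_rng(0)
docs = rng.normal(size=(20, 256))
docs /= np.linalg.norm(docs, axis=1, keepdims=True)
scores, indices = cosine_top_k(docs[2], docs, top_k=5)
```

For corpora that fit in memory, this brute-force matrix product is usually sufficient. Larger corpora would typically move to an approximate nearest-neighbour index.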

Files

  • model.safetensors: 4-bit LF4 packed weights (3.7 MB)
  • tokenizer.json: HuggingFace fast tokenizer
  • config.json: model + retrieval config
  • lf4_v3_sentence.py: self-contained model class
  • README.md: this file

License

Apache 2.0
