Byrne-Embed
Byrne-Embed is a compact 85M-parameter sentence-embedding model. It maps text to 768-dimensional unit-norm vectors suitable for semantic similarity, retrieval, clustering, and reranking.
The backbone is a custom SpikeWhale decoder (the "Byrne" line). A mean-pooled representation of its last hidden state is projected to 768 dimensions by a learned head and unit-normalized, so cosine similarity between two embeddings is just a dot product.
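The pooling head described above can be sketched in plain Python. This is an illustration under stated assumptions, not the code in modeling_byrne_embed.py: it assumes padding tokens are excluded from the mean via an attention mask and that the projection is a single linear layer with a bias. The function names are hypothetical.

```python
import math
from typing import List, Sequence

Vector = List[float]


def masked_mean_pool(hidden: Sequence[Sequence[float]], mask: Sequence[int]) -> Vector:
    """Average the hidden states of real tokens (mask == 1); padding is skipped (assumption)."""
    dim = len(hidden[0])
    total = [0.0] * dim
    count = 0
    for h, m in zip(hidden, mask):
        if m:
            count += 1
            for i, x in enumerate(h):
                total[i] += x
    if count == 0:
        raise ValueError("mask selects no tokens")
    return [t / count for t in total]


def project(x: Sequence[float], weight: Sequence[Sequence[float]], bias: Sequence[float]) -> Vector:
    """Hypothetical linear head y = W x + b, with W stored as out_dim rows of length in_dim."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weight, bias)]


def l2_normalize(v: Sequence[float]) -> Vector:
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def embed(hidden, mask, weight, bias) -> Vector:
    """Mean-pool, project, then unit-normalize: the pipeline described in the text."""
    return l2_normalize(project(masked_mean_pool(hidden, mask), weight, bias))


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """For unit-norm inputs this equals cosine similarity."""
    return sum(x * y for x, y in zip(a, b))
```

Normalizing once at embedding time is what lets every later comparison skip the division by vector norms.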
Benchmark vs. EmbeddingGemma-300M
We benchmarked Byrne-Embed against Google's EmbeddingGemma-300M on 4,000 held-out sentences spanning educational web text, encyclopedic text, and instruction/chat text. Byrne-Embed's embedding geometry closely tracks EmbeddingGemma's at roughly 1/3.5 of the parameter count (85M vs. 300M):
| Metric (Byrne-Embed vs EmbeddingGemma) | Result |
|---|---|
| Mean per-sentence cosine | 0.9415 (median 0.945, p10 0.912) |
| Sentences within 0.90 cosine | 94.7% |
| Similarity-structure agreement (Pearson) | 0.9702 |
| Similarity-structure agreement (Spearman) | 0.9599 |
| Per-anchor neighbour-ranking correlation | 0.9494 |
| Retrieval top-1 nearest-neighbour agreement | 72.8% |
| Retrieval Recall@10 overlap | 78.2% |
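The table mixes two kinds of measure. A per-sentence cosine compares the two models' vectors for the same sentence directly, which only makes sense because both emit 768-dimensional embeddings. Similarity-structure agreement instead correlates the two models' pairwise similarity scores. The sketch below is a hypothetical reconstruction of those definitions, not the logic of run_tests.py; in particular, it assumes the structure score is a Pearson correlation over all unique sentence pairs.

```python
import math
from itertools import combinations
from statistics import mean
from typing import List, Sequence

Vec = Sequence[float]


def cosine(a: Vec, b: Vec) -> float:
    """Cosine similarity between two vectors of equal length."""
    d = sum(x * y for x, y in zip(a, b))
    return d / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def per_sentence_cosines(emb_a: Sequence[Vec], emb_b: Sequence[Vec]) -> List[float]:
    """Cosine between model A's and model B's vector for the same sentence.

    Only meaningful when both models share one output space.
    """
    return [cosine(a, b) for a, b in zip(emb_a, emb_b)]


def pairwise_sims(embs: Sequence[Vec]) -> List[float]:
    """Cosine similarity for every unordered sentence pair (i < j)."""
    return [cosine(embs[i], embs[j]) for i, j in combinations(range(len(embs)), 2)]


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def structure_agreement(emb_a: Sequence[Vec], emb_b: Sequence[Vec]) -> float:
    """Correlate the two models' pairwise similarities (Pearson assumed)."""
    return pearson(pairwise_sims(emb_a), pairwise_sims(emb_b))
```

The two measures can disagree sharply: rotating every vector by 90 degrees drives the per-sentence cosine to 0 while leaving structure agreement at 1, because rotations preserve all pairwise cosines. High scores on both, as in the table, indicate the two models share not only a neighbourhood structure but also a coordinate space.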
Reading the numbers. The two most important measures, which capture how closely the two models agree on which sentences are similar, land at Pearson 0.97 and Spearman 0.96: when EmbeddingGemma judges two sentences similar, Byrne-Embed's judgement is nearly identical. For 94.7% of sentences, the Byrne-Embed and EmbeddingGemma vectors reach a cosine of 0.90 or more. The lower top-1 retrieval number is expected and does not indicate a quality gap. In a dense pool of real sentences many neighbours are near-ties (e.g. 0.88 vs. 0.87), so the single #1 slot flips easily between near-duplicates. This is why Recall@10 overlap stays at ~78% and the neighbour-ranking correlation is 0.95. Both models find the same neighbourhood; they just occasionally swap rank 1 and rank 2 among near-identical candidates.
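The near-tie effect is easy to reproduce on toy data. In this hypothetical sketch, each row holds one anchor's similarity scores against a candidate pool (anchor excluded), and "Recall@k overlap" is assumed to mean the fraction of model A's top-k candidates that also appear in model B's top-k. That definition is an assumption; run_tests.py may compute it differently.

```python
from typing import List, Sequence


def top_k(scores: Sequence[float], k: int) -> List[int]:
    """Indices of the k highest scores, best first."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]


def top1_agreement(rows_a: Sequence[Sequence[float]], rows_b: Sequence[Sequence[float]]) -> float:
    """Fraction of anchors whose single nearest neighbour is the same under both models."""
    hits = sum(top_k(a, 1) == top_k(b, 1) for a, b in zip(rows_a, rows_b))
    return hits / len(rows_a)


def recall_at_k_overlap(rows_a: Sequence[Sequence[float]], rows_b: Sequence[Sequence[float]], k: int) -> float:
    """Mean fraction of model A's top-k that also appears in model B's top-k (assumed definition)."""
    total = 0.0
    for a, b in zip(rows_a, rows_b):
        total += len(set(top_k(a, k)) & set(top_k(b, k))) / k
    return total / len(rows_a)


# Anchor 1: a 0.88 vs 0.87 near-tie that the two models order differently.
# Anchor 2: a clear winner that both models agree on.
rows_a = [[0.88, 0.87, 0.50, 0.40], [0.90, 0.10, 0.20, 0.30]]
rows_b = [[0.87, 0.88, 0.52, 0.41], [0.80, 0.20, 0.10, 0.30]]
print(top1_agreement(rows_a, rows_b))          # 0.5
print(recall_at_k_overlap(rows_a, rows_b, 2))  # 1.0
```

A single swapped near-tie halves top-1 agreement on this toy pool while the top-2 sets remain identical, which is the pattern the paragraph above describes.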
To reproduce the benchmark numbers, run the bundled run_tests.py; it loads both models and prints the full table.
MTEB English Benchmark — MTEB(eng, v2)
Evaluated with the official mteb library on the full MTEB(eng, v2) suite (41/41 tasks). Raw results are in mteb_results/; machine-readable scores are in the model-index metadata above.
Overall MTEB(eng, v2) mean over all 41 tasks: 50.79
| Category | Mean | Tasks |
|---|---|---|
| STS | 71.93 | 9 |
| Classification | 70.57 | 8 |
| PairClassification | 74.07 | 3 |
| Clustering | 37.32 | 8 |
| Reranking | 40.48 | 2 |
| Retrieval | 24.64 | 10 |
| Summarization | 22.39 | 1 |
STS
| Task | Score |
|---|---|
| BIOSSES | 75.56 |
| SICK-R | 69.08 |
| STS12 | 64.88 |
| STS13 | 72.08 |
| STS14 | 67.76 |
| STS15 | 77.13 |
| STS17 | 83.23 |
| STS22.v2 | 60.53 |
| STSBenchmark | 77.08 |
Classification
| Task | Score |
|---|---|
| AmazonCounterfactualClassification | 80.12 |
| Banking77Classification | 74.64 |
| ImdbClassification | 60.97 |
| MTOPDomainClassification | 92.29 |
| MassiveIntentClassification | 63.23 |
| MassiveScenarioClassification | 73.05 |
| ToxicConversationsClassification | 62.94 |
| TweetSentimentExtractionClassification | 57.29 |
PairClassification
| Task | Score |
|---|---|
| SprintDuplicateQuestions | 86.47 |
| TwitterSemEval2015 | 53.19 |
| TwitterURLCorpus | 82.55 |
Clustering
| Task | Score |
|---|---|
| ArXivHierarchicalClusteringP2P | 53.15 |
| ArXivHierarchicalClusteringS2S | 50.39 |
| BiorxivClusteringP2P.v2 | 33.73 |
| MedrxivClusteringP2P.v2 | 32.70 |
| MedrxivClusteringS2S.v2 | 29.04 |
| StackExchangeClustering.v2 | 41.93 |
| StackExchangeClusteringP2P.v2 | 35.22 |
| TwentyNewsgroupsClustering.v2 | 22.39 |
Reranking
| Task | Score |
|---|---|
| AskUbuntuDupQuestions | 52.88 |
| MindSmallReranking | 28.07 |
Retrieval
| Task | Score |
|---|---|
| ArguAna | 37.67 |
| CQADupstackGamingRetrieval | 37.14 |
| CQADupstackUnixRetrieval | 23.48 |
| ClimateFEVERHardNegatives | 13.60 |
| FEVERHardNegatives | 28.70 |
| FiQA2018 | 11.38 |
| HotpotQAHardNegatives | 30.47 |
| SCIDOCS | 10.15 |
| TRECCOVID | 29.30 |
| Touche2020Retrieval.v3 | 24.50 |
Summarization
| Task | Score |
|---|---|
| SummEvalSummarization.v2 | 22.39 |
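As an arithmetic check, the category means and the headline figure can be recomputed from the per-task tables. This snippet only averages the scores listed above and does not call the mteb library. It confirms that the reported 50.79 is the unweighted mean over all 41 tasks; averaging the seven category means with equal weight gives 48.77 instead.

```python
from statistics import mean

# Per-task scores copied from the tables above, grouped by MTEB task type.
SCORES = {
    "STS": [75.56, 69.08, 64.88, 72.08, 67.76, 77.13, 83.23, 60.53, 77.08],
    "Classification": [80.12, 74.64, 60.97, 92.29, 63.23, 73.05, 62.94, 57.29],
    "PairClassification": [86.47, 53.19, 82.55],
    "Clustering": [53.15, 50.39, 33.73, 32.70, 29.04, 41.93, 35.22, 22.39],
    "Reranking": [52.88, 28.07],
    "Retrieval": [37.67, 37.14, 23.48, 13.60, 28.70, 11.38, 30.47, 10.15, 29.30, 24.50],
    "Summarization": [22.39],
}

all_tasks = [score for scores in SCORES.values() for score in scores]
category_means = {name: mean(scores) for name, scores in SCORES.items()}

mean_over_tasks = mean(all_tasks)                       # every task weighted equally
mean_over_types = mean(list(category_means.values()))  # every category weighted equally

print(f"tasks: {len(all_tasks)}")                        # tasks: 41
print(f"mean over tasks: {mean_over_tasks:.2f}")         # mean over tasks: 50.79
print(f"mean over task types: {mean_over_types:.2f}")    # mean over task types: 48.77
```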
Usage
The model loads with standard transformers via trust_remote_code (the projection head is fused into the weights, so a single from_pretrained call loads everything):

```python
import torch
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("Quazim0t0/Byrne-Embed", trust_remote_code=True)
model = AutoModel.from_pretrained("Quazim0t0/Byrne-Embed", trust_remote_code=True).eval()

texts = ["The cat sat on the windowsill.", "A feline rested by the window."]
enc = tok(texts, return_tensors="pt", padding=True, truncation=True, max_length=128)

with torch.no_grad():
    emb = model(**enc).last_hidden_state  # (2, 768), L2-normalized

print(float(emb[0] @ emb[1]))  # cosine similarity ~ 0.83
```
forward() returns L2-normalized 768-dim sentence embeddings, so cosine similarity is just a dot product.
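Because the embeddings are unit-norm, ranking a corpus against a query reduces to sorting dot products. A minimal pure-Python sketch follows; the function name rank_by_dot and the toy vectors are hypothetical. With the model above you could pass rows of emb converted to lists (e.g. via emb.tolist()).

```python
from typing import List, Sequence, Tuple


def rank_by_dot(query: Sequence[float], docs: Sequence[Sequence[float]], k: int = 3) -> List[Tuple[int, float]]:
    """Return (doc_index, score) pairs for the k highest dot products.

    Assumes every vector is already L2-normalized, so each dot product
    equals the cosine similarity.
    """
    scores = [(i, sum(q * d for q, d in zip(query, doc))) for i, doc in enumerate(docs)]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:k]


query = [1.0, 0.0]
docs = [[0.0, 1.0], [0.6, 0.8], [0.8, 0.6]]  # toy unit vectors
print([i for i, _ in rank_by_dot(query, docs, k=2)])  # [2, 1]
```

For large corpora, a single matrix product of the document embeddings with the query followed by torch.topk does the same job in one vectorized call.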
Files
| File | Purpose |
|---|---|
| model.safetensors, config.json | Fused SpikeWhale backbone + projection head + config |
| modeling_byrne_embed.py | Self-contained custom AutoModel class (SpikeWhale architecture inlined; loaded via trust_remote_code) |
| tokenizer.json, tokenizer_config.json, spike_tokenizer.py | Byte-level SpikeTokenizer and its code |
Limitations
- English-centric evaluation; non-English performance is untested.
- Finance/economics paraphrase retrieval was the main residual weak spot observed during evaluation; general semantic similarity is strong. On MTEB, the lowest-scoring categories are Retrieval (24.64) and Summarization (22.39).
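- Custom architecture: loading requires custom modeling code, either through trust_remote_code (modeling_byrne_embed.py, as in Usage) or through the bundled byrne_embedder.py (local modeling code, no remote code execution).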
Citation
If you use Byrne-Embed, please cite:
```bibtex
@misc{byrne2026byrneembed,
  title        = {Byrne-Embed: A Compact 85M Sentence-Embedding Model},
  author       = {Byrne, Dean},
  year         = {2026},
  howpublished = {\url{https://huggingface.co/Quazim0t0/Byrne-Embed}},
}
```
License
Apache-2.0.