Minnow-Em1-0.6B
Minnow-Em1-0.6B is a compact (0.6B-parameter) multilingual text-embedding model from
KiteFish AI, adapted from Qwen/Qwen3-0.6B into a fully bidirectional encoder and
fine-tuned for general-purpose embeddings: retrieval, semantic textual similarity (STS),
classification, clustering, reranking, and bitext mining.
Version: v1 — the first public release in the Minnow-Em line.
⚠️ Important: this model must be loaded with bidirectional attention
This model was trained with the causal attention mask removed (every token attends to every other token). That change is applied at load time and is not baked into the saved weights, so loading the model the ordinary way leaves it in causal mode and produces poor embeddings. Always apply the patch below after loading.
import types

import torch
from sentence_transformers import SentenceTransformer
from transformers import PreTrainedModel


def load_minnow(name="KiteFishAI/Minnow-Em1-0.6B", device="cuda"):
    model = SentenceTransformer(
        name,
        model_kwargs={"torch_dtype": torch.bfloat16, "attn_implementation": "sdpa"},
        device=device,
    )

    # --- make the backbone bidirectional (must match training) ---
    # Locate the underlying Hugging Face model inside the first module.
    hf = None
    first = model[0]
    for attr in ("auto_model", "model"):
        c = getattr(first, attr, None)
        if isinstance(c, PreTrainedModel):
            hf = c
            break
    if hf is None:
        hf = next(m for m in first.modules() if isinstance(m, PreTrainedModel))

    # Switch off the causal flag on every module that carries one.
    for _, m in hf.named_modules():
        if hasattr(m, "is_causal"):
            m.is_causal = False

    # Replace the causal-mask builder with one that only masks padding.
    base = getattr(hf, "model", hf)
    if hasattr(base, "_update_causal_mask"):
        def _no_mask(self, attn_mask, inp, *a, **kw):
            if attn_mask is None:
                return None
            if attn_mask.dim() == 2:
                dt = inp.dtype
                return (1.0 - attn_mask[:, None, None, :].to(dt)) * torch.finfo(dt).min
            return attn_mask
        base._update_causal_mask = types.MethodType(_no_mask, base)
    hf.config.is_decoder = False

    # sanity check: token-0 state must change when a later token changes
    tok = first.tokenizer
    with torch.no_grad():
        a = tok(["The quick brown fox"], return_tensors="pt").to(hf.device)
        b = tok(["The quick brown cat"], return_tensors="pt").to(hf.device)
        d = (hf(**a).last_hidden_state[0, 0] - hf(**b).last_hidden_state[0, 0]).abs().max()
    assert d > 1e-4, "Model is still causal: patch did not take effect."
    return model
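To see why the sanity check compares the token-0 state of two inputs that differ only in a later token, here is a toy sketch in plain Python. It is not the model's attention: it assumes a single head with q = k = v and no learned projections, and the vectors are made up for illustration.

```python
import math

def attend(x, causal):
    """Toy single-head self-attention over a list of vectors, with q = k = v = x.

    When `causal` is True, position i may only attend to positions j <= i.
    """
    n = len(x)
    out = []
    for i in range(n):
        # Positions visible from i under the chosen mask.
        visible = range(i + 1) if causal else range(n)
        scores = [sum(a * b for a, b in zip(x[i], x[j])) for j in visible]
        top = max(scores)
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        # Softmax-weighted average of the visible vectors.
        out.append([
            sum(w * x[j][k] for w, j in zip(weights, visible)) / total
            for k in range(len(x[i]))
        ])
    return out

# Two toy "sentences" that differ only in their last token vector (made-up values).
seq_a = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
seq_b = [[1.0, 0.0], [0.5, 0.5], [1.0, 1.0]]

def first_token_gap(causal):
    """Largest absolute difference between the two token-0 outputs."""
    a = attend(seq_a, causal)[0]
    b = attend(seq_b, causal)[0]
    return max(abs(u - v) for u, v in zip(a, b))

causal_gap = first_token_gap(causal=True)
bidirectional_gap = first_token_gap(causal=False)
print(causal_gap, bidirectional_gap)
```

Under the causal mask, token 0 sees only itself, so the gap is zero. Once the mask is removed, the later token leaks into token 0 and the gap becomes clearly non-zero, which is the signal the assertion in `load_minnow` looks for.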
Usage
The model is instruction-aware. Prepend a task instruction to each query using the format:
Instruct: {task instruction}\nQuery: {text}
- Retrieval / reranking (asymmetric): instruct the query only; leave documents raw.
- STS / classification / clustering / bitext (symmetric): instruct all texts.
model = load_minnow()


def with_instruction(instruction, texts):
    return [f"Instruct: {instruction}\nQuery: {t}" for t in texts]


# --- retrieval example ---
queries = with_instruction(
    "Given a query, retrieve documents that answer the query",
    ["What causes the northern lights?"],
)
docs = ["Auroras are produced when charged particles from the sun excite atoms in the upper atmosphere."]

q = model.encode(queries, normalize_embeddings=True)
d = model.encode(docs, normalize_embeddings=True)  # documents: no instruction
print(q @ d.T)
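The symmetric/asymmetric rule can be wrapped in a small helper so that callers do not have to remember which side gets the instruction. This is a minimal sketch: `format_inputs`, the task labels, and the STS instruction string are hypothetical names chosen for illustration, not identifiers or prompts shipped with the model.

```python
# Task labels are illustrative; the model itself defines no such identifiers.
SYMMETRIC_TASKS = {"sts", "classification", "clustering", "bitext"}
ASYMMETRIC_TASKS = {"retrieval", "reranking"}

def format_inputs(task, instruction, texts, is_query=True):
    """Apply the Instruct/Query template according to the task type."""
    if task in SYMMETRIC_TASKS:
        use_instruction = True        # instruct every text
    elif task in ASYMMETRIC_TASKS:
        use_instruction = is_query    # instruct queries only; documents stay raw
    else:
        raise ValueError(f"unknown task type: {task!r}")
    if not use_instruction:
        return list(texts)
    return [f"Instruct: {instruction}\nQuery: {t}" for t in texts]

# Hypothetical STS instruction, used here only to show the formatting.
sts_inputs = format_inputs("sts", "Retrieve semantically similar text", ["A cat sleeps."])
doc_inputs = format_inputs("retrieval", "unused for documents", ["Auroras are ..."], is_query=False)
print(sts_inputs)
print(doc_inputs)
```

The returned lists can be passed straight to `model.encode(...)`.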
Model details
| Property | Value |
|---|---|
| Base model | Qwen/Qwen3-0.6B |
| Parameters | ~0.6B |
| Attention | Bidirectional (causal mask removed) |
| Pooling | Mean pooling |
| Embedding dim | 1024 |
| Max sequence length | 512 |
| Instruction-aware | Yes (Instruct: … \nQuery: …) |
| Similarity | Cosine |
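For readers unfamiliar with the pooling and similarity entries above, the following plain-Python sketch shows what masked mean pooling and cosine similarity compute. It uses made-up 2-dimensional vectors rather than the model's 1024-dimensional states, and it illustrates the operations only; it is not the library's implementation.

```python
import math

def mean_pool(token_vectors, attention_mask):
    """Average the token vectors whose mask entry is 1, ignoring padding."""
    kept = [v for v, m in zip(token_vectors, attention_mask) if m == 1]
    dim = len(token_vectors[0])
    return [sum(v[k] for v in kept) / len(kept) for k in range(dim)]

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Toy token states; the last row stands in for a padding position.
tokens = [[1.0, 0.0], [0.0, 1.0], [9.0, 9.0]]
mask = [1, 1, 0]

pooled = mean_pool(tokens, mask)
sim = cosine(pooled, [1.0, 1.0])
print(pooled)  # [0.5, 0.5]
print(sim)
```

With `normalize_embeddings=True`, as in the usage example, cosine similarity reduces to a plain dot product.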
Training
Minnow-Em1 follows the now-standard multi-stage recipe for compact LLM-based embedders (cf. KaLM-Embedding-V2, Qwen3-Embedding, Llama-Embed-Nemotron):
- Stage 1 — weakly-supervised contrastive pre-training. Large-scale query/passage pairs, in-batch negatives only, to adapt the bidirectional backbone to representation learning.
- Stage 2 — supervised contrastive fine-tuning. Task-homogeneous batches with mined hard negatives, InfoNCE (temperature 0.02) with focal reweighting (γ = 0.5) to emphasize hard examples, false-negative masking, and symmetric/asymmetric instruction routing by task type.
Training data spans retrieval, STS, classification, clustering, reranking, pair classification, and bitext-mining sources across multiple languages.
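The Stage 2 objective can be sketched for a single query as follows. This card does not spell out the exact form of the focal reweighting, so the sketch assumes the common focal-loss form, where the InfoNCE term is scaled by (1 - p_pos) ** γ. In-batch negatives and false-negative masking are left out, and the similarity values are made up.

```python
import math

def focal_info_nce(pos_sim, neg_sims, temperature=0.02, gamma=0.5):
    """InfoNCE for one query, scaled by an ASSUMED focal weight (1 - p_pos) ** gamma."""
    logits = [pos_sim / temperature] + [s / temperature for s in neg_sims]
    top = max(logits)
    # log-sum-exp with the maximum subtracted for numerical stability
    log_z = top + math.log(sum(math.exp(l - top) for l in logits))
    log_p_pos = logits[0] - log_z
    p_pos = math.exp(log_p_pos)
    # Close to 0 for easy examples, close to 1 for hard ones.
    weight = max(0.0, 1.0 - p_pos) ** gamma
    return weight * (-log_p_pos)

# Made-up cosine similarities: one positive followed by two negatives.
easy = focal_info_nce(0.60, [0.50, 0.52])
hard = focal_info_nce(0.50, [0.48, 0.49])
print(easy, hard)
```

With a temperature of 0.02, a similarity gap of only 0.01 becomes a logit gap of 0.5, so near-tied hard negatives dominate the loss, and the focal weight further shrinks the contribution of examples the model already separates well.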
Evaluation
Evaluation on the MMTEB / MTEB task suite is being finalized with the official mteb harness; a
full results table will be added to this card in a subsequent revision. The model is optimized for
the multilingual MMTEB task mix.
Numbers will only be published once produced by the official
mteb package on the complete benchmark task set (not a partial or custom run).
Limitations and intended use
- Bidirectional load required (see above) — without the patch the model is effectively causal and underperforms badly.
- In-domain training data. The training mix includes the train splits of several public benchmark datasets (e.g. MS MARCO, HotpotQA, Natural Questions, NFCorpus, MIRACL). Scores on the corresponding evaluation tasks should be read as in-domain, not zero-shot.
- Language balance. v1's fine-tuning mix is weighted toward English question-answering retrieval; performance on some low-resource and cross-lingual tasks is correspondingly weaker. Rebalancing is planned for a future version.
- Intended for embedding/retrieval research and applications; not a generative model.
Acknowledgements
Built on Qwen/Qwen3-0.6B. Methodology informed by KaLM-Embedding-V2, Qwen3-Embedding, and
Llama-Embed-Nemotron-8B. Evaluated with the MTEB / MMTEB benchmark suite.
License
Released under Apache-2.0, consistent with the Qwen/Qwen3-0.6B base model. Verify license
compatibility for your use case before redistribution.