Minnow-Em1-0.6B

Minnow-Em1-0.6B is a compact (0.6B-parameter) multilingual text-embedding model from KiteFish AI, adapted from Qwen/Qwen3-0.6B into a fully bidirectional encoder and fine-tuned for general-purpose embeddings: retrieval, semantic textual similarity (STS), classification, clustering, reranking, and bitext mining.

Version: v1 — the first public release in the Minnow-Em line.

⚠️ Important: this model must be loaded with bidirectional attention

This model was trained with the causal attention mask removed, so every token attends to every other token. The attention mask is runtime behavior rather than a stored parameter, so the saved weights cannot carry that change: loading the model the ordinary way leaves it in causal mode and produces poor embeddings. Always apply the patch below after loading.

import types
import torch
from sentence_transformers import SentenceTransformer
from transformers import PreTrainedModel

def load_minnow(name="KiteFishAI/Minnow-Em1-0.6B", device="cuda"):
    model = SentenceTransformer(
        name,
        model_kwargs={"torch_dtype": torch.bfloat16, "attn_implementation": "sdpa"},
        device=device,
    )
    # --- make the backbone bidirectional (must match training) ---
    # locate the underlying Hugging Face model inside the first module
    hf = None
    first = model[0]
    for attr in ("auto_model", "model"):
        c = getattr(first, attr, None)
        if isinstance(c, PreTrainedModel):
            hf = c
            break
    if hf is None:
        hf = next(m for m in first.modules() if isinstance(m, PreTrainedModel))
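    # clear the `is_causal` flag on every submodule that has one
    for _, m in hf.named_modules():
        if hasattr(m, "is_causal"):
            m.is_causal = False
    base = getattr(hf, "model", hf)
    # If this transformers release builds its mask in `_update_causal_mask`,
    # swap in a padding-only mask. If the method is absent, this branch is
    # skipped, so rely on the sanity check below to confirm the result.
    if hasattr(base, "_update_causal_mask"):
        def _no_mask(self, attn_mask, inp, *a, **kw):
            if attn_mask is None:
                return None  # no padding mask supplied: nothing to block
            if attn_mask.dim() == 2:
                # [batch, seq] 0/1 mask -> additive [batch, 1, 1, seq] mask
                dt = inp.dtype
                return (1.0 - attn_mask[:, None, None, :].to(dt)) * torch.finfo(dt).min
            return attn_mask
        base._update_causal_mask = types.MethodType(_no_mask, base)
    hf.config.is_decoder = False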

    # sanity check: token-0 state must change when a later token changes
    tok = first.tokenizer
    with torch.no_grad():
        a = tok(["The quick brown fox"], return_tensors="pt").to(hf.device)
        b = tok(["The quick brown cat"], return_tensors="pt").to(hf.device)
        d = (hf(**a).last_hidden_state[0, 0] - hf(**b).last_hidden_state[0, 0]).abs().max()
    assert d > 1e-4, "Model is still causal — patch did not take effect."
    return model
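
To make the mask swap in `_no_mask` concrete, the following standard-library sketch builds both kinds of additive mask for a single padded sequence. It is an illustration only, not part of the model's code: `NEG`, `padding_only_mask`, and `causal_mask` are hypothetical names, and the masks are written out as full square matrices, whereas the patch returns a `[batch, 1, 1, seq]` tensor that broadcasts across query positions. An entry of 0.0 means "may attend"; a large negative entry means "blocked".

```python
# Illustrative only: plain-Python version of the additive attention masks.
NEG = -1e9  # stand-in for torch.finfo(dtype).min


def padding_only_mask(attention_mask):
    """Bidirectional: every query position sees all non-padding keys."""
    n = len(attention_mask)
    row = [0.0 if keep else NEG for keep in attention_mask]
    return [list(row) for _ in range(n)]


def causal_mask(attention_mask):
    """Causal: query i sees only non-padding keys j <= i."""
    n = len(attention_mask)
    return [
        [0.0 if (j <= i and attention_mask[j]) else NEG for j in range(n)]
        for i in range(n)
    ]


attention_mask = [1, 1, 1, 0]  # three real tokens followed by one padding token
bidir = padding_only_mask(attention_mask)
causal = causal_mask(attention_mask)

# Row 0 is the first token: under the causal mask it sees only itself,
# under the padding-only mask it sees all three real tokens.
print(causal[0])
print(bidir[0])
```

This is also why the sanity check in `load_minnow` inspects token 0: under a causal mask its hidden state cannot depend on later tokens.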

Usage

The model is instruction-aware. Prepend a task instruction to each query using the format:

Instruct: {task instruction}\nQuery: {text}

Retrieval / reranking (asymmetric): instruct the query only; leave documents raw.
STS / classification / clustering / bitext (symmetric): instruct all texts.

model = load_minnow()

def with_instruction(instruction, texts):
    return [f"Instruct: {instruction}\nQuery: {t}" for t in texts]

# --- retrieval example ---
queries = with_instruction(
    "Given a query, retrieve documents that answer the query",
    ["What causes the northern lights?"],
)
docs = ["Auroras are produced when charged particles from the sun excite atoms in the upper atmosphere."]

q = model.encode(queries, normalize_embeddings=True)
d = model.encode(docs, normalize_embeddings=True)  # documents: no instruction
print(q @ d.T)  # cosine similarities, since both sides are L2-normalized
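
The retrieval example covers the asymmetric case only. As a sketch of the full routing rule (instruction on both sides for symmetric tasks, on the query side only for asymmetric ones), the helper below formats inputs by task type. The function name `format_inputs`, the task-type labels, and the STS instruction string are illustrative assumptions, not part of the model's API.

```python
# Hypothetical helper: task-type labels and names are illustrative, not an official API.
SYMMETRIC_TASKS = {"sts", "classification", "clustering", "bitext"}
ASYMMETRIC_TASKS = {"retrieval", "reranking"}


def format_inputs(task_type, instruction, queries, documents=()):
    """Return (queries, documents) with the instruction applied per the routing rule."""
    def wrap(texts):
        return [f"Instruct: {instruction}\nQuery: {t}" for t in texts]

    if task_type in SYMMETRIC_TASKS:
        # symmetric: both sides carry the instruction
        return wrap(queries), wrap(documents)
    if task_type in ASYMMETRIC_TASKS:
        # asymmetric: only the query side carries the instruction
        return wrap(queries), list(documents)
    raise ValueError(f"unknown task type: {task_type!r}")


# STS example; the instruction wording here is an assumption.
sts_a, sts_b = format_inputs(
    "sts",
    "Retrieve semantically similar text",
    ["A man is playing a guitar."],
    ["Someone plays the guitar."],
)
```

Both returned lists can then be passed to `model.encode(..., normalize_embeddings=True)` exactly as in the retrieval example.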

Model details


| Property | Value |
|---|---|
| Base model | `Qwen/Qwen3-0.6B` |
| Parameters | ~0.6B |
| Attention | Bidirectional (causal mask removed) |
| Pooling | Mean pooling |
| Embedding dim | 1024 |
| Max sequence length | 512 |
| Instruction-aware | Yes (`Instruct: … \nQuery: …`) |
| Similarity | Cosine |
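
For readers unfamiliar with the pooling and similarity entries, here is a minimal standard-library sketch of masked mean pooling followed by cosine similarity. It uses toy 2-dimensional vectors in place of the model's 1024-dimensional hidden states; `mean_pool` and `cosine` are illustrative names, and in practice `model.encode` performs the pooling for you.

```python
import math


def mean_pool(token_vectors, attention_mask):
    """Average the token vectors whose mask entry is 1; padding is ignored."""
    kept = [v for v, m in zip(token_vectors, attention_mask) if m]
    if not kept:
        raise ValueError("attention_mask selects no tokens")
    dim = len(kept[0])
    return [sum(v[i] for v in kept) / len(kept) for i in range(dim)]


def cosine(u, v):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


# Two real tokens and one padding token (toy values).
tokens = [[1.0, 0.0], [0.0, 1.0], [9.0, 9.0]]
mask = [1, 1, 0]
pooled = mean_pool(tokens, mask)  # the padding vector does not contribute
print(pooled, cosine(pooled, [1.0, 1.0]))
```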

Training

Minnow-Em1 follows the now-standard multi-stage recipe for compact LLM-based embedders (cf. KaLM-Embedding-V2, Qwen3-Embedding, Llama-Embed-Nemotron):

Stage 1 — weakly-supervised contrastive pre-training. Large-scale query/passage pairs, in-batch negatives only, to adapt the bidirectional backbone to representation learning.
Stage 2 — supervised contrastive fine-tuning. Task-homogeneous batches with mined hard negatives, InfoNCE (temperature 0.02) with focal reweighting (γ = 0.5) to emphasize hard examples, false-negative masking, and symmetric/asymmetric instruction routing by task type.
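
The Stage 2 objective can be sketched for a single query as follows. The card gives the temperature (0.02) and γ (0.5) but not the exact form of the focal weight, so this sketch assumes the common focal-loss pattern, weight = (1 − p)^γ, where p is the softmax probability assigned to the positive. False-negative masking is modeled by simply dropping flagged negatives from the denominator. `focal_infonce` and `neg_keep` are hypothetical names, and the actual training code may differ.

```python
import math


def focal_infonce(pos_sim, neg_sims, temperature=0.02, gamma=0.5, neg_keep=None):
    """Loss for one query, given cosine similarities to its positive and negatives.

    neg_keep optionally marks which negatives to keep (False = masked out as a
    suspected false negative). The weight (1 - p) ** gamma is an assumed form
    of focal reweighting, not one confirmed by the model card.
    """
    if neg_keep is None:
        neg_keep = [True] * len(neg_sims)
    logits = [pos_sim / temperature] + [
        s / temperature for s, keep in zip(neg_sims, neg_keep) if keep
    ]
    top = max(logits)  # subtract the max for numerical stability
    log_denom = top + math.log(sum(math.exp(x - top) for x in logits))
    log_p = logits[0] - log_denom  # log-probability of the positive
    p = math.exp(log_p)
    weight = max(0.0, 1.0 - p) ** gamma  # small when the example is already easy
    return -weight * log_p


easy = focal_infonce(0.9, [0.1, 0.0])    # positive far ahead of the negatives
hard = focal_infonce(0.6, [0.58, 0.1])   # one negative nearly ties the positive
masked = focal_infonce(0.6, [0.58, 0.1], neg_keep=[False, True])
print(easy, hard, masked)
```

With γ = 0 the weight is 1 and the function reduces to plain InfoNCE. With γ > 0, easy examples are down-weighted, so a batch's loss is dominated by its hard examples.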

Training data spans retrieval, STS, classification, clustering, reranking, pair classification, and bitext-mining sources across multiple languages.

Evaluation

Evaluation on the MMTEB / MTEB task suite is being finalized with the official mteb harness; a full results table will be added to this card in a subsequent revision. The model is optimized for the multilingual MMTEB task mix.

Numbers will only be published once produced by the official mteb package on the complete benchmark task set (not a partial or custom run).

Limitations and intended use

Bidirectional load required (see above) — without the patch the model is effectively causal and underperforms badly.
In-domain training data. The training mix includes the train splits of several public benchmark datasets (e.g. MS MARCO, HotpotQA, Natural Questions, NFCorpus, MIRACL). Scores on the corresponding evaluation tasks should be read as in-domain, not zero-shot.
Language balance. v1's fine-tuning mix is weighted toward English question-answering retrieval; performance on some low-resource and cross-lingual tasks is correspondingly weaker. Rebalancing is planned for a future version.
Intended for embedding/retrieval research and applications; not a generative model.

Acknowledgements

Built on Qwen/Qwen3-0.6B. Methodology informed by KaLM-Embedding-V2, Qwen3-Embedding, and Llama-Embed-Nemotron-8B. Evaluated with the MTEB / MMTEB benchmark suite.

License

Released under Apache-2.0, consistent with the Qwen/Qwen3-0.6B base model. Verify license compatibility for your use case before redistribution.
