e5-base-v2-code-search (v9-200k)

A fine-tuned code search embedding model based on intfloat/e5-base-v2 (110M parameters). Trained with call-graph false-negative filtering on 200K balanced pairs across 9 programming languages.

Built for cqs — code intelligence and RAG for AI agents.

Key Results

Eval                                           Metric    Score
Pipeline (55 confusable functions, enriched)   R@1       94.5%
Pipeline                                       MRR       0.966
Raw code embedding (no enrichment)             R@1       70.9%
CodeSearchNet (6 languages)                    NDCG@10   0.615

The 94.5% pipeline score matches BGE-large (335M) with roughly one-third the parameters. The 70.9% raw R@1 exceeds BGE-large's 61.8%.

Training

  • Base model: intfloat/e5-base-v2 (110M params, 768 dimensions)
  • Data: 200K balanced pairs (22,222 per language × 9 languages) from cqs-indexed Stack repos
  • Key technique: Call-graph false-negative filtering — uses code structure (caller/callee relationships) to exclude structurally related functions from contrastive negatives. Zero API cost (SQLite lookup).
  • Loss: CachedGISTEmbedLoss + MatryoshkaLoss (768/384/192/128 dims)
  • LoRA: rank 16, alpha 32, targets: query, key, value, dense
  • Epochs: 1 (additional epochs degrade enrichment compatibility)
  • Dataset: jamie8johnson/cqs-code-search-200k
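The call-graph filtering step above can be sketched as follows. This is a minimal illustration, not the cqs implementation: the `call_edges(caller, callee)` table, the function names, and the one-hop neighborhood query are all assumptions made for the example.

```python
import sqlite3

# Hypothetical call-graph store: one row per caller -> callee edge.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE call_edges (caller TEXT, callee TEXT)")
conn.executemany(
    "INSERT INTO call_edges VALUES (?, ?)",
    [("parse_config", "read_file"), ("read_file", "open_path")],
)

def related_functions(anchor: str) -> set:
    """Functions one call-graph hop from `anchor` (its callers and callees)."""
    rows = conn.execute(
        "SELECT callee FROM call_edges WHERE caller = ? "
        "UNION SELECT caller FROM call_edges WHERE callee = ?",
        (anchor, anchor),
    ).fetchall()
    return {r[0] for r in rows}

def filter_negatives(anchor: str, candidates: list) -> list:
    """Drop structurally related functions from the contrastive negative pool."""
    related = related_functions(anchor)
    return [c for c in candidates if c not in related]

# open_path (callee) and parse_config (caller) are excluded as likely
# false negatives; the unrelated send_email survives as a negative.
negatives = filter_negatives("read_file", ["open_path", "parse_config", "send_email"])
```

Because the lookup is a local SQLite query, filtering adds no API cost during training-data construction.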

The 89.1% Basin

Six independent perturbations of this configuration (among them: more data, less data, FAISS-mined hard negatives, more epochs, and contrastive query augmentation) all produce the same -5.4 pp pipeline regression, to 89.1%. The 94.5% result appears to occupy a narrow peak in the loss landscape around ~22K examples per language with call-graph filtering.

Usage with cqs

# Default model in cqs v1.9.0+
cqs init && cqs index

# Or specify explicitly
export CQS_EMBEDDING_MODEL=e5-base
cqs index

Usage with sentence-transformers

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("jamie8johnson/e5-base-v2-code-search")
query_emb = model.encode("query: find functions that validate email addresses")
code_emb = model.encode("passage: def validate_email(addr): ...")
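Because the model was trained with MatryoshkaLoss at 768/384/192/128 dims, embeddings can be truncated to a smaller prefix and re-normalized before similarity search. A sketch of that truncation step, using stand-in vectors so it runs without downloading the model:

```python
import numpy as np

def truncate_and_normalize(emb, dim):
    """Keep the first `dim` components and re-normalize to unit length."""
    head = np.asarray(emb)[..., :dim]
    return head / np.linalg.norm(head, axis=-1, keepdims=True)

# Stand-ins for model.encode(...) output (2 embeddings x 768 dims).
rng = np.random.default_rng(0)
full = rng.normal(size=(2, 768))

small = truncate_and_normalize(full, 384)  # shape (2, 384), unit norm
sim = float(small[0] @ small[1])           # cosine similarity at 384 dims
```

In practice you would pass `model.encode(...)` output through the same helper; smaller dims trade some accuracy for index size and search speed.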

Languages

Go, Java, JavaScript, PHP, Python, Ruby, Rust, TypeScript, C++

ONNX

Includes model.onnx for inference with ONNX Runtime (used by cqs for local GPU/CPU inference).

Citation

Paper in preparation. See research log for methodology.
