Nandi-Mini-150M

Introduction

Nandi-Mini-150M is a compact, efficient multilingual language model designed for strong performance in resource-constrained environments. It is pre-trained from scratch on 525 billion tokens and supports English and 10 Indic languages.

We do not employ any benchmark-gaming ("benchmaxing") tricks; the model is built to be genuinely capable and to fine-tune well on downstream tasks.

Nandi-Mini-150M focuses on maximizing performance per parameter through architectural efficiency rather than scale. It is optimized for edge devices, on-prem deployments, and low-latency applications. Nandi-Mini-150M brings the following key features:

  • Strong multilingual capability across English and Indic languages
  • Efficient design enabling high performance at small scale (150M parameters)
  • Reduced memory footprint using factorized embeddings
  • Better parameter efficiency through layer sharing
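To illustrate why factorized embeddings shrink the memory footprint, here is a back-of-the-envelope parameter count. The hidden and factorization dimensions below are illustrative assumptions, not the model's actual configuration:

```python
# Illustrative parameter counts for factorized vs. standard embeddings.
# Assumed sizes (NOT the actual Nandi-Mini-150M configuration):
VOCAB = 131_072      # vocabulary size (from the model card)
HIDDEN = 576         # assumed hidden dimension
FACTOR = 128         # assumed low-rank factorization dimension

# Standard embedding: a single V x H lookup matrix.
standard = VOCAB * HIDDEN

# Factorized embedding: a V x F lookup followed by an F x H projection.
factorized = VOCAB * FACTOR + FACTOR * HIDDEN

print(f"standard:   {standard:,} params")
print(f"factorized: {factorized:,} params")
print(f"savings:    {1 - factorized / standard:.1%}")
```

Because the vocabulary dimension dominates, even a modest factorization rank cuts embedding parameters by a large factor, which matters a lot in a 150M-parameter budget.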

📝 Upcoming Releases & Roadmap

We’re just getting started with the Nandi series 🚀

  • Nandi-Mini-150M (Base) — Available now
  • Nandi-Mini-150M (Instruct) — Open-sourcing next week
  • Nandi-Mini-500M (Base + Instruct) — Pre-training in progress
  • Nandi-Mini-1B (Base + Instruct) — Pre-training in progress

We are actively working on expanding the Nandi family to cover a wider range of use cases—from lightweight edge deployments to more capable instruction-tuned systems.

📢 Blogs & technical deep-dives coming soon, where we’ll share:

  • Architecture decisions and design trade-offs
  • Training insights and dataset composition
  • Benchmarks and real-world applications

Stay tuned!

This repo contains the base Nandi-Mini-150M model, which has the following features:

  • Type: Causal Language Model
  • Training Stage: Pretraining (from scratch)
  • Architecture: Transformer decoder with RoPE, RMSNorm, SwiGLU, GQA, tied embeddings, and factorized embeddings
  • Number of Layers: 16 × 2 (layer sharing; effective depth = 32)
  • Context Length: 2,048 tokens
  • Vocabulary Size: 131,072
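Layer sharing reuses one set of layer weights at multiple depths, so 16 unique layers yield an effective depth of 32. A minimal sketch of one common looping pattern (repeating the whole stack); this is an illustration of the idea, not the model's actual implementation:

```python
# Minimal sketch of layer sharing: 16 unique "layers", each applied
# twice in sequence, for an effective depth of 32 with the parameter
# cost of 16 layers.
NUM_UNIQUE = 16
REPEATS = 2

def make_layer(i):
    # Stand-in for a transformer block; here it just records its index.
    return lambda trace: trace + [i]

layers = [make_layer(i) for i in range(NUM_UNIQUE)]

def forward(trace):
    for _ in range(REPEATS):      # reuse the same weights on each pass
        for layer in layers:
            trace = layer(trace)
    return trace

depth = len(forward([]))
print(depth)  # effective depth: 32
```

The compute cost still scales with the effective depth of 32, but the parameter (and memory) cost scales only with the 16 unique layers.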

🌍 Supported Languages

The model is trained on English and a diverse set of Indic languages, including:

  • Hindi, Bengali, Tamil, Telugu, Marathi, Gujarati, Kannada, Malayalam, Punjabi, Odia

Benchmark Results

📊 Benchmark Comparison (~150M Class)

| Model Name | Parameters (M) | Tokens (B) | HellaSwag | Winogrande | GPQA | MMLU | GSM8K | HumanEval | Average |
|---|---|---|---|---|---|---|---|---|---|
| Mobile-LLM-125M | 125 | 1000 | 38.90 | 53.10 | – | – | – | – | – |
| SmolLM-135M-Base | 135 | 600 | 42.66 | 53.03 | 25.44 | 25.30 | 1.36 | 0.00 | 24.63 |
| SmolLM2-135M-Base | 135 | 2000 | 43.13 | 53.27 | 22.09 | 24.09 | 1.74 | 0.00 | 24.05 |
| Nandi-Mini-150M-Base | 150 | 500 | 37.20 | 52.32 | 28.57 | 28.86 | 2.58 | 4.27 | 25.63 |

📊 Model Benchmark Comparison With Slightly Bigger Models (350M–600M Class)

| Model Name | Parameters (M) | Tokens (B) | HellaSwag | Winogrande | GPQA | MMLU | GSM8K | HumanEval | Average |
|---|---|---|---|---|---|---|---|---|---|
| Mobile-LLM-360M | 350 | 1000 | 49.60 | 56.59 | – | – | – | – | – |
| Qwen-2-0.5-Base | 500 | 12000 | 49.01 | 57.69 | 27.23 | 44.06 | 10.61 | 22.56 | 35.19 |
| Qwen2.5-0.5B-Base | 500 | 18000 | 52.16 | 56.82 | 24.10 | 47.41 | 4.77 | 29.87 | 35.86 |
| Qwen3-0.6B-Base | 600 | 36000 | 53.77 | 59.19 | 30.80 | 50.34 | 15.31 | 28.04 | 39.58 |
| SmolLM-360M-Base | 360 | 600 | 53.33 | 57.22 | 21.20 | 24.92 | 2.19 | 1.21 | 26.68 |
| SmolLM2-360M-Base | 360 | 4000 | 56.30 | 59.19 | 25.22 | 25.55 | 2.88 | 0.00 | 28.19 |
| Nandi-Mini-150M-Base | 150 | 500 | 37.20 | 52.32 | 28.57 | 28.86 | 2.58 | 4.27 | 25.63 |

Note

Mobile-LLM checkpoints are not publicly available, so their results are reported directly from the original paper. All other models were evaluated with lm-eval under a consistent setup; GSM8K and HumanEval were evaluated with greedy decoding for all models.

Tokenization Fertility Score across Languages

| Language | SmolLM3-3B | Qwen3-0.6B-Base | Sarvam-30B | Nandi-Mini-150M |
|---|---|---|---|---|
| English | 1.17 | 1.16 | 1.18 | 1.18 |
| Bengali | 8.66 | 7.51 | 1.46 | 1.44 |
| Gujarati | 10.47 | 9.37 | 1.70 | 1.53 |
| Hindi | 2.71 | 5.14 | 1.23 | 1.32 |
| Kannada | 16.43 | 12.96 | 2.08 | 1.90 |
| Malayalam | 17.77 | 14.56 | 2.81 | 2.05 |
| Marathi | 3.73 | 6.70 | 1.77 | 1.55 |
| Oriya | 19.07 | 15.75 | 1.77 | 2.68 |
| Punjabi | 9.23 | 8.66 | 1.42 | 1.42 |
| Tamil | 13.56 | 10.93 | 2.35 | 2.05 |
| Telugu | 15.40 | 13.38 | 2.09 | 1.77 |
| Assamese | 9.26 | 8.13 | 2.38 | 1.51 |
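Fertility is commonly measured as the average number of tokens a tokenizer produces per word (lower is better, since fewer tokens mean cheaper inference and longer effective context). A minimal sketch of how such a score can be computed; the whitespace word splitter and the toy two-character tokenizer below are simplifying stand-ins for a real subword tokenizer:

```python
def fertility(tokenize, texts):
    """Average tokens per whitespace-separated word (lower is better)."""
    n_tokens = sum(len(tokenize(t)) for t in texts)
    n_words = sum(len(t.split()) for t in texts)
    return n_tokens / n_words

# Toy tokenizer splitting each word into two-character chunks,
# standing in for a real subword tokenizer.
def toy_tokenize(text):
    words = text.split()
    return [w[i:i + 2] for w in words for i in range(0, len(w), 2)]

score = fertility(toy_tokenize, ["the night was quiet"])
print(round(score, 2))  # → 2.5
```

In practice you would pass the real tokenizer's encode function and a representative corpus for each language in place of the toy pieces above.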

🚀 Usage

pip install transformers==5.4.0

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Rta-AILabs/Nandi-mini-150M"

# trust_remote_code is required for the custom architecture;
# device_map="auto" places the weights on the available GPU/CPU.
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    trust_remote_code=True,
    device_map="auto",
).eval()

prompt = """
The night was quiet and the streets were empty.
A single light flickered in the distance. Someone was walking slowly, carrying a small bag. Suddenly,
"""
model_inputs = tokenizer([prompt], return_tensors="pt").to(model.device)

# Sample a continuation with mild temperature and top-k/nucleus filtering.
outputs = model.generate(
    **model_inputs,
    max_new_tokens=50,
    do_sample=True,
    temperature=0.3,
    top_k=20,
    top_p=0.95,
    repetition_penalty=1.1,
)

response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)

📬 Feedback & Suggestions

We’d love to hear your thoughts, feedback, and ideas!
