# Nandi-Mini-150M

## Introduction
Nandi-Mini-150M is a compact, efficient multilingual language model designed for strong performance in resource-constrained environments. It is pre-trained from scratch on 525 billion tokens and supports English and 10 Indic languages.
We do not employ benchmark-specific training tricks ("benchmaxing"); the model is designed to be genuinely capable and highly effective for fine-tuning on downstream tasks.
Nandi-Mini-150M maximizes performance per parameter through architectural efficiency rather than scale, and is optimized for edge devices, on-prem deployments, and low-latency applications. It brings the following key features:
- Strong multilingual capability across English and Indic languages
- Efficient design enabling high performance at small scale (150M parameters)
- Reduced memory footprint using factorized embeddings
- Better parameter efficiency through layer sharing
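The layer-sharing idea above can be sketched as follows. This is a toy illustration only, assuming a simple "apply the 16-layer stack twice" pattern; the card does not specify the exact sharing scheme used by Nandi-Mini:

```python
# Toy illustration of layer sharing: 16 unique "layers" (here simple
# functions) are each applied twice, giving 32 effective forward passes
# while storing only 16 sets of parameters.
NUM_UNIQUE_LAYERS = 16
REPEATS = 2

def make_layer(i):
    # stand-in for a transformer block; here it just adds its index
    return lambda x: x + i

layers = [make_layer(i) for i in range(NUM_UNIQUE_LAYERS)]

def forward(x):
    for _ in range(REPEATS):      # traverse the stack twice
        for layer in layers:      # same parameters reused on each pass
            x = layer(x)
    return x

effective_depth = NUM_UNIQUE_LAYERS * REPEATS
print(effective_depth)  # 32
print(forward(0))       # 2 * (0 + 1 + ... + 15) = 240
```

Only the 16 unique blocks contribute to the parameter count, while depth (and compute) behaves like a 32-layer model.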
## 📝 Upcoming Releases & Roadmap
We’re just getting started with the Nandi series 🚀
- Nandi-Mini-150M (Base) — Available now
- Nandi-Mini-150M (Instruct) — Open-sourcing next week
- Nandi-Mini-500M (Base + Instruct) — Pre-training in progress
- Nandi-Mini-1B (Base + Instruct) — Pre-training in progress
We are actively working on expanding the Nandi family to cover a wider range of use cases—from lightweight edge deployments to more capable instruction-tuned systems.
📢 Blogs & technical deep-dives coming soon, where we’ll share:
- Architecture decisions and design trade-offs
- Training insights and dataset composition
- Benchmarks and real-world applications
Stay tuned!
This repo contains the base Nandi-Mini-150M model, which has the following features:
- Type: Causal Language Model
- Training Stage: Pretraining (from scratch)
- Architecture: Transformer decoder with RoPE, RMSNorm, SwiGLU, GQA, tied embeddings, and factorized embeddings
- Number of Layers: 16 × 2 (layer sharing; 32 effective layers)
- Context Length: 2,048 tokens
- Vocabulary Size: 131,072
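To see why factorized embeddings shrink the memory footprint at this vocabulary size, here is a back-of-the-envelope calculation. The hidden size and factorization rank are not stated in this card, so `d_model = 768` and `rank = 128` below are illustrative assumptions only:

```python
# Factorized embeddings: vocab -> rank -> d_model instead of vocab -> d_model.
# vocab_size is from the model card; d_model and rank are assumed values.
vocab_size = 131_072
d_model = 768   # assumption, not stated in the card
rank = 128      # assumption, not stated in the card

full_params = vocab_size * d_model
factorized_params = vocab_size * rank + rank * d_model

print(f"full embedding params:       {full_params:,}")        # 100,663,296
print(f"factorized embedding params: {factorized_params:,}")  # 16,875,520
print(f"reduction: {full_params / factorized_params:.1f}x")
```

With tied input/output embeddings the saving applies to the output head as well, which is why the technique matters so much at a 131K vocabulary and 150M total parameters.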
## 🌍 Supported Languages
The model is trained on English and a diverse set of Indic languages, including:
- Hindi, Bengali, Tamil, Telugu, Marathi, Gujarati, Kannada, Malayalam, Punjabi, Odia
## Benchmark Results

### 📊 Benchmark Comparison (~150M Class)
| Model Name | Parameters (M) | Tokens (B) | HellaSwag | Winogrande | GPQA | MMLU | GSM8K | HumanEval | Average |
|---|---|---|---|---|---|---|---|---|---|
| Mobile-LLM-125M | 125 | 1000 | 38.90 | 53.10 | - | - | - | - | - |
| SmolLM-135M-Base | 135 | 600 | 42.66 | 53.03 | 25.44 | 25.30 | 1.36 | 0.00 | 24.63 |
| SmolLM2-135M-Base | 135 | 2000 | 43.13 | 53.27 | 22.09 | 24.09 | 1.74 | 0.00 | 24.05 |
| Nandi-Mini-150M-Base | 150 | 500 | 37.20 | 52.32 | 28.57 | 28.86 | 2.58 | 4.27 | 25.63 |
### 📊 Benchmark Comparison with Slightly Bigger Models (350M–600M Class)
| Model Name | Parameters (M) | Tokens (B) | HellaSwag | Winogrande | GPQA | MMLU | GSM8K | HumanEval | Average |
|---|---|---|---|---|---|---|---|---|---|
| Mobile-LLM-360M | 350 | 1000 | 49.60 | 56.59 | - | - | - | - | - |
| Qwen2-0.5B-Base | 500 | 12000 | 49.01 | 57.69 | 27.23 | 44.06 | 10.61 | 22.56 | 35.19 |
| Qwen2.5-0.5B-Base | 500 | 18000 | 52.16 | 56.82 | 24.10 | 47.41 | 4.77 | 29.87 | 35.86 |
| Qwen3-0.6B-Base | 600 | 36000 | 53.77 | 59.19 | 30.80 | 50.34 | 15.31 | 28.04 | 39.58 |
| SmolLM-360M-Base | 360 | 600 | 53.33 | 57.22 | 21.20 | 24.92 | 2.19 | 1.21 | 26.68 |
| SmolLM2-360M-Base | 360 | 4000 | 56.30 | 59.19 | 25.22 | 25.55 | 2.88 | 0.00 | 28.19 |
| Nandi-Mini-150M-Base | 150 | 500 | 37.20 | 52.32 | 28.57 | 28.86 | 2.58 | 4.27 | 25.63 |
**Note:** Mobile-LLM model checkpoints are not publicly available; their results are reported directly from the original paper. All other models were evaluated with lm-eval under a consistent setup. HumanEval and GSM8K were evaluated with greedy decoding for all models.
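For reference, the Average column is the plain mean of the six benchmark scores. Checking the Nandi-Mini-150M-Base row:

```python
# Scores in table order: HellaSwag, Winogrande, GPQA, MMLU, GSM8K, HumanEval
scores = [37.20, 52.32, 28.57, 28.86, 2.58, 4.27]
average = sum(scores) / len(scores)
print(f"{average:.2f}")  # 25.63
```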
### Tokenization Fertility Score across Languages
Fertility is the average number of tokens the tokenizer produces per word; lower is better.
| Language | SmolLM3-3B | Qwen3-0.6B-Base | Sarvam-30B | Nandi-Mini-150M |
|---|---|---|---|---|
| English | 1.17 | 1.16 | 1.18 | 1.18 |
| Bengali | 8.66 | 7.51 | 1.46 | 1.44 |
| Gujarati | 10.47 | 9.37 | 1.70 | 1.53 |
| Hindi | 2.71 | 5.14 | 1.23 | 1.32 |
| Kannada | 16.43 | 12.96 | 2.08 | 1.90 |
| Malayalam | 17.77 | 14.56 | 2.81 | 2.05 |
| Marathi | 3.73 | 6.70 | 1.77 | 1.55 |
| Odia | 19.07 | 15.75 | 1.77 | 2.68 |
| Punjabi | 9.23 | 8.66 | 1.42 | 1.42 |
| Tamil | 13.56 | 10.93 | 2.35 | 2.05 |
| Telugu | 15.40 | 13.38 | 2.09 | 1.77 |
| Assamese | 9.26 | 8.13 | 2.38 | 1.51 |
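As a rough sketch of how fertility is computed (tokens per whitespace-separated word), here is a minimal version with a toy character-level tokenizer standing in for a real one:

```python
# Fertility = tokens produced / whitespace-separated words in the text.
# The scores in the table above use each model's own tokenizer; the toy
# character-level "tokenizer" below is purely to illustrate the formula.
def fertility(tokenize, text):
    words = text.split()
    tokens = tokenize(text)
    return len(tokens) / len(words)

toy_tokenize = lambda text: list(text.replace(" ", ""))  # one token per char

print(fertility(toy_tokenize, "hello world"))  # 10 chars / 2 words = 5.0
```

With a real tokenizer you would pass its tokenize method instead, e.g. `tokenizer.tokenize` from a loaded `transformers` tokenizer.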
## 🚀 Usage
```shell
pip install transformers==5.4.0
```

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Rta-AILabs/Nandi-mini-150M"

# Load the tokenizer and model; device_map="auto" places weights on GPU if available
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    trust_remote_code=True,
    device_map="auto",
).eval()

prompt = """
The night was quiet and the streets were empty.
A single light flickered in the distance. Someone was walking slowly, carrying a small bag. Suddenly,
"""

model_inputs = tokenizer([prompt], return_tensors="pt").to(model.device)

# Sample a short continuation with mild temperature and nucleus sampling
outputs = model.generate(
    **model_inputs,
    max_new_tokens=50,
    do_sample=True,
    temperature=0.3,
    top_k=20,
    top_p=0.95,
    repetition_penalty=1.1,
)

response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```
## 📬 Feedback & Suggestions
We’d love to hear your thoughts, feedback, and ideas!
- Email: support@rtaailabs.com
- Official Website: https://rtaailabs.com/
- LinkedIn: https://www.linkedin.com/company/rta-ai-lab
- X (Twitter): https://x.com/Rta_AILabs