Nandi-Mini-150M

Introduction

Nandi-Mini-150M is a compact, efficient multilingual language model designed for strong performance in resource-constrained environments. It is pre-trained from scratch on 525 billion tokens and supports English and 10 Indic languages.

We do not employ any benchmark-gaming ("benchmaxing") tricks; the model is built to be genuinely capable and to fine-tune well on downstream tasks.

Nandi-Mini-150M focuses on maximizing performance per parameter through architectural efficiency rather than scale. It is optimized for edge devices, on-prem deployments, and low-latency applications. Nandi-Mini-150M brings the following key features:

  • Strong multilingual capability across English and Indic languages
  • Efficient design enabling high performance at small scale (150M parameters)
  • Reduced memory footprint using factorized embeddings
  • Better parameter efficiency through layer sharing
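To illustrate why factorized embeddings shrink the memory footprint, here is a back-of-the-envelope parameter count. The hidden and factorization dimensions below are illustrative assumptions, not the model's actual configuration:

```python
# Illustrative parameter counts for factorized vs. standard embeddings.
# Assumed sizes (NOT the actual Nandi-Mini-150M configuration):
VOCAB = 131_072      # vocabulary size (from the model card)
HIDDEN = 576         # assumed hidden dimension
FACTOR = 128         # assumed low-rank factorization dimension

# Standard embedding: a single V x H lookup matrix.
standard = VOCAB * HIDDEN

# Factorized embedding: a V x F lookup followed by an F x H projection.
factorized = VOCAB * FACTOR + FACTOR * HIDDEN

print(f"standard:   {standard:,} params")
print(f"factorized: {factorized:,} params")
print(f"savings:    {1 - factorized / standard:.1%}")
```

Because the vocabulary dimension dominates, even a modest factorization rank cuts embedding parameters by a large factor, which matters a lot in a 150M-parameter budget.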

📝 Upcoming Releases & Roadmap

We’re just getting started with the Nandi series 🚀

  • Nandi-Mini-150M (Base) — Available now
  • Nandi-Mini-150M (Instruct) — Open-sourcing next week
  • Nandi-Mini-500M (Base + Instruct) — Pre-training in progress
  • Nandi-Mini-1B (Base + Instruct) — Pre-training in progress

We are actively working on expanding the Nandi family to cover a wider range of use cases—from lightweight edge deployments to more capable instruction-tuned systems.

📢 Blogs & technical deep-dives coming soon, where we’ll share:

  • Architecture decisions and design trade-offs
  • Training insights and dataset composition
  • Benchmarks and real-world applications

Stay tuned!

This repo contains the base Nandi-Mini-150M model, which has the following features:

  • Type: Causal Language Model
  • Training Stage: Pretraining (from scratch)
  • Architecture: Transformer decoder with RoPE, RMSNorm, SwiGLU, GQA, tied embeddings, and factorized embeddings
  • Number of Layers: 16 × 2 (layer sharing; effective depth = 32)
  • Context Length: 2,048 tokens
  • Vocabulary Size: 131,072
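Layer sharing reuses one set of layer weights at multiple depths, so 16 unique layers yield an effective depth of 32. A minimal sketch of one common looping pattern (repeating the whole stack); this is an illustration of the idea, not the model's actual implementation:

```python
# Minimal sketch of layer sharing: 16 unique "layers", each applied
# twice in sequence, for an effective depth of 32 with the parameter
# cost of 16 layers.
NUM_UNIQUE = 16
REPEATS = 2

def make_layer(i):
    # Stand-in for a transformer block; here it just records its index.
    return lambda trace: trace + [i]

layers = [make_layer(i) for i in range(NUM_UNIQUE)]

def forward(trace):
    for _ in range(REPEATS):      # reuse the same weights on each pass
        for layer in layers:
            trace = layer(trace)
    return trace

depth = len(forward([]))
print(depth)  # effective depth: 32
```

The compute cost still scales with the effective depth of 32, but the parameter (and memory) cost scales only with the 16 unique layers.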

🌍 Supported Languages

The model is trained on English and a diverse set of Indic languages, including:

  • Hindi, Bengali, Tamil, Telugu, Marathi, Gujarati, Kannada, Malayalam, Punjabi, Odia

Benchmark Results

📊 Benchmark Comparison (~150M Class)

| Model Name | Parameters (M) | Tokens (B) | HellaSwag | Winogrande | GPQA | MMLU | GSM8K | HumanEval | Average |
|---|---|---|---|---|---|---|---|---|---|
| Mobile-LLM-125M | 125 | 1000 | 38.90 | 53.10 | – | – | – | – | – |
| SmolLM-135M-Base | 135 | 600 | 42.66 | 53.03 | 25.44 | 25.30 | 1.36 | 0.00 | 24.63 |
| SmolLM2-135M-Base | 135 | 2000 | 43.13 | 53.27 | 22.09 | 24.09 | 1.74 | 0.00 | 24.05 |
| Nandi-Mini-150M-Base | 150 | 500 | 37.20 | 52.32 | 28.57 | 28.86 | 2.58 | 4.27 | 25.63 |

📊 Model Benchmark Comparison With Slightly Bigger Models (350M–600M Class)

| Model Name | Parameters (M) | Tokens (B) | HellaSwag | Winogrande | GPQA | MMLU | GSM8K | HumanEval | Average |
|---|---|---|---|---|---|---|---|---|---|
| Mobile-LLM-360M | 350 | 1000 | 49.60 | 56.59 | – | – | – | – | – |
| Qwen-2-0.5-Base | 500 | 12000 | 49.01 | 57.69 | 27.23 | 44.06 | 10.61 | 22.56 | 35.19 |
| Qwen2.5-0.5B-Base | 500 | 18000 | 52.16 | 56.82 | 24.10 | 47.41 | 4.77 | 29.87 | 35.86 |
| Qwen3-0.6B-Base | 600 | 36000 | 53.77 | 59.19 | 30.80 | 50.34 | 15.31 | 28.04 | 39.58 |
| SmolLM-360M-Base | 360 | 600 | 53.33 | 57.22 | 21.20 | 24.92 | 2.19 | 1.21 | 26.68 |
| SmolLM2-360M-Base | 360 | 4000 | 56.30 | 59.19 | 25.22 | 25.55 | 2.88 | 0.00 | 28.19 |
| Nandi-Mini-150M-Base | 150 | 500 | 37.20 | 52.32 | 28.57 | 28.86 | 2.58 | 4.27 | 25.63 |

Note

Mobile-LLM checkpoints are not publicly available, so their results are reported directly from the original paper. All other models were evaluated with lm-eval under a consistent setup; GSM8K and HumanEval were evaluated with greedy decoding for all models.

Tokenization Fertility Score across Languages

| Language | SmolLM3-3B | Qwen3-0.6B-Base | Sarvam-30B | Nandi-Mini-150M |
|---|---|---|---|---|
| English | 1.17 | 1.16 | 1.18 | 1.18 |
| Bengali | 8.66 | 7.51 | 1.46 | 1.44 |
| Gujarati | 10.47 | 9.37 | 1.70 | 1.53 |
| Hindi | 2.71 | 5.14 | 1.23 | 1.32 |
| Kannada | 16.43 | 12.96 | 2.08 | 1.90 |
| Malayalam | 17.77 | 14.56 | 2.81 | 2.05 |
| Marathi | 3.73 | 6.70 | 1.77 | 1.55 |
| Oriya | 19.07 | 15.75 | 1.77 | 2.68 |
| Punjabi | 9.23 | 8.66 | 1.42 | 1.42 |
| Tamil | 13.56 | 10.93 | 2.35 | 2.05 |
| Telugu | 15.40 | 13.38 | 2.09 | 1.77 |
| Assamese | 9.26 | 8.13 | 2.38 | 1.51 |
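Fertility is commonly measured as the average number of tokens a tokenizer produces per word (lower is better, since fewer tokens mean cheaper inference and longer effective context). A minimal sketch of how such a score can be computed; the whitespace word splitter and the toy two-character tokenizer below are simplifying stand-ins for a real subword tokenizer:

```python
def fertility(tokenize, texts):
    """Average tokens per whitespace-separated word (lower is better)."""
    n_tokens = sum(len(tokenize(t)) for t in texts)
    n_words = sum(len(t.split()) for t in texts)
    return n_tokens / n_words

# Toy tokenizer splitting each word into two-character chunks,
# standing in for a real subword tokenizer.
def toy_tokenize(text):
    words = text.split()
    return [w[i:i + 2] for w in words for i in range(0, len(w), 2)]

score = fertility(toy_tokenize, ["the night was quiet"])
print(round(score, 2))  # → 2.5
```

In practice you would pass the real tokenizer's encode function and a representative corpus for each language in place of the toy pieces above.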

🚀 Usage

pip install transformers==5.4.0

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Rta-AILabs/Nandi-mini-150M"

# trust_remote_code is required for the custom architecture;
# device_map="auto" places the weights on the available GPU/CPU.
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    trust_remote_code=True,
    device_map="auto",
).eval()

prompt = """
The night was quiet and the streets were empty.
A single light flickered in the distance. Someone was walking slowly, carrying a small bag. Suddenly,
"""
model_inputs = tokenizer([prompt], return_tensors="pt").to(model.device)

# Sample a continuation with mild temperature and top-k/nucleus filtering.
outputs = model.generate(
    **model_inputs,
    max_new_tokens=50,
    do_sample=True,
    temperature=0.3,
    top_k=20,
    top_p=0.95,
    repetition_penalty=1.1,
)

response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)

📬 Feedback & Suggestions

We’d love to hear your thoughts, feedback, and ideas!
