TF3 Student: Distilled Romanian Language Model

A compact 22.9M-parameter Romanian language model distilled from the TF3-50M teacher using logit-based knowledge distillation. Part of the TinyFabulist research project.

Model Details

Property	Value
Parameters	22.9M (26.45M with untied embeddings)
Architecture	LLaMA-style decoder-only Transformer
Hidden size	384
Attention heads	6 (head dim 64)
Layers	6
MLP intermediate	1,024
Vocab size	32,000 (Unigram, Romanian-specific)
Context length	2,048 tokens
Tied embeddings	Yes
Training	Knowledge distillation from klusai/tf3-50m-base
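
As a sanity check on the table above, the tied-embedding parameter count can be reproduced from the listed dimensions. This is a minimal sketch under stated assumptions that the card does not confirm: no bias terms, full multi-head attention (four 384x384 projections per layer, no grouped-query sharing), a three-projection gated MLP, and two RMSNorm weight vectors per layer plus a final norm. The function name is illustrative, and the sketch covers only the tied-embedding count.

```python
# Dimensions copied from the Model Details table.
VOCAB_SIZE = 32_000
HIDDEN = 384
LAYERS = 6
MLP_INTERMEDIATE = 1_024


def tied_parameter_count() -> int:
    """Count weights for an assumed bias-free LLaMA-style decoder with tied embeddings."""
    embedding = VOCAB_SIZE * HIDDEN          # shared with the output head (tied)
    attention = 4 * HIDDEN * HIDDEN          # Q, K, V, O projections (assumed: no GQA, no biases)
    mlp = 3 * HIDDEN * MLP_INTERMEDIATE      # gate, up, down projections (assumed gated MLP)
    norms = 2 * HIDDEN                       # two RMSNorm weight vectors per layer (assumed)
    final_norm = HIDDEN                      # final RMSNorm before the output head (assumed)
    return embedding + LAYERS * (attention + mlp + norms) + final_norm


total = tied_parameter_count()
print(f"{total:,} parameters ({total / 1e6:.1f}M)")
# prints "22,909,824 parameters (22.9M)"
```

Under these assumptions the token embedding matrix alone accounts for 12,288,000 of the weights, a little over half of the model.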

Training

Method: Logit-based knowledge distillation (KL divergence + cross-entropy loss, alpha=0.009)
Teacher: klusai/tf3-50m-base (51.65M params, frozen)
Data: klusai/ds-tf2-en-ro-15k (15k Romanian fables)
Temperature: T=1.0
Epochs: 3
Learning rate: 3e-4 (cosine schedule, 50-step warmup)
Hardware: Apple M3 Ultra (96GB unified memory)
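
The objective and schedule listed above can be sketched in plain Python for a single token position. Several details here are assumptions rather than facts from this card: that alpha weights the cross-entropy term and (1 - alpha) the KL term, that the KL direction is teacher to student, that the conventional T^2 factor is applied (it has no effect at T=1.0), that warmup is linear, and that the cosine decays to zero. The function names and the `total_steps` argument are hypothetical.

```python
import math

ALPHA = 0.009        # from the card; which term it weights is an assumption
TEMPERATURE = 1.0    # from the card
PEAK_LR = 3e-4       # from the card
WARMUP_STEPS = 50    # from the card


def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    log_z = m + math.log(sum(math.exp(x - m) for x in scaled))
    return [x - log_z for x in scaled]


def distillation_loss(student_logits, teacher_logits, target,
                      alpha=ALPHA, temperature=TEMPERATURE):
    """Combined CE + KL loss for one token position."""
    # Hard-label cross-entropy on the student's unscaled logits.
    ce = -log_softmax(student_logits)[target]
    # KL(teacher || student) on temperature-softened distributions.
    s = log_softmax(student_logits, temperature)
    t = log_softmax(teacher_logits, temperature)
    kl = sum(math.exp(tp) * (tp - sp) for tp, sp in zip(t, s))
    # Assumed weighting: alpha on CE, (1 - alpha) on KL, with T^2 scaling.
    return alpha * ce + (1.0 - alpha) * temperature ** 2 * kl


def learning_rate(step, total_steps, peak=PEAK_LR, warmup=WARMUP_STEPS):
    """Linear warmup to `peak`, then cosine decay to zero at `total_steps`."""
    if step < warmup:
        return peak * step / warmup
    progress = (step - warmup) / max(1, total_steps - warmup)
    return 0.5 * peak * (1.0 + math.cos(math.pi * min(1.0, progress)))
```

With the assumed weighting and alpha=0.009, the KL term dominates: when student and teacher logits match, the loss reduces to 0.009 times the cross-entropy.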

Intended Use

This model is a research artifact demonstrating knowledge distillation for compact Romanian language models trained on synthetic moral microfiction. It is designed for:

Research on compact language model compression
Romanian text generation in the fable/moral story domain
Downstream fine-tuning for Romanian NLP tasks

Not intended for: Production text generation, factual question answering, or safety-critical applications.
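
For the fable-generation use case listed above, a loading sketch with the Hugging Face transformers library might look like the following. It assumes the checkpoint loads through the standard `AutoTokenizer` and `AutoModelForCausalLM` classes, which this card does not state explicitly; the helper name `generate_fable` and the sampling settings are illustrative choices, not recommendations from the authors.

```python
MODEL_ID = "klusai/tf3-26m-student"  # repository name as it appears on this page


def generate_fable(prompt: str, max_new_tokens: int = 128) -> str:
    """Sample a continuation of `prompt` from the student model."""
    # Imported lazily so this module can be loaded without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(
        input_ids=inputs["input_ids"],
        attention_mask=inputs["attention_mask"],
        max_new_tokens=max_new_tokens,
        do_sample=True,           # illustrative sampling settings (assumed)
        top_p=0.9,
        repetition_penalty=1.1,   # may reduce the repetition noted under Limitations
    )
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

A call such as `generate_fable("A fost odată")` ("Once upon a time") would download the weights on first use and return the prompt followed by a sampled continuation.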

Limitations

Domain-restricted to moral microfiction (fables)
Trained exclusively on synthetic data
May exhibit repetitive patterns and simplified phrasing compared to the teacher
Gender agreement errors may occur in generated text

Citation

@article{nadas2026tf3,
  title={TF3-RO-50M: Training Compact Romanian Language Models from Scratch on Synthetic Moral Microfiction},
  author={Nada\c{s}, Mihai Dan and Dio\c{s}an, Laura and Tomescu, Andreea and Pi\c{s}coran, Andrei},
  journal={arXiv preprint arXiv:2601.10410},
  year={2026}
}

Related Models and Datasets

Artifact	Description
klusai/tf3-50m-base	Teacher model (51.65M)
klusai/tf3-50m-sft	SFT-tuned teacher
klusai/tf3-bert	NER model for entity coherence evaluation
klusai/ds-tf2-en-ro-3m	3M bilingual fable corpus
klusai/ds-tf2-en-ro-15k	15k curated subset for distillation/SFT

Repository: klusai/tf3-26m-student
Weights: Safetensors, 22.9M params, F32
Base model: klusai/tf3-50m-base
Paper: TF3-RO-50M: Training Compact Romanian Language Models from Scratch on Synthetic Moral Microfiction (arXiv 2601.10410, published Jan 15)