# Ureola 50M Base
> "She didn't quite turn out to be the best, but I'm not done cooking – honestly she speaks better English than most of us, with only 15 hours spent learning 59 thousand new words from scratch." – Neon, creator
## What is Ureola?
Ureola is an open-source decoder-only language model built entirely from scratch by Neon of NeonTech, a one-man development team based in Port Harcourt, Nigeria.
No fine-tuned base. No borrowed weights. Every parameter in this model was initialized randomly and trained from zero.
This is Ureola 50M Base, the first checkpoint. She understands questions. She just doesn't always know how to answer them yet. The instruct version is coming.
## Model Details
| Property | Value |
|---|---|
| Parameters | 49.9M |
| Architecture | Decoder-only Transformer |
| Layers | 8 |
| Attention heads | 8 |
| Embedding dim | 512 |
| FFN hidden dim | 2048 (SwiGLU) |
| Context length | 512 tokens |
| Positional encoding | RoPE (Rotary Position Embedding) |
| Normalization | RMSNorm (pre-norm) |
| Activation | SwiGLU |
| Weight tying | Yes (embedding ↔ LM head) |
| Tokenizer | Meta LLaMA BPE (32,004 tokens) |
| Precision | float16 (trained), float32 (inference) |
| License | Apache 2.0 |
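As a sanity check, the 49.9M figure can be reproduced from the table above. This is a rough back-of-the-envelope count; the exact layer shapes (no-bias linears, two RMSNorms per block, three-matrix SwiGLU FFN) are assumptions based on the listed dimensions and the LLaMA-style design described below.

```python
# Rough parameter count for Ureola 50M, derived from the model details table.
# Assumes no-bias linear layers, tied embedding/LM head, and a
# 3-matrix SwiGLU FFN (gate, up, down) -- standard LLaMA-style choices.
vocab, d_model, d_ff, n_layers = 32_004, 512, 2048, 8

embedding = vocab * d_model        # shared with LM head (weight tying)
attention = 4 * d_model * d_model  # Wq, Wk, Wv, Wo
ffn       = 3 * d_model * d_ff     # SwiGLU: gate, up, down projections
norms     = 2 * d_model            # two RMSNorms per block

per_layer = attention + ffn + norms
total = embedding + n_layers * per_layer + d_model  # + final RMSNorm

print(f"{total / 1e6:.1f}M parameters")  # ~49.9M
```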
## Training Details
| Property | Value |
|---|---|
| Dataset | OpenHermes 2.5 (200k samples) |
| Tokens seen | ~59 million |
| Training steps | 20,000 |
| Training time | ~15 hours |
| Hardware | Tesla T4 16GB (Kaggle free tier) |
| Optimizer | AdamW (β₁=0.9, β₂=0.95) |
| Learning rate | 3e-4 with cosine decay |
| Warmup steps | 500 |
| Batch size | 32 × grad accumulation 4 = 128 effective |
| Final val loss | ~1.20 |
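The warmup-plus-cosine schedule from the table can be expressed as a small function. This is a sketch of the standard recipe with the step counts from the table; decaying to exactly zero (rather than a minimum LR) is an assumption.

```python
import math

MAX_LR, WARMUP, MAX_STEPS = 3e-4, 500, 20_000

def lr_at(step: int) -> float:
    """Linear warmup to MAX_LR over WARMUP steps, then cosine decay to zero."""
    if step < WARMUP:
        return MAX_LR * step / WARMUP
    progress = (step - WARMUP) / (MAX_STEPS - WARMUP)  # 0.0 -> 1.0
    return MAX_LR * 0.5 * (1.0 + math.cos(math.pi * progress))
```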
## Architecture Highlights
Ureola uses a modern decoder-only transformer architecture with several design choices borrowed from state-of-the-art models:
- RoPE – Rotary positional embeddings for better length generalization
- SwiGLU – Gated activation function used in LLaMA, PaLM, and Mistral
- RMSNorm – Pre-normalization for stable training
- Weight tying – Embedding and LM head share weights, saving ~16M parameters
- No bias – Cleaner, faster linear layers
- Flash Attention – Via PyTorch's `scaled_dot_product_attention`
The entire architecture was designed and implemented from scratch in PyTorch.
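For readers unfamiliar with RMSNorm and SwiGLU, the underlying math is compact. A minimal NumPy sketch for illustration only; the actual model implements these as PyTorch modules with learned weights.

```python
import numpy as np

def rms_norm(x, weight, eps=1e-6):
    # Normalize by root-mean-square instead of mean/variance (no centering)
    rms = np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)
    return (x / rms) * weight

def silu(x):
    # SiLU / swish: x * sigmoid(x)
    return x / (1.0 + np.exp(-x))

def swiglu_ffn(x, w_gate, w_up, w_down):
    # SwiGLU feed-forward: elementwise-gated activation, then project back down
    return (silu(x @ w_gate) * (x @ w_up)) @ w_down
```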
## Chat Format
Ureola uses a simple chat template:
```
<|system|>
You are Ureola, a helpful and friendly AI assistant made by Neon of NeonTech.
<|user|>
Hello! Who are you?
<|assistant|>
```
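A small helper for assembling this template might look like the following. This is a sketch: `build_prompt` is a hypothetical name, and the template layout is taken directly from the example above (how `<|end|>` terminates turns is not shown there, so it is omitted here).

```python
SYSTEM_PROMPT = "You are Ureola, a helpful and friendly AI assistant made by Neon of NeonTech."

def build_prompt(user_message: str, system: str = SYSTEM_PROMPT) -> str:
    # Assemble the chat template; generation continues after <|assistant|>
    return f"<|system|>\n{system}\n<|user|>\n{user_message}\n<|assistant|>\n"

print(build_prompt("Hello! Who are you?"))
```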
## Usage
```python
import torch
from transformers import LlamaTokenizer
from safetensors.torch import load_file

# Load tokenizer and register the chat special tokens
tokenizer = LlamaTokenizer.from_pretrained("Neon-tech/Ureola-50M-base")
tokenizer.add_special_tokens({
    "additional_special_tokens": ["<|system|>", "<|user|>", "<|assistant|>", "<|end|>"]
})

# Load model weights
# (requires the UreolaMini architecture class from the model card)
weights = load_file("model.safetensors")

# See the Spaces demo for full inference code.
```
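The Spaces demo covers full generation. As a rough illustration of what happens at each decoding step, temperature plus top-k sampling over next-token logits can be sketched like this (NumPy sketch with hypothetical names, not the demo's actual code):

```python
import numpy as np

def sample_next_token(logits, temperature=0.8, top_k=50, rng=None):
    """Pick a token id from raw next-token logits (greedy when temperature<=0)."""
    if rng is None:
        rng = np.random.default_rng()
    if temperature <= 0:
        return int(np.argmax(logits))
    logits = logits / temperature
    # Mask everything outside the top_k highest logits
    cutoff = np.sort(logits)[-top_k] if top_k < len(logits) else -np.inf
    logits = np.where(logits >= cutoff, logits, -np.inf)
    # Softmax over the surviving logits, then sample
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    return int(rng.choice(len(probs), p=probs))
```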
## Honest Assessment
This is a base model trained on a single GPU for 15 hours. Here is what it can and cannot do:
Can:
- Generate coherent, grammatically correct English
- Follow conversational structure
- Produce structured responses (lists, steps, paragraphs)
- Understand the type of question being asked
Cannot (yet):
- Reliably answer specific questions accurately
- Follow instructions precisely
- Know who it is without being told
- Perform arithmetic or reasoning tasks
These limitations are expected for a 50M base model. The instruct fine-tuned version (coming soon) addresses instruction following directly.
## What's Next
- Ureola 50M Base – you are here
- Ureola 50M Instruct – fine-tuned on GPT-4-quality instructions (coming soon)
- Ureola 50M-T – thinking version with chain-of-thought (coming soon)
- NeonTokenizer – custom BPE tokenizer for all future Ureola models
- Ureola 100M – next scale (coming soon)
## About
Neon is an independent developer and the founder of NeonTech, building open-source AI systems from Port Harcourt, Nigeria.
Ureola is the AI assistant powering Ureola, a general-purpose AI chat platform.
This model represents NeonTech's first step into open-weight language model development. Everything was built with limited compute, no institutional backing, and a lot of patience.
> "All the stress. Everything. And I'm really proud of her." – Neon
## Citation
```bibtex
@misc{ureola50m2025,
  title  = {Ureola 50M: A Decoder-only Language Model Trained from Scratch},
  author = {Neon, NeonTech},
  year   = {2025},
  url    = {https://huggingface.co/Neon-tech/Ureola-50M-base}
}
```
Built from scratch. Trained on a free GPU. Made in Nigeria. 🇳🇬