---
license: apache-2.0
tags:
- auron
- chimera
- gdn
- ouroboros
- hybrid-architecture
language:
- en
thumbnail: auron_banner.png
---
![Auron](auron_banner.png)
# Auron-279M (Archived)
> **Note:** This model is part of a scaling study. Auron-279M reached a final val_loss of **3.188**, virtually identical to the roughly 4× larger 1.1B model (3.180), revealing a scaling wall in the Ouroboros weight-sharing mechanism. The **510M** model is the best-performing Chimera variant.
>
> **For inference and testing, use [Auron-510M](https://huggingface.co/nyxia/Auron-510M) (val_loss 3.035).**

| Model | Params | Final Val Loss | Status |
|-------|--------|----------------|--------|
| Auron-279M | 279M | 3.188 | Archived |
| **[Auron-510M](https://huggingface.co/nyxia/Auron-510M)** | **510M** | **3.035** | **Best** |
| Auron-1.1B | 1.1B | 3.180 | Archived |

**Paper:** [Auron](https://github.com/Fy-/Auron) | **Code:** [github.com/Fy-/Auron](https://github.com/Fy-/Auron) | **Blog:** [HuggingFace](https://huggingface.co/blog/nyxia/auron)
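The headline comparison can be reproduced from the table values alone. A minimal sketch, using the rounded parameter counts from the table (1.1B taken as 1,100M):

```python
# Values copied from the table (params in millions, final validation loss).
models = {
    "Auron-279M": (279, 3.188),
    "Auron-510M": (510, 3.035),
    "Auron-1.1B": (1100, 3.180),
}


def compare(a: str, b: str) -> tuple:
    """Return (parameter ratio b/a, val_loss reduction going from a to b)."""
    params_a, loss_a = models[a]
    params_b, loss_b = models[b]
    return params_b / params_a, loss_a - loss_b


for other in ("Auron-510M", "Auron-1.1B"):
    ratio, gain = compare("Auron-279M", other)
    print(f"{other}: {ratio:.2f}x params, val_loss lower by {gain:.3f}")
```

With these numbers the 1.1B model has about 3.94× the parameters of the 279M model but lowers final val_loss by only 0.008, while the 510M model (about 1.83× the parameters) lowers it by 0.153.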
## Architecture
- **Type:** Chimera (4 bottom layers + 4 top layers × 3 passes = 16 virtual layers)
- **Dim:** 1024, head_dim=64, expand_v=2
- **Params:** 279M (123M unique + 155M embedding)
- **Trained:** 250K steps, 5B tokens, WSD (warmup-stable-decay) learning-rate schedule
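A rough sketch of what the Type and Trained lines describe, under stated assumptions. It assumes the 4 top layers are looped as one block (bottom layers once, then the whole top block three times), which is one reading of "4×3 top"; the layer callables are hypothetical stand-ins for real blocks. The WSD hyperparameters (`peak_lr`, `warmup_steps`, `decay_steps`, and the linear decay shape) are hypothetical illustrations, not the values used to train this checkpoint.

```python
from typing import Callable, List

Layer = Callable[[float], float]  # hypothetical stand-in for a real model block


def chimera_forward(x: float, bottom: List[Layer], top: List[Layer], n_loops: int = 3) -> float:
    """Apply the bottom layers once, then loop the shared top block n_loops times (assumed order)."""
    for layer in bottom:          # unique weights, used once
        x = layer(x)
    for _ in range(n_loops):      # the same top weights are reused on every pass
        for layer in top:
            x = layer(x)
    return x


def wsd_lr(step: int, total_steps: int = 250_000, peak_lr: float = 3e-4,
           warmup_steps: int = 2_000, decay_steps: int = 50_000,
           min_lr: float = 0.0) -> float:
    """Warmup-stable-decay: linear warmup, flat plateau, linear decay over the last decay_steps."""
    decay_start = total_steps - decay_steps
    if step < warmup_steps:
        return peak_lr * (step + 1) / warmup_steps
    if step < decay_start:
        return peak_lr
    progress = min(1.0, (step - decay_start) / decay_steps)
    return peak_lr + (min_lr - peak_lr) * progress


if __name__ == "__main__":
    # Toy layers that each add 1, so the output counts layer applications.
    bottom = [lambda v: v + 1 for _ in range(4)]
    top = [lambda v: v + 1 for _ in range(4)]
    print(chimera_forward(0.0, bottom, top))  # 16 applications from 8 unique layers
    print(wsd_lr(1_000), wsd_lr(100_000), wsd_lr(225_000))
```

Eight unique layers yield 16 layer applications, so depth grows without adding weights; the parameter line is consistent with this, since 123M unique + 155M embedding = 278M, matching the stated 279M up to rounding.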
```python
from ouro import load_model, generate

# This 279M checkpoint is archived; load the 510M model for inference instead.
model, tokenizer, device = load_model("nyxia/Auron-510M")
generate(model, tokenizer, device, "The history of")
```
Built by [Florian Gasquez](https://fyx.jp) ([@nyxia](https://huggingface.co/nyxia)). Part of [Soulkyn](https://soulkyn.com).