---
license: apache-2.0
tags:
- auron
- chimera
- gdn
- ouroboros
- hybrid-architecture
language:
- en
thumbnail: auron_banner.png
---

# Auron-279M (Archived)

> **Note:** This model is part of a scaling study. The 279M model reached a final val_loss of **3.188**, virtually identical to the roughly 4x larger 1.1B model (3.180), revealing a scaling wall in the Ouroboros weight-sharing mechanism. The **510M** model is the best-performing Chimera variant.
>
> **For inference and testing, use [Auron-510M](https://huggingface.co/nyxia/Auron-510M) (val_loss 3.035).**

| | Model | Params | Final Val Loss | Status | |
| |-------|--------|---------------|--------| |
| | Auron-279M | 279M | 3.188 | Archived | |
| | **[Auron-510M](https://huggingface.co/nyxia/Auron-510M)** | **510M** | **3.035** | **Best** | |
| | Auron-1.1B | 1.1B | 3.180 | Archived | |
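
To put the table in perspective, the following minimal sketch computes how much each larger model gains over Auron-279M relative to its extra parameters. The figures are copied from the table; the `results` dictionary and the `compare` helper are illustrative names, not part of the Auron codebase.

```python
# Figures copied from the table: (parameter count, final validation loss).
results = {
    "Auron-279M": (279e6, 3.188),
    "Auron-510M": (510e6, 3.035),
    "Auron-1.1B": (1.1e9, 3.180),
}

base_params, base_loss = results["Auron-279M"]


def compare(name):
    """Return (parameter ratio, val_loss reduction) relative to Auron-279M."""
    params, loss = results[name]
    return params / base_params, base_loss - loss


for name in ("Auron-510M", "Auron-1.1B"):
    ratio, gain = compare(name)
    print(f"{name}: {ratio:.2f}x params, val_loss lower by {gain:.3f}")
# Auron-510M: 1.83x params, val_loss lower by 0.153
# Auron-1.1B: 3.94x params, val_loss lower by 0.008
```

Going from 279M to 510M buys a 0.153 drop in val_loss, while going all the way to 1.1B buys only 0.008, which is the plateau the note above describes.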

**Paper:** [Auron](https://github.com/Fy-/Auron) | **Code:** [github.com/Fy-/Auron](https://github.com/Fy-/Auron) | **Blog:** [HuggingFace](https://huggingface.co/blog/nyxia/auron)

## Architecture
- **Type:** Chimera (4 bottom + 4×3 top = 16 virtual layers)
- **Dim:** 1024, head_dim=64, expand_v=2
- **Params:** 279M (123M unique + 155M embed)
- **Trained:** 250K steps, 5B tokens, WSD (warmup-stable-decay) schedule

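The numbers in the list above can be cross-checked with a little arithmetic. The sketch below is illustrative only: it reads "4 bottom + 4×3 top" as four unique bottom layers followed by a block of four weight-shared layers applied three times, which is an assumption about the notation and not a description of the released code. `virtual_layer_schedule` and its parameters are hypothetical names.

```python
def virtual_layer_schedule(bottom=4, shared=4, loops=3):
    """List which layer runs at each virtual depth (assumed layout)."""
    schedule = [f"bottom_{i}" for i in range(bottom)]
    for _ in range(loops):
        # The same `shared` layers are reused on every pass (weight sharing).
        schedule += [f"shared_{i}" for i in range(shared)]
    return schedule


schedule = virtual_layer_schedule()
virtual_depth = len(schedule)       # 4 + 4 * 3 = 16 virtual layers
unique_layers = len(set(schedule))  # 4 + 4 = 8 distinct sets of weights

# Training budget from the card: 5B tokens over 250K steps.
tokens_per_step = 5_000_000_000 // 250_000  # average tokens per step

# Share of the parameter count that sits in the embeddings (155M of 279M).
embed_share = 155 / 279

print(virtual_depth, unique_layers, tokens_per_step, f"{embed_share:.1%}")
```

Under this reading, the model runs 16 layers deep while storing only 8 layers' worth of weights, and more than half of the reported 279M parameters are embeddings.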
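The snippet below loads the recommended Auron-510M checkpoint rather than this archived model:

```python
from ouro import load_model, generate

# Load the recommended 510M checkpoint instead of the archived 279M model.
model, tokenizer, device = load_model("nyxia/Auron-510M")

# Generate a continuation for a short prompt.
generate(model, tokenizer, device, "The history of")
```
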
Built by [Florian Gasquez](https://fyx.jp) ([@nyxia](https://huggingface.co/nyxia)). Part of [Soulkyn](https://soulkyn.com).