---
license: apache-2.0
tags:
  - auron
  - chimera
  - gdn
  - ouroboros
  - hybrid-architecture
language:
  - en
thumbnail: auron_banner.png
---

![Auron](auron_banner.png)

# Auron-1.1B (Archived — Scaling Wall)

> **Note:** This model demonstrates a scaling limitation in Ouroboros weight sharing. Despite having roughly 4x the parameters of Auron-279M, it converges to a nearly identical val_loss (**3.180** vs 3.188). At dim=2048 with head_dim=64, the representation appears wide enough for a single pass, so the shared loops act as an echo chamber rather than as iterative refinement.
>
> **For inference and testing, use [Auron-510M](https://huggingface.co/nyxia/Auron-510M) (val_loss 3.035).**

| Model | Params | Final Val Loss | Val Loss Change vs. Previous Size |
|-------|--------|---------------|-----------------------------------|
| Auron-279M | 279M | 3.188 | Baseline |
| **[Auron-510M](https://huggingface.co/nyxia/Auron-510M)** | **510M** | **3.035** | **-0.153 (good)** |
| Auron-1.1B | 1.1B | 3.180 | +0.145 (regression) |
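The last column of the table reads as the change in final val_loss relative to the next-smaller model. A minimal sketch that reproduces those figures from the reported losses (the helper name `deltas_vs_previous` is illustrative, not part of the Auron code):

```python
# Final validation losses copied from the table above, ordered by model size.
val_loss = [
    ("Auron-279M", 3.188),
    ("Auron-510M", 3.035),
    ("Auron-1.1B", 3.180),
]

def deltas_vs_previous(rows):
    """Return (name, change in val_loss vs. the next-smaller model)."""
    out = []
    for (_, prev), (name, cur) in zip(rows, rows[1:]):
        out.append((name, round(cur - prev, 3)))
    return out

for name, delta in deltas_vs_previous(val_loss):
    # A negative change means the larger model reached a lower loss.
    label = "improvement" if delta < 0 else "regression"
    print(f"{name}: {delta:+.3f} ({label})")
```

Measured against the 279M baseline instead, the 1.1B model improves by only 0.008, which is the near-identical convergence described in the note.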

**Paper:** [Auron](https://github.com/Fy-/Auron) | **Code:** [github.com/Fy-/Auron](https://github.com/Fy-/Auron) | **Blog:** [HuggingFace](https://huggingface.co/blog/nyxia/auron)

## The Scaling Wall
- **Root cause:** Representation saturation at dim=2048 — loops add no new information
- **Contributing:** head_dim=64 splits dim=2048 into 32 narrow, fragmented attention heads (for comparison, Qwen 3.5 uses head_dim=256)
- **Fix in progress:** Chimera 1B v2 (head_dim=128) + Chimera-MoE (routed experts)
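The head counts behind these points follow from simple division. The sketch below assumes the standard multi-head layout where the number of heads equals `dim / head_dim`; the actual Auron configuration may differ, and `num_heads` is a hypothetical helper.

```python
def num_heads(dim: int, head_dim: int) -> int:
    """Number of attention heads when dim is split evenly into head_dim slices."""
    if dim % head_dim != 0:
        raise ValueError(f"dim={dim} is not divisible by head_dim={head_dim}")
    return dim // head_dim

# dim=2048 at the current head_dim (64), the planned v2 value (128),
# and the 256 quoted above for comparison.
for head_dim in (64, 128, 256):
    print(f"head_dim={head_dim}: {num_heads(2048, head_dim)} heads")
```

Under this assumption, moving from head_dim=64 to head_dim=128 halves the head count from 32 to 16, giving each head a wider slice of the representation.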

## Architecture
- **Type:** Chimera (6 bottom + 6×3 top = 24 virtual)
- **Dim:** 2048, head_dim=64, expand_v=2
- **Params:** 1.1B (761M unique + 311M embed)
- **Trained:** 250K steps, 5B tokens, WSD schedule
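The layer and parameter figures above can be checked with a short sketch. It assumes the 6 bottom layers run once with their own weights and the 6 top layers are reused across 3 loops, in that order; the real execution order and the function name `chimera_schedule` are assumptions, not taken from the Auron code.

```python
def chimera_schedule(n_bottom=6, n_top=6, n_loops=3):
    """List (block, layer_index, loop_index) for each virtual layer in one pass."""
    # Bottom layers: applied once each, no weight sharing (assumed).
    schedule = [("bottom", i, 0) for i in range(n_bottom)]
    # Top layers: the same n_top weight sets are applied once per loop (assumed).
    for loop in range(n_loops):
        schedule += [("top", i, loop) for i in range(n_top)]
    return schedule

schedule = chimera_schedule()
virtual_layers = len(schedule)                                 # 6 + 6*3
unique_layers = len({(block, i) for block, i, _ in schedule})  # distinct weight sets

# Figures from the list above, in millions of parameters.
total_params_m = 761 + 311
# Average tokens consumed per optimizer step over the whole run.
tokens_per_step = 5_000_000_000 // 250_000

print(virtual_layers, unique_layers, total_params_m, tokens_per_step)
```

Under these assumptions the model has 24 virtual layers backed by 12 distinct weight sets, 1,072M parameters in total (rounded to 1.1B), and an average of 20,000 tokens per training step.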

```python
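# `ouro` is assumed to be the module shipped with the Auron code repository.
from ouro import load_model, generate

# Load the recommended 510M checkpoint rather than this archived 1.1B model.
model, tokenizer, device = load_model("nyxia/Auron-510M")
generate(model, tokenizer, device, "The history of")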
```

Built by [Florian Gasquez](https://fyx.jp) ([@nyxia](https://huggingface.co/nyxia)). Part of [Soulkyn](https://soulkyn.com).