---
license: apache-2.0
tags:
- auron
- chimera
- gdn
- ouroboros
- hybrid-architecture
language:
- en
thumbnail: auron_banner.png
---

![Auron](auron_banner.png)

# Auron-510M

**Auron**: Chimera hybrid GDN-Attention language models with Ouroboros weight sharing.

- **Paper:** [Auron: Depth-Efficient Language Models via Hybrid Recurrent-Attention Weight Sharing](https://github.com/Fy-/Auron)
- **Code:** [github.com/Fy-/Auron](https://github.com/Fy-/Auron)

## Architecture

- **Type:** Chimera (`ChimeraConfig`)
- **Dim:** 1536
- **Layers:** 16 virtual
- **Params:** 510,217,280 (510M)
- **Vocab:** 151936 (Qwen 3 tokenizer)
- **Context:** 2048 tokens
- **Topology:** 4 unique bottom + 4×3 shared top
- **GDN:Attn ratio:** 3:1 (every 4th layer is attention)
- **Virtual equivalent:** ~1,020,434,560 params
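
The topology and ratio entries above can be read as a layer schedule. The sketch below is one interpretation, not the repository's implementation: it assumes the 4 shared layers form a single block that is applied 3 times on top of the 4 unique layers, and that attention takes the 4th slot in every group of 4. The function name and the `(kind, physical_index)` representation are hypothetical, chosen only for illustration.

```python
def build_virtual_layers(n_unique=4, n_shared=4, n_passes=3, attn_every=4):
    """Return (kind, physical_index) for each virtual layer, bottom to top.

    Assumption: the shared layers form one block replayed n_passes times.
    """
    # Unique layers get indices 0..n_unique-1; the shared block reuses
    # the same n_shared indices on every pass.
    physical = list(range(n_unique))
    for _ in range(n_passes):
        physical.extend(range(n_unique, n_unique + n_shared))

    layers = []
    for depth, phys in enumerate(physical):
        # Every attn_every-th virtual layer is attention, the rest are GDN.
        kind = "attn" if (depth + 1) % attn_every == 0 else "gdn"
        layers.append((kind, phys))
    return layers


layers = build_virtual_layers()
n_virtual = len(layers)                        # virtual depth
n_physical = len({p for _, p in layers})       # distinct weight sets
n_attn = sum(kind == "attn" for kind, _ in layers)
n_gdn = n_virtual - n_attn

# Scaling the stored parameter count by virtual/physical depth reproduces
# the "virtual equivalent" figure listed on this card.
stored_params = 510_217_280
virtual_equivalent = stored_params * n_virtual // n_physical
```

Under this reading there are 16 virtual layers backed by 8 distinct weight sets, 12 GDN and 4 attention, and the listed virtual equivalent is exactly twice the stored parameter count.
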

## Training Curves

![Training Curves](training_curves.png)

## Training

- **Step:** 249,000
- **Data:** Mixed (75% FineWeb-Edu, 18% StarCoder, 5% FineMath, 2% UltraChat)
- **Optimizer:** Muon + AdamW (decoupled embedding LR)
- **Schedule:** WSD (Warmup-Stable-Decay)
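
For readers unfamiliar with WSD, here is a minimal sketch of a Warmup-Stable-Decay learning-rate schedule. Every number in it (peak LR, warmup length, decay length, total steps) is a hypothetical placeholder, and the linear decay shape is an assumption; this card does not state the values or the decay shape used for Auron.

```python
def wsd_lr(step, peak_lr, warmup_steps, decay_steps, total_steps, min_lr=0.0):
    """Learning rate at `step` under a Warmup-Stable-Decay schedule.

    Assumption: both the warmup and the decay phases are linear.
    """
    decay_start = total_steps - decay_steps
    if step < warmup_steps:
        # Warmup: ramp linearly, reaching peak_lr on the last warmup step.
        return peak_lr * (step + 1) / warmup_steps
    if step < decay_start:
        # Stable: hold the peak learning rate.
        return peak_lr
    # Decay: ramp linearly from peak_lr down to min_lr at total_steps.
    progress = min(1.0, (step - decay_start) / decay_steps)
    return peak_lr + (min_lr - peak_lr) * progress


# Hypothetical settings, for illustration only.
example = [wsd_lr(s, peak_lr=1e-3, warmup_steps=1000,
                  decay_steps=2000, total_steps=10000)
           for s in (0, 999, 5000, 9000, 10000)]
```

The long constant phase is what distinguishes WSD from a cosine schedule: the decay length is fixed, so training can be extended by lengthening the stable phase rather than re-planning the whole curve.
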
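
## Usage

Clone the repository and install dependencies with rye:

```bash
git clone https://github.com/Fy-/Auron && cd Auron && rye sync
```

Then load the checkpoint and generate from a prompt:

```python
from ouro import load_model, generate

# load_model returns the model, its tokenizer, and the device to run on
model, tokenizer, device = load_model("nyxia/Auron-510M")
# Generate a continuation of the prompt
generate(model, tokenizer, device, "The history of")
```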

## Sampling

Defaults: `T=0.7`, `top_k=20`, `top_p=0.9`, `rep_pen=1.0`, `presence_pen=1.5`. Ouroboros weight sharing requires a presence penalty of at least 1.5 to prevent attractor wells.
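
To make the defaults concrete, here is a minimal pure-Python sketch of one sampling step. It is not the repository's `generate` implementation: it assumes the presence penalty is a flat amount subtracted from the logit of every token that has already appeared (the common additive convention), and the function and argument names are hypothetical. `rep_pen=1.0` is a no-op under the usual multiplicative convention, so it is left out.

```python
import math
import random


def sample_next(logits, generated, temperature=0.7, top_k=20, top_p=0.9,
                presence_pen=1.5, rng=random):
    """Sample one token id from `logits` (list of floats, index = token id)."""
    # Presence penalty (assumed additive): flat cost for any token already seen.
    seen = set(generated)
    scores = [l - presence_pen if i in seen else l
              for i, l in enumerate(logits)]
    # Temperature scaling.
    scores = [s / temperature for s in scores]
    # Top-k: keep the k highest-scoring token ids, best first.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i],
                    reverse=True)[:top_k]
    # Softmax over the survivors (subtract the max for numerical stability).
    best = scores[ranked[0]]
    weights = [math.exp(scores[i] - best) for i in ranked]
    total = sum(weights)
    # Top-p: keep the smallest prefix whose cumulative probability reaches top_p.
    kept_ids, kept_probs, cumulative = [], [], 0.0
    for i, w in zip(ranked, weights):
        p = w / total
        kept_ids.append(i)
        kept_probs.append(p)
        cumulative += p
        if cumulative >= top_p:
            break
    # random.choices renormalizes the remaining weights.
    return rng.choices(kept_ids, weights=kept_probs, k=1)[0]
```

With `top_k=1` the step is greedy, which makes the penalty's effect easy to see: for logits `[2.0, 1.9, -10.0]`, token 0 wins when nothing has been generated, and token 1 wins once token 0 is in `generated`.
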

## Links

- **Paper:** [Auron: Depth-Efficient Language Models via Hybrid Recurrent-Attention Weight Sharing](https://github.com/Fy-/Auron/blob/master/Auron_chimera_topology_paper.pdf)
- **Code:** [github.com/Fy-/Auron](https://github.com/Fy-/Auron)
- **Models:** [huggingface.co/nyxia](https://huggingface.co/nyxia)

Built by [Florian Gasquez](https://fyx.jp) ([@nyxia](https://huggingface.co/nyxia)). Part of the [Soulkyn](https://soulkyn.com) project.