---
license: apache-2.0
tags:
- auron
- chimera
- gdn
- ouroboros
- hybrid-architecture
language:
- en
thumbnail: auron_banner.png
---

![Auron](auron_banner.png)

# Auron-1.1B (Archived: Scaling Wall)

> **Note:** This model demonstrates a scaling limitation in Ouroboros weight sharing. Despite having roughly 4× the parameters of Auron-279M, it converges to a nearly identical val_loss (**3.180** vs 3.188). At dim=2048 with head_dim=64, the representation is wide enough for a single pass, so the shared loops become an echo chamber rather than iterative refinement.
>
> **For inference and testing, use [Auron-510M](https://huggingface.co/nyxia/Auron-510M) (val_loss 3.035).**

| Model | Params | Final Val Loss | Scaling (Δ vs. previous size) |
|-------|--------|----------------|-------------------------------|
| Auron-279M | 279M | 3.188 | Baseline |
| **[Auron-510M](https://huggingface.co/nyxia/Auron-510M)** | **510M** | **3.035** | **-0.153 (good)** |
| Auron-1.1B | 1.1B | 3.180 | +0.145 (regression) |

**Paper:** [Auron](https://github.com/Fy-/Auron) | **Code:** [github.com/Fy-/Auron](https://github.com/Fy-/Auron) | **Blog:** [HuggingFace](https://huggingface.co/blog/nyxia/auron)

## The Scaling Wall

- **Root cause:** Representation saturation at dim=2048: the loops add no new information
- **Contributing:** head_dim=64 produces 32 fragmented attention heads (Qwen 3.5 uses 256)
- **Fix in progress:** Chimera 1B v2 (head_dim=128) + Chimera-MoE (routed experts)

## Architecture

- **Type:** Chimera (6 bottom + 6×3 top = 24 virtual)
- **Dim:** 2048, head_dim=64, expand_v=2
- **Params:** 1.1B (761M unique + 311M embed)
- **Trained:** 250K steps, 5B tokens, WSD schedule

```python
from ouro import load_model, generate

# This checkpoint is archived; load Auron-510M for inference instead.
model, tokenizer, device = load_model("nyxia/Auron-510M")
generate(model, tokenizer, device, "The history of")
```

Built by [Florian Gasquez](https://fyx.jp) ([@nyxia](https://huggingface.co/nyxia)). Part of [Soulkyn](https://soulkyn.com).
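
The numbers quoted in this card can be re-derived with a few lines of arithmetic. The sketch below is only a sanity check on the published figures: it does not load or run the model, and the variable names (`bottom_layers`, `top_layers`, `loops`) are illustrative. It assumes that "6 bottom + 6×3 top" means 6 unshared bottom layers followed by 6 shared top layers applied 3 times.

```python
# Figures copied from this model card; nothing here touches the model weights.
val_loss = {"Auron-279M": 3.188, "Auron-510M": 3.035, "Auron-1.1B": 3.180}


def delta(curr: str, prev: str) -> float:
    """Change in final val_loss going from `prev` to `curr` (negative is better)."""
    return round(val_loss[curr] - val_loss[prev], 3)


d_510 = delta("Auron-510M", "Auron-279M")  # improvement from 279M to 510M
d_1b = delta("Auron-1.1B", "Auron-510M")   # regression from 510M to 1.1B
gap = delta("Auron-1.1B", "Auron-279M")    # 1.1B vs. the 279M baseline

# Attention head count implied by the model width and per-head width.
dim, head_dim = 2048, 64
n_heads = dim // head_dim

# Virtual depth, under the assumed reading of "6 bottom + 6x3 top".
bottom_layers, top_layers, loops = 6, 6, 3
virtual_layers = bottom_layers + top_layers * loops

# Parameter budget in millions: unique weights plus embeddings.
unique_m, embed_m = 761, 311
total_m = unique_m + embed_m

print(f"279M -> 510M: {d_510:+.3f}")
print(f"510M -> 1.1B: {d_1b:+.3f}")
print(f"1.1B vs 279M: {gap:+.3f}")
print(f"heads: {n_heads}, virtual layers: {virtual_layers}, params: {total_m}M")
```

The last comparison is the one that matters for the scaling wall: the 1.1B model ends within 0.008 of the 279M baseline while the 510M model sits 0.153 below it.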