
	---
	license: apache-2.0
	tags:
	- auron
	- chimera
	- gdn
	- ouroboros
	- hybrid-architecture
	language:
	- en
	thumbnail: auron_banner.png
	---

	![Auron](auron_banner.png)

	# Auron-510M

	Auron is a family of Chimera language models: hybrid GDN-Attention networks with Ouroboros weight sharing. This checkpoint is the 510M-parameter model.

	- Paper: [Auron: Depth-Efficient Language Models via Hybrid Recurrent-Attention Weight Sharing](https://github.com/Fy-/Auron)
	- Code: [github.com/Fy-/Auron](https://github.com/Fy-/Auron)

	## Architecture
	- Type: Chimera (ChimeraConfig)
	- Dim: 1536
	- Layers: 16 virtual
	- Params: 510,217,280 (510M)
	- Vocab: 151936 (Qwen 3 tokenizer)
	- Context: 2048 tokens
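	- Topology: 4 unique bottom layers + 4×3 shared top layers (4 + 12 = 16 virtual layers)
	- GDN:Attn ratio: 3:1 (every 4th layer is attention, so 12 GDN and 4 attention layers out of 16)
	- Virtual equivalent: ~1,020,434,560 params (2 × 510,217,280)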
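
The layer counts above can be checked with a small sketch. It is not taken from the Auron code: the names `bottom_i` and `shared_j` are hypothetical, and it assumes that the shared top is one block of 4 layers applied 3 times and that attention sits at the last position of each group of four.

```python
# Sketch of the virtual layer schedule described in the Architecture list.
# Assumptions (not confirmed by the source): the shared top is a block of
# 4 layers applied 3 times, and attention is the 4th layer of each group.

N_UNIQUE_BOTTOM = 4          # layers with their own weights
N_SHARED = 4                 # weight sets reused by the top of the stack
N_LOOPS = 3                  # how many times the shared block is applied
ATTN_EVERY = 4               # every 4th virtual layer is attention
STORED_PARAMS = 510_217_280  # parameter count from the model card


def layer_kind(virtual_index):
    """Return 'attn' for every 4th virtual layer, 'gdn' otherwise."""
    return "attn" if (virtual_index + 1) % ATTN_EVERY == 0 else "gdn"


def build_schedule():
    """Return one (virtual_index, weight_id, kind) tuple per virtual layer."""
    weight_ids = [f"bottom_{i}" for i in range(N_UNIQUE_BOTTOM)]
    # The same shared weight ids are repeated on every loop.
    weight_ids += [f"shared_{j}" for _ in range(N_LOOPS) for j in range(N_SHARED)]
    return [(idx, wid, layer_kind(idx)) for idx, wid in enumerate(weight_ids)]


schedule = build_schedule()
n_virtual = len(schedule)
n_unique = len({wid for _, wid, _ in schedule})
n_attn = sum(kind == "attn" for _, _, kind in schedule)
n_gdn = n_virtual - n_attn

for idx, wid, kind in schedule:
    print(f"virtual layer {idx:2d}: weights={wid:9s} kind={kind}")
print(f"{n_virtual} virtual layers, {n_unique} unique weight sets")
print(f"GDN:Attn = {n_gdn}:{n_attn}")
print(f"2 x stored params = {2 * STORED_PARAMS:,}")
```

Under these assumptions each shared weight set keeps the same layer type on every loop, since a shared layer's position within its group of four does not change between loops.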

	## Training Curves

	![Training Curves](training_curves.png)

	## Training
	- Step: 249,000
	- Data: Mixed (75% FineWeb-Edu, 18% StarCoder, 5% FineMath, 2% UltraChat)
	- Optimizer: Muon + AdamW (decoupled embedding LR)
	- Schedule: WSD (Warmup-Stable-Decay)
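
For readers unfamiliar with WSD, the following is a minimal sketch of a Warmup-Stable-Decay learning-rate schedule. The function name `wsd_lr`, the linear warmup and decay shapes, and every step count and learning rate below are illustrative assumptions; the values used to train this checkpoint are not stated in this card.

```python
def wsd_lr(step, peak_lr, warmup_steps, stable_steps, decay_steps, min_lr=0.0):
    """Learning rate at `step` for a Warmup-Stable-Decay schedule.

    Assumes linear warmup and linear decay; other decay shapes
    (for example cosine) are also used in practice.
    """
    if step < warmup_steps:
        # Warmup: ramp linearly up to peak_lr.
        return peak_lr * (step + 1) / warmup_steps
    if step < warmup_steps + stable_steps:
        # Stable: hold the peak learning rate.
        return peak_lr
    # Decay: move linearly from peak_lr to min_lr, then stay at min_lr.
    decay_step = step - warmup_steps - stable_steps
    progress = min(decay_step / decay_steps, 1.0)
    return peak_lr + (min_lr - peak_lr) * progress


# Hypothetical settings, chosen only to make the three phases visible.
for s in (0, 9, 50, 115, 120):
    print(s, wsd_lr(s, peak_lr=1.0, warmup_steps=10, stable_steps=100, decay_steps=10))
```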

	## Usage

	```bash
	git clone https://github.com/Fy-/Auron && cd Auron && rye sync
	```

	```python
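	from ouro import load_model, generate

	# load_model returns the model, its tokenizer, and the target device
	model, tokenizer, device = load_model("nyxia/Auron-510M")
	# Generate a continuation of the prompt
	generate(model, tokenizer, device, "The history of")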
	```

	## Sampling

	Defaults: `T=0.7`, `top_k=20`, `top_p=0.9`, `rep_pen=1.0`, `presence_pen=1.5`. Ouroboros weight sharing requires a presence penalty of at least 1.5 to prevent attractor wells.
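
To show how these settings interact, here is a minimal pure-Python sketch of one sampling step. It is not the `ouro` implementation. It assumes a subtractive presence penalty (a flat amount removed from the logit of every token already generated) applied before temperature, top-k, and top-p; the function name `sample_next` and the order of operations are assumptions. The repetition penalty is omitted because a value of 1.0 is assumed to leave the logits unchanged.

```python
import math
import random


def sample_next(logits, generated, temperature=0.7, top_k=20, top_p=0.9,
                presence_pen=1.5, rng=random):
    """Sample one token id from `logits` given previously generated ids."""
    # Presence penalty (assumed subtractive): penalise each seen token once.
    seen = set(generated)
    scores = [l - presence_pen if i in seen else l for i, l in enumerate(logits)]
    # Temperature: values below 1 sharpen the distribution.
    scores = [s / temperature for s in scores]
    # Top-k: keep the k highest-scoring token ids.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:top_k]
    # Softmax over the kept tokens (shifted by the max for stability).
    top = scores[order[0]]
    weights = [math.exp(scores[i] - top) for i in order]
    total = sum(weights)
    probs = [w / total for w in weights]
    # Top-p: keep the smallest prefix whose cumulative probability reaches top_p.
    kept_tokens, kept_probs, cumulative = [], [], 0.0
    for tok, p in zip(order, probs):
        kept_tokens.append(tok)
        kept_probs.append(p)
        cumulative += p
        if cumulative >= top_p:
            break
    # random.choices renormalises the remaining weights.
    return rng.choices(kept_tokens, weights=kept_probs, k=1)[0]


toy_logits = [5.0, 0.0, 0.0, 0.0]  # hypothetical 4-token vocabulary
first = sample_next(toy_logits, generated=[])
# A deliberately large penalty pushes sampling away from the seen token.
second = sample_next(toy_logits, generated=[0], presence_pen=10.0)
print(first, second)
```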

	## Links

	- Paper: [Auron: Depth-Efficient Language Models via Hybrid Recurrent-Attention Weight Sharing](https://github.com/Fy-/Auron/blob/master/Auron_chimera_topology_paper.pdf)
	- Code: [github.com/Fy-/Auron](https://github.com/Fy-/Auron)
	- Models: [huggingface.co/nyxia](https://huggingface.co/nyxia)

	Built by [Florian Gasquez](https://fyx.jp) ([@nyxia](https://huggingface.co/nyxia)). Part of the [Soulkyn](https://soulkyn.com) project.