saint marzi

ausntmarzi

AI & ML interests

None yet

Organizations

None yet

upvoted an article about 4 hours ago

Article

FINAL-Bench Quantum: An Open, Neutral Benchmark for Quantum-Computing Methods

FINAL-Bench • about 7 hours ago • 15

liked a Space about 7 hours ago

FINAL-Bench Quantum Leaderboard

⚛

Neutral quantum-method benchmark — QEC decoders & more

reacted to SeaWolf-AI's post with 🔥 about 7 hours ago

Post

1122

🚀 Introducing FINAL-Bench Quantum — an open, neutral benchmark that finally puts quantum-computing methods on one fair yardstick.

Quantum results are notoriously hard to compare. The same "logical error rate" or "query fidelity" means very different things depending on the code, noise model, hardware, and shot count. FINAL-Bench Quantum fixes that: five events judged under identical, published protocols, where every number is labeled as either measured here or quoted from a source.

Five events: ① QEC Decoder ② Optimization (Max-Cut) ③ VQE ④ QRAM ⑤ Quantum Simulation

The rules are simple and strict:
✅ Track A (measured here, with 95% confidence intervals) is kept separate from Track B (quoted from papers, not directly comparable).
🔬 Simulation and real hardware are clearly distinguished, and no quantum-advantage claims are made.
🌍 Methods from Google, IBM, NVIDIA, USTC, Riverlane and more sit side by side, with origin flags and author credits.
📤 Anyone can submit their own method via the Submit tab for review and listing.
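The post does not say how the Track A 95% confidence intervals are computed. As one common choice for a rate estimated from repeated shots (for example a logical error rate), here is a minimal sketch of a Wilson score interval. The function name `wilson_interval` and the shot counts are illustrative assumptions, not part of the benchmark.

```python
import math
from statistics import NormalDist


def wilson_interval(errors: int, shots: int, confidence: float = 0.95):
    """Wilson score interval for a binomial proportion errors/shots.

    Assumption: the benchmark's actual interval method is not stated;
    Wilson is used here only as a standard, well-behaved choice.
    """
    if shots <= 0 or not 0 <= errors <= shots:
        raise ValueError("need 0 <= errors <= shots and shots > 0")
    # Two-sided critical value, about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    p = errors / shots
    denom = 1 + z * z / shots
    center = (p + z * z / (2 * shots)) / denom
    half = z * math.sqrt(p * (1 - p) / shots + z * z / (4 * shots**2)) / denom
    # Clamp to [0, 1] to absorb floating-point rounding at the edges.
    return max(0.0, center - half), min(1.0, center + half)


# Hypothetical run: 12 logical errors observed in 10,000 shots.
low, high = wilson_interval(12, 10_000)
print(f"error rate 0.0012, 95% CI [{low:.5f}, {high:.5f}]")
```

Unlike the plain normal approximation, this interval stays inside [0, 1] and remains usable when very few errors are observed, which is the typical regime for error-rate measurements.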

Already on the board: real IBM Heron r2 measurements (repetition-code distance boundary, 29–175× error reduction from d3 to d5), a real-chip QRAM query fidelity of 0.92, and H₂ VQE at chemical accuracy — always labeled honestly as simulation vs hardware.

A leaderboard is only useful if you can trust it, so neutrality is the whole point: strong competitors stay in even when they beat the host, sources are quoted faithfully, and a simulation is never rounded up into a hardware claim.

Leaderboard: https://huggingface.co/spaces/FINAL-Bench/quantum-bench-leaderboard
Article: https://huggingface.co/blog/FINAL-Bench/quantum-leaderboard

#quantum #QEC #QuantumComputing #benchmark

liked a model 1 day ago

FINAL-Bench/Darwin-28B-Coder-GGUF

Text Generation • 27B • Updated 1 day ago • 176 • 19

liked a model 7 days ago

JGOS-Model/JGOS-31B-Citizen

Image-Text-to-Text • 31B • Updated 6 days ago • 260 • 19

reacted to SeaWolf-AI's post with 🔥 17 days ago

Post

4242

Darwin-60B-DUO: Two SOTAs, One Endpoint — 88.38% on GPQA Diamond 🚀

We're excited to release Darwin-60B-DUO, the Darwin family's first DUO model. Take two domain-verified specialists, hide them behind a single OpenAI-compatible endpoint, and let a router decide which one (or both) answers. You see one model, one API — but get the best of both.

The number that matters: on the full 198-question GPQA Diamond, Darwin-60B-DUO hits 88.38%. The constituents alone land at 69.70% (Darwin-28B-REASON) and 77.27% (AWAXIS-Think-31B); a naive cascade only reaches 83.84%. The DUO clears them all. Two small specialists, intelligently routed, beat one big generalist on cost and quality. Both are independently verified — Darwin-28B-REASON is #3 on the HF GPQA Diamond leaderboard, AWAXIS-Think-31B is #1 on Korea's national K-AI Leaderboard (MSIT).
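The quoted percentages can be sanity-checked against the 198-question set. The sketch below recovers the number of correct answers each score implies and confirms that it rounds back to the reported figure. The helper names are illustrative, and the counts are derived from the percentages rather than reported in the post.

```python
TOTAL_QUESTIONS = 198  # full GPQA Diamond set, as stated in the post

# Accuracy figures quoted in the post (percent).
reported = {
    "Darwin-60B-DUO": 88.38,
    "Darwin-28B-REASON": 69.70,
    "AWAXIS-Think-31B": 77.27,
    "naive cascade": 83.84,
}


def implied_correct(pct: float, total: int = TOTAL_QUESTIONS) -> int:
    """Nearest whole number of correct answers for a reported percentage."""
    return round(pct / 100 * total)


def is_consistent(pct: float, total: int = TOTAL_QUESTIONS) -> bool:
    """True if some integer count reproduces pct to two decimal places."""
    k = implied_correct(pct, total)
    return abs(100 * k / total - pct) < 0.005


counts = {name: implied_correct(pct) for name, pct in reported.items()}
for name, k in counts.items():
    print(f"{name}: {k}/{TOTAL_QUESTIONS} (consistent: {is_consistent(reported[name])})")
```

Under this reading, the DUO answers 175 of 198 questions correctly, 9 more than the naive cascade (166) and 22 more than the stronger constituent (153).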

The brain is a Hybrid-A router that picks one of five strategies on the fly. Korean prompts go to AWAXIS and English/STEM prompts go to Darwin (single-backend, ~70% of traffic at 1× cost). When a Korean answer needs rigorous English reasoning, split_refine fires: Darwin drafts and AWAXIS polishes. MCQ/short-answer prompts run on both backends with self-consistency + cross-verify. Net effective cost: only ~1.3× a single 30B model.

The part the community will care about: the gateway is model-agnostic and Apache-2.0. Point it at any two OpenAI-compatible backends and you've got a DUO in minutes — teach router.py when to use which, and parallel calls, response merging, and routing transparency via _duo_route are handled for you. Fork it and tell us what you built.
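To make the routing idea concrete, here is a minimal sketch of a rule-based router covering the four behaviours the post describes. It is not the project's `router.py`: the function `pick_route`, the `needs_rigorous_reasoning` flag, the strategy labels `single` and `both_cross_verify`, and the backend labels are all hypothetical, and only `split_refine` is a name taken from the post. The detection heuristics (a Hangul character test and a multiple-choice regex) are likewise assumptions.

```python
import re

# Hypothetical heuristics; the real router's signals are not documented in the post.
HANGUL = re.compile(r"[\uac00-\ud7a3]")                      # any Hangul syllable
MCQ_OPTION = re.compile(r"^\s*\(?[A-D][).]\s", re.MULTILINE)  # lines like "A) ..." or "(B) ..."


def pick_route(prompt: str, needs_rigorous_reasoning: bool = False) -> dict:
    """Choose a strategy and the backends to call for one prompt."""
    is_korean = bool(HANGUL.search(prompt))
    if MCQ_OPTION.search(prompt):
        # Multiple-choice: query both backends and cross-check the answers.
        return {"strategy": "both_cross_verify", "backends": ["darwin", "awaxis"]}
    if is_korean and needs_rigorous_reasoning:
        # Draft with the reasoning specialist, polish with the Korean specialist.
        return {"strategy": "split_refine", "backends": ["darwin", "awaxis"]}
    if is_korean:
        return {"strategy": "single", "backends": ["awaxis"]}
    # Default: English/STEM traffic goes to a single backend.
    return {"strategy": "single", "backends": ["darwin"]}


print(pick_route("Explain the second law of thermodynamics."))
print(pick_route("광합성이란 무엇인가요?"))
print(pick_route("Which is largest?\nA) 1\nB) 2\nC) 3\nD) 4"))
```

Returning the chosen strategy alongside the backends keeps the decision inspectable by the caller, which matches the routing transparency the post mentions.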

Painless deploy: docker compose up for both vLLM backends + gateway; FP8 ~30GB colocates on a single B200/H100. One git clone (~120GB). Text-only for now, streaming in v1.1.
Two SOTAs, one endpoint. Come build your own on the Community tab.

👇
🔗 FINAL-Bench/Darwin-60B-DUO

liked a model 17 days ago

FINAL-Bench/Darwin-60B-DUO

Text Generation • Updated 11 days ago • 465 • 31

liked 2 models 24 days ago

FINAL-Bench/Darwin-28B-Coder

Text Generation • 27B • Updated 25 days ago • 881 • 20

ansulev/Darwin-9B-NEG

Text Generation • 10B • Updated 27 days ago • 262k • 15

liked a model 28 days ago

FINAL-Bench/Darwin-28B-REASON

Text Generation • 27B • Updated 1 day ago • 705 • 33

liked a Space 29 days ago

Darwin 9B NEG

🧬

Darwin-9B-NEG reasoning model — API-served chat demo

reacted to SeaWolf-AI's post with ❤️ about 1 month ago

Post

5425

🧬 Darwin Family: Zero Gradient Steps, GPQA Diamond 88.89%

How far can we push LLM reasoning *without* training?

Our team at VIDRAFT submitted this paper to Daily Papers yesterday, and it's
currently #3. Huge thanks to everyone who upvoted — sharing the core ideas below.

🔗 Paper: Darwin Family: MRI-Trust-Weighted Evolutionary Merging for Training-Free Scaling of Language-Model Reasoning (2605.14386)
🔗 arXiv: https://arxiv.org/abs/2605.14386
🔗 Model: FINAL-Bench/Darwin-28B-REASON
🔗 Model: FINAL-Bench/Darwin-28B-Opus

---

TL;DR

Darwin Family is a training-free evolutionary merging framework.
By recombining the weight spaces of existing LLM checkpoints — with zero
gradient-based training — it reaches frontier-level reasoning.

- 🏆 Darwin-28B-Opus: GPQA Diamond 88.89%
- 💸 Zero gradient steps — not a single B200 or H200 hour needed
- 🧬 Consistent gains across 4B → 35B scale
- 🔀 Cross-architecture breeding between Transformer and Mamba families
- 🔁 Stable recursive multi-generation evolution

Three Core Mechanisms

① 14-dim Adaptive Merge Genome — fine-grained recombination at both
component level (Attention / FFN / MLP / LayerNorm / Embedding) and block
level, expanding the prior evolutionary-merge search space.

② MRI-Trust Fusion — we diagnose each layer's reasoning contribution
via an **MRI (Model Reasoning Importance)** signal and fuse it with
evolutionary search through a **learnable trust parameter**. Trust the
diagnostic too much and search collapses; ignore it and search becomes
inefficient — Darwin learns the balance from data.

③ Architecture Mapper — weight-space breeding across heterogeneous
families. Attention × SSM crossover actually works.

Why It Matters
> Diagnose latent capabilities already encoded in open checkpoints,
> and recombine them — no gradients required.

Replies and critiques welcome 🙌

3 replies

upvoted a paper about 1 month ago

Darwin Family: MRI-Trust-Weighted Evolutionary Merging for Training-Free Scaling of Language-Model Reasoning

Paper • 2605.14386 • Published May 14 • 62

liked a model about 1 month ago

FINAL-Bench/Darwin-28B-KR-Legal

Text Generation • 27B • Updated about 1 month ago • 99 • 14

liked a Space about 2 months ago

Model Galaxy

🌌

Darwin family + 2026 trending models on the HF galaxy

reacted to SeaWolf-AI's post with 👍 about 2 months ago

Post

5096

🌌 Introducing Model Galaxy — a Living, Multimodal Fork of the HF Model Atlas

👉 Try it: FINAL-Bench/model-galaxy

This Space is a fork of the brilliant Eliahu/Model-Atlas, the official demo of "Charting and Navigating Hugging Face's Model Atlas" (Horwitz et al., arXiv 2503.10633). Their pre-computed HF model graph is the foundation of every node and edge you see, and we are deeply grateful for its open release.

The original atlas is a static snapshot of early 2025. Model Galaxy turns it into a living, multimodal map. We injected the 2026 trending originals that did not exist when the atlas was frozen — DeepSeek-V4, Hy3-preview, GLM-5.1, Kimi-K2, gpt-oss, Nemotron-3 Super / Nano / Omni, Hermes-4.3, Qwen3-Coder-Next, Llama-3.3, Granite-4.1, plus the latest multimodal releases (FLUX.2, ERNIE-Image, HunyuanImage / Video, LTX-2.3, Wan2.2, Kokoro-82M, VoxCPM2, Voxtral-TTS, whisper-v3-turbo, Gemma-4, Qwen3-Omni, Phi-4-mm) — each with proper base_model lineage edges.

We also added the complete VIDRAFT Darwin family ontology: 120 nodes covering Darwin Core, AETHER, every brand variant (Rogue, AWAXIS, TenOS, Warecube), NOESIS-Darwin multimodal extensions, and 40+ community quantizations — the most complete Darwin lineage view anywhere.

The name "Galaxy" is now literal: our three injected clusters are re-laid out as logarithmic spiral galaxies, with bigger models near the bright cores and quantizations scattering to the outer arms — just like real star mass distribution. A top-right toggle switches between Galaxy mode (deep-space gradient with 220 animated stars) and Atlas mode (clean white panels for reports). A 15-second progress bar narrates the render, and per-modality / per-company colors make every cluster legible at a glance.
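The layout described above can be approximated with the logarithmic spiral r = a·e^(bθ). The sketch below places the largest models closest to the core and pushes smaller ones outward along the arms. The function `spiral_layout`, its parameters, and the sample sizes are illustrative assumptions, not the Space's actual layout code.

```python
import math


def spiral_layout(sizes, arms=2, a=1.0, b=0.25, step=0.35):
    """Return (x, y) per node on logarithmic spiral arms, r = a * exp(b * theta).

    Nodes are ranked by size, largest first, so bigger models sit nearer
    the core and smaller ones land on the outer arms.
    """
    order = sorted(range(len(sizes)), key=lambda i: sizes[i], reverse=True)
    positions = [None] * len(sizes)
    for rank, node in enumerate(order):
        arm = rank % arms                 # alternate nodes between arms
        theta = (rank // arms) * step     # angle grows with rank along an arm
        radius = a * math.exp(b * theta)  # logarithmic spiral radius
        angle = theta + 2 * math.pi * arm / arms
        positions[node] = (radius * math.cos(angle), radius * math.sin(angle))
    return positions


# Hypothetical parameter counts in billions, in arbitrary input order.
sizes = [0.5, 31.0, 9.0, 28.0]
for size, (x, y) in zip(sizes, spiral_layout(sizes)):
    print(f"{size:>5}B -> r = {math.hypot(x, y):.3f}")
```

Because the radius grows exponentially with angle, nodes spread apart faster the further out they sit, which leaves the outer arms sparse and the core dense.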

Final scale: 22,480 nodes in the default Modalities atlas, 137,324 in the Large NLP atlas, and a 277-node compact Darwin + Trending view for instant exploration. Feedback and PRs welcome.

liked 4 models about 2 months ago
