FINAL-Bench Quantum: An Open, Neutral Benchmark for Quantum-Computing Methods
Leaderboard: huggingface.co/spaces/FINAL-Bench/quantum-bench-leaderboard
The core rule: two tracks
- Track A (Verified). Methods are measured here on one frozen, public test set and reported with 95% confidence intervals. These numbers are directly comparable.
- Track B (Reported). Numbers quoted from each paper or announcement. Codes, noise models, and hardware differ, so they are not directly comparable, and we say so plainly.
Two principles hold throughout:
- No quantum-advantage claims.
- A simulation is labeled a simulation; real hardware is named with its chip. When two results fall within each other's confidence intervals, we call it a statistical tie rather than crowning a winner.
The five events
| Event | What it measures | One-line analogy |
|---|---|---|
| ① QEC Decoder | logical error rate on a rotated surface code (Stim, circuit noise) | accuracy of a quantum "spell-checker" |
| ② Optimization | Max-Cut quality (cut found / optimum) | finding the best answer among astronomically many |
| ③ VQE | molecular ground-state energy vs the exact solution | quantum energy calculation for chemistry/drugs |
| ④ QRAM | quantum-memory query fidelity | accuracy of a quantum "memory chip" |
| ⑤ Simulation | how large a circuit a classical method can handle | faking a quantum computer on a classical one |
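The Optimization row scores Max-Cut as cut found / optimum. A small sketch of that ratio, assuming an unweighted graph and a brute-force optimum that is only feasible for a handful of nodes; the function names and the 5-cycle example are hypothetical and not taken from the benchmark's test set.

```python
from itertools import product

Edge = tuple[int, int]


def cut_value(edges: list[Edge], assignment: tuple[int, ...]) -> int:
    """Count edges whose endpoints land on different sides of the cut."""
    return sum(1 for u, v in edges if assignment[u] != assignment[v])


def brute_force_max_cut(n_nodes: int, edges: list[Edge]) -> int:
    """Exact optimum by enumeration; node 0 is pinned to side 0 by symmetry."""
    if n_nodes < 1:
        raise ValueError("n_nodes must be at least 1")
    best = 0
    for rest in product((0, 1), repeat=n_nodes - 1):
        best = max(best, cut_value(edges, (0,) + rest))
    return best


def cut_ratio(n_nodes: int, edges: list[Edge], assignment: tuple[int, ...]) -> float:
    """Quality score: cut found divided by the optimal cut."""
    optimum = brute_force_max_cut(n_nodes, edges)
    if optimum == 0:
        raise ValueError("graph has no edges to cut")
    return cut_value(edges, assignment) / optimum


# A 5-cycle: the best possible cut severs 4 of its 5 edges.
ring = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]
print(cut_ratio(5, ring, (0, 1, 0, 1, 0)))  # 1.0
print(cut_ratio(5, ring, (0, 0, 1, 1, 0)))  # 0.5
```

Enumeration doubles in cost with every added node, so a real test set would need a known or separately certified optimum for the ratio's denominator.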
Each event tab is organized as A. verified measurements / B. real hardware (where available) / C. published references, alongside dedicated Charts (threshold, distance-scaling, and latency-vs-accuracy plots), Medals (participation by country), and About (methodology and citation) tabs.
How to read the tables
- Flag = a method's / team's origin; By = its authors (e.g., Tesseract = Google Quantum AI, PyMatching = O. Higgott).
- VERIFIED = measured on this benchmark. REPORTED = quoted from a source.
- The ± value next to a number is its 95% confidence interval. Overlapping intervals mean a tie.
- The latency column matters as much as accuracy: a decoder that is "accurate but slow" can be useless for real-time error correction, where decoding must keep pace with the QPU cycle.
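To illustrate why a slow decoder falls behind, here is a rough sketch of a single-threaded decoder receiving one syndrome round per QPU cycle. The queue model, the function names, and the 1 µs cycle time are illustrative assumptions, not the benchmark's latency protocol.

```python
import time
from typing import Callable, Sequence


def measure_latencies_us(decode: Callable[[object], object],
                         syndromes: Sequence[object]) -> list[float]:
    """Wall-clock time of each decode call, in microseconds."""
    latencies = []
    for syndrome in syndromes:
        start = time.perf_counter()
        decode(syndrome)
        latencies.append((time.perf_counter() - start) * 1e6)
    return latencies


def final_queue_delay_us(latencies_us: Sequence[float], cycle_us: float) -> float:
    """How long the last round waits before the decoder can start on it.

    Round i arrives at time (i + 1) * cycle_us; rounds are decoded in order
    by one worker. A delay that grows with the number of rounds means the
    decoder is not keeping pace.
    """
    if not latencies_us:
        raise ValueError("need at least one round")
    free_at = 0.0  # time at which the decoder finishes its current work
    start = 0.0
    for i, latency in enumerate(latencies_us):
        arrival = (i + 1) * cycle_us
        start = max(free_at, arrival)
        free_at = start + latency
    return start - len(latencies_us) * cycle_us


CYCLE_US = 1.0  # assumed cycle time, for illustration only

print(final_queue_delay_us([0.5] * 4, CYCLE_US))  # 0.0: decoder keeps pace
print(final_queue_delay_us([2.0] * 4, CYCLE_US))  # 3.0: backlog grows every round
```

In this model, a decoder whose average latency exceeds the cycle time accumulates delay linearly with the number of rounds, however accurate it is.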
How to submit
The Submit tab takes a method name, links (GitHub / Hugging Face), an email, and an optional results file. Submissions are stored privately, reproduced under the event's fixed protocol, and the submitter is emailed about inclusion. Listed entries appear with the same origin and author labels as everyone else.
Why neutrality is the whole point
A leaderboard is only useful if you can trust it. So FINAL-Bench Quantum is built on a discipline: include strong competitors even when they beat the host's own entries, quote sources faithfully, and never round a simulation up into a hardware claim. Methods from Google, IBM, NVIDIA, USTC, Riverlane and others sit next to a Korean entry (VIDRAFT 🇰🇷) under the same protocol, confidence intervals, and honesty boundaries.
Quantum computing has not yet reached the fault-tolerant era, which is exactly why a shared, hype-free yardstick that honestly records what has actually been measured today is worth building. Come and compete with your own method.
A methods paper is in preparation. Feedback and submissions are welcome.
