A compact Diffusion Transformer (1.28 M params) that solves Sudoku as a masked discrete diffusion (MDLM-style) denoiser: it fills a puzzle by iteratively un-masking the cells it is most confident about, MaskGIT-style.
Architecture: hidden=128, heads=4, blocks=4; per-cell token + 2-D positional + 3x3-box embeddings, plus a timestep.

A board is 81 tokens, row-major: 0 = empty / [MASK], 1..9 = digits. The solver clamps the given clues and only fills the blanks.
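To make the board layout concrete, here is a minimal pure-Python sketch of the encoding described above. The helper names (`encode`, `cell_indices`, `clamp_clues`) are illustrative assumptions and not part of the `nonet` package; the box-index formula is simply the usual way to number 3x3 boxes and may differ from what the model uses internally.

```python
MASK = 0  # token 0 doubles as "empty" and [MASK]


def encode(puzzle: str) -> list:
    """Turn an 81-character digit string into a row-major list of tokens 0..9."""
    if len(puzzle) != 81 or any(c not in "0123456789" for c in puzzle):
        raise ValueError("expected exactly 81 digits")
    return [int(c) for c in puzzle]


def cell_indices(i: int) -> tuple:
    """Row, column and 3x3-box index (all 0-based) of flat position i."""
    row, col = divmod(i, 9)
    return row, col, (row // 3) * 3 + col // 3


def clamp_clues(tokens: list, prediction: list) -> list:
    """Keep every given clue; take the prediction only where the puzzle was blank."""
    return [p if t == MASK else t for t, p in zip(tokens, prediction)]
```

For example, `cell_indices(30)` is the cell in row 3, column 3, which sits in the centre box (index 4), and `clamp_clues` never overwrites a non-zero token.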
```python
import torch
from nonet.hub import load_solver  # pip install git+https://github.com/tchauffi/nonet

solver = load_solver("tchauffi/sudoku-dit")  # downloads model.safetensors + config.json

puzzle = "417000800030005900800000000050000600000700020000000000000060054000200000000000003"
x = torch.tensor([[int(c) for c in puzzle]])  # (1, 81)

solution = solver.solve(x, conf_threshold=0.999)  # adaptive reveal (recommended)
# fixed-budget alternative: solver.solve(x, num_steps=81)
```
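The adaptive reveal can be pictured as the loop below. This is a rough sketch of confidence-thresholded un-masking in plain Python, not the actual `nonet` implementation: `predict` is a hypothetical stand-in for the denoiser (it returns, for each of the 81 cells, a probability for each digit 1..9), and the fallback of revealing the single most confident cell when nothing clears the threshold is an assumption made so the loop always terminates.

```python
from typing import Callable, List, Tuple

MASK = 0
Probs = List[List[float]]  # 81 rows, each holding probabilities for digits 1..9


def adaptive_reveal(board: List[int],
                    predict: Callable[[List[int]], Probs],
                    conf_threshold: float = 0.999,
                    max_steps: int = 81) -> Tuple[List[int], int]:
    """Fill masked cells step by step; returns (filled board, steps used)."""
    board = list(board)  # do not mutate the caller's puzzle
    steps = 0
    while MASK in board and steps < max_steps:
        probs = predict(board)
        # Most likely digit (as a 0-based index) and its confidence, masked cells only.
        best = {i: max(range(9), key=probs[i].__getitem__)
                for i, t in enumerate(board) if t == MASK}
        conf = {i: probs[i][d] for i, d in best.items()}
        # Reveal every cell above the threshold ...
        reveal = [i for i in best if conf[i] >= conf_threshold]
        # ... or, failing that, just the single most confident one (assumed fallback).
        if not reveal:
            reveal = [max(conf, key=conf.get)]
        for i in reveal:
            board[i] = best[i] + 1  # index 0..8 -> digit 1..9
        steps += 1
    return board, steps
```

Because only masked cells are ever written, the given clues stay clamped. With a confident model many cells clear the threshold at once, which is how a puzzle can finish in a handful of steps; with an unsure model the loop degrades to one cell per step.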
Held-out validation puzzles, adaptive decoder (tau = 0.999):
| metric | value |
|---|---|
| valid solutions | 99.8 % |
| exact match vs reference | 94.3 % |
| avg reveal steps / puzzle | 3.6 |
Solve rate is 100 % up to ~39 blanks and ~98 % on the hard 50+ blank tail (where the adaptive
decoder spends more steps). The exact-match rate is lower than the valid-solution rate only
because the dataset's high-blank puzzles aren't always uniquely solvable, so the model may
return a different valid grid.
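The gap between the two metrics is easy to see in code. The sketch below is one plausible reading of them, not the evaluation script behind the table: `is_valid_solution` and `exact_match` are hypothetical helpers, with "valid" taken to mean that every row, column and box contains 1..9 exactly once and all given clues are preserved.

```python
def is_valid_solution(grid, puzzle):
    """grid and puzzle are 81 ints, row-major; 0 in puzzle marks a blank."""
    full = set(range(1, 10))
    rows = [grid[r * 9:(r + 1) * 9] for r in range(9)]
    cols = [grid[c::9] for c in range(9)]
    boxes = [[grid[(br * 3 + r) * 9 + bc * 3 + c]
              for r in range(3) for c in range(3)]
             for br in range(3) for bc in range(3)]
    units_ok = all(set(unit) == full for unit in rows + cols + boxes)
    clues_ok = all(p == 0 or p == g for p, g in zip(puzzle, grid))
    return units_ok and clues_ok


def exact_match(grid, reference):
    """True only if the grid equals the dataset's reference solution cell for cell."""
    return list(grid) == list(reference)
```

A puzzle with more than one solution can therefore score as valid while failing exact match: any completed grid that respects the clues passes the first check, but only one of them equals the stored reference.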
Ritvik19/Sudoku-Dataset (~17 M puzzles), tokenized on the fly.