SudokuDiT

A compact Diffusion Transformer (1.28 M params) that solves Sudoku as a masked discrete diffusion (MDLM-style) denoiser: it fills a puzzle by iteratively un-masking the cells it is most confident about, MaskGIT-style.

[Animation: watch it solve]

  • Architecture: DiT with adaLN-Zero conditioning (hidden=128, heads=4, blocks=4). Each cell receives a token embedding, a 2-D positional embedding and a 3x3-box embedding; the diffusion timestep is supplied as well.
  • Code, training and an interactive web demo: https://github.com/tchauffi/nonet
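
As an illustration of the positional features listed above, the row, column and 3x3-box index of each cell can be derived from its row-major position. This is only a sketch: the function name `cell_coordinates` and the box numbering (0..8, left to right, top to bottom) are assumptions made for this example, and the repository may compute these features differently.

```python
from typing import List, Tuple


def cell_coordinates(index: int) -> Tuple[int, int, int]:
    """Return (row, col, box) for a row-major cell index in 0..80."""
    if not 0 <= index < 81:
        raise ValueError("cell index must be in 0..80")
    row, col = divmod(index, 9)
    # Assumed numbering: boxes 0..8, left to right, then top to bottom.
    box = (row // 3) * 3 + col // 3
    return row, col, box


# One (row, col, box) triple per cell; each component could index
# its own embedding table.
COORDS: List[Tuple[int, int, int]] = [cell_coordinates(i) for i in range(81)]
```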

Input / output

A board is 81 tokens, row-major: 0 = empty / [MASK], 1..9 = digits. The solver clamps the given clues and only fills the blanks.
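
The encoding above is easy to check by hand. The helpers below are a minimal sketch, not part of the `nonet` package (`parse_board` and `format_board` are hypothetical names): they convert an 81-character puzzle string into the token list and print it as a 9x9 grid, with `.` standing for an empty cell.

```python
from typing import List, Sequence


def parse_board(puzzle: str) -> List[int]:
    """Turn an 81-character string of digits into row-major tokens (0 = empty)."""
    if len(puzzle) != 81 or any(c not in "0123456789" for c in puzzle):
        raise ValueError("expected exactly 81 characters, each in 0..9")
    return [int(c) for c in puzzle]


def format_board(tokens: Sequence[int]) -> str:
    """Render 81 tokens as nine lines of nine cells, '.' for empty cells."""
    rows = []
    for r in range(9):
        row = tokens[9 * r : 9 * r + 9]
        rows.append(" ".join("." if v == 0 else str(v) for v in row))
    return "\n".join(rows)
```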

Usage

import torch
from nonet.hub import load_solver   # pip install git+https://github.com/tchauffi/nonet

solver = load_solver("tchauffi/sudoku-dit")     # downloads model.safetensors + config.json
puzzle = "417000800030005900800000000050000600000700020000000000000060054000200000000000003"
x = torch.tensor([[int(c) for c in puzzle]])      # (1, 81)
solution = solver.solve(x, conf_threshold=0.999)  # adaptive reveal (recommended)
# fixed-budget alternative: solver.solve(x, num_steps=81)
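
To make the idea of confidence-based reveal concrete, here is a minimal pure-Python sketch of one plausible adaptive loop. It is not the `nonet` implementation: `adaptive_reveal` and the `predict` callable are hypothetical, and the reveal rule (commit every masked cell whose top probability reaches the threshold, otherwise commit only the single most confident cell) is an assumption about how such a decoder could behave.

```python
from typing import Callable, List, Sequence, Tuple


def adaptive_reveal(
    board: Sequence[int],
    predict: Callable[[List[int]], Sequence[Sequence[float]]],
    conf_threshold: float = 0.999,
) -> Tuple[List[int], int]:
    """Fill masked cells (0) by repeatedly committing confident predictions.

    `predict` is a stand-in for the model: given the current board it returns
    81 rows of 9 probabilities (digits 1..9). Returns (board, reveal steps).
    """
    board = list(board)
    steps = 0
    while 0 in board:
        probs = predict(board)
        # Best (confidence, cell index, digit) for every still-masked cell.
        candidates = []
        for i, value in enumerate(board):
            if value == 0:
                cell = list(probs[i])
                conf = max(cell)
                candidates.append((conf, i, cell.index(conf) + 1))
        chosen = [c for c in candidates if c[0] >= conf_threshold]
        if not chosen:
            # Assumed fallback: reveal only the most confident cell.
            chosen = [max(candidates)]
        for _, i, digit in chosen:
            board[i] = digit  # clues are never touched: only zeros are filled
        steps += 1
    return board, steps
```

Under this rule a predictor that is confident everywhere finishes in a single step, while one that never reaches the threshold degrades to revealing one cell per step.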

Performance

Held-out validation puzzles, adaptive decoder (confidence threshold tau = 0.999):

metric                       value
valid solutions              99.8 %
exact match vs reference     94.3 %
avg reveal steps / puzzle    3.6

The solve rate is 100 % up to ~39 blanks and ~98 % on the hard tail of 50+ blanks, where the adaptive decoder spends more steps. Exact match is lower than the valid-solution rate only because the dataset's high-blank puzzles aren't always uniquely solvable, so the model may return a valid grid that differs from the reference.
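
The difference between the two metrics can be shown with a small checker. This is a sketch with hypothetical names (`is_valid_solution`, `exact_match`), not the evaluation code used for the numbers above: a grid counts as valid when every row, column and 3x3 box holds the digits 1..9 once and every given clue is preserved.

```python
from typing import Optional, Sequence

DIGITS = set(range(1, 10))


def is_valid_solution(solution: Sequence[int],
                      puzzle: Optional[Sequence[int]] = None) -> bool:
    """True if the grid obeys Sudoku rules and, if given, keeps the puzzle's clues."""
    if len(solution) != 81:
        return False
    for k in range(9):
        row = {solution[9 * k + c] for c in range(9)}
        col = {solution[9 * r + k] for r in range(9)}
        br, bc = 3 * (k // 3), 3 * (k % 3)  # top-left corner of box k
        box = {solution[9 * (br + dr) + bc + dc]
               for dr in range(3) for dc in range(3)}
        if row != DIGITS or col != DIGITS or box != DIGITS:
            return False
    if puzzle is not None:
        return all(p == 0 or p == s for p, s in zip(puzzle, solution))
    return True


def exact_match(solution: Sequence[int], reference: Sequence[int]) -> bool:
    """True only if the grid equals the reference cell for cell."""
    return list(solution) == list(reference)


# A puzzle with two solutions: hide every 1 and 2, then swap the two digits.
reference = [(3 * (r % 3) + r // 3 + c) % 9 + 1 for r in range(9) for c in range(9)]
puzzle = [0 if v in (1, 2) else v for v in reference]
alternative = [{1: 2, 2: 1}.get(v, v) for v in reference]
```

Here both `reference` and `alternative` pass `is_valid_solution(..., puzzle)`, yet only the former is an exact match, which is the situation described above for puzzles that are not uniquely solvable.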

Training

  • Data: Ritvik19/Sudoku-Dataset (~17 M puzzles), tokenized on the fly.
  • Objective: masked cross-entropy over masked cells, conditional (given clues are never masked), with a linear masking schedule.
  • ~50 k steps, AdamW, batch 256. The model is kept deliberately small: larger models sit at the entropy floor far longer before the loss breaks through.
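
The objective described above can be sketched in plain Python for a single training example. This is an illustration under stated assumptions, not the training code: the names are hypothetical, "linear masking schedule" is read here as masking each non-clue cell independently with probability t for t drawn from [0, 1], and no per-timestep loss weighting is applied because the source does not specify one.

```python
import math
import random
from typing import List, Sequence, Tuple


def mask_for_training(solution: Sequence[int], clues: Sequence[int],
                      t: float, rng: random.Random) -> Tuple[List[int], List[int]]:
    """Mask non-clue cells with probability t; clue cells are never masked.

    Returns (model input with 0 at masked cells, indices of masked cells).
    """
    inputs = list(solution)
    masked = []
    for i in range(81):
        if clues[i] == 0 and rng.random() < t:
            inputs[i] = 0
            masked.append(i)
    return inputs, masked


def masked_cross_entropy(probs: Sequence[Sequence[float]],
                         solution: Sequence[int],
                         masked: Sequence[int]) -> float:
    """Mean negative log-likelihood of the true digit, over masked cells only."""
    if not masked:
        return 0.0
    total = -sum(math.log(probs[i][solution[i] - 1]) for i in masked)
    return total / len(masked)
```

For reference, a predictor that assigns probability 1/9 to every digit scores ln 9 (about 2.197) under this loss.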

Limitations

  • Not a guaranteed solver: a small fraction of very hard (60+ blank) boards come out invalid.
  • Trained only on standard 9x9 Sudoku from the dataset above.