WiLoR-MLX: Hand Pose Estimation on Apple Silicon

MLX port of WiLoR-mini for native Apple Silicon inference. Complete pipeline: ViT-H/16 backbone + MANO hand model + RefineNet refinement.

Code: github.com/lyonsno/wilor-mlx

Available Weights

Variant	File	Size	Precision	Notes
float32	`wilor-mlx.safetensors`	2.4 GB	Full	Reference quality, recommended
int4	`wilor-mlx-int4.safetensors`	490 MB	4-bit quantized	5x smaller download, same speed

Both variants produce near-identical inference speed on Apple Silicon (see benchmarks below). Choose based on download size and precision needs.

These weights contain only ViT backbone, RefineNet, and learned embedding parameters — no MANO data is bundled or rehosted. WiLoR.from_pretrained() handles MANO automatically by fetching upstream WiLoR-mini assets and converting locally on your machine. The MANO hand model is separately licensed by the Max Planck Institute.

Performance

Apple M4 Max, single-image (1×256×256×3), float32:

Stable live sidecar window (embedded in Perceptasia hand tracking)

Backend	Model p50	Model p90	Model p95	Model p99
MLX (wilor-mlx)	~61 ms	~62 ms	~63 ms	~66 ms
PyTorch MPS (2.5.0)	~85 ms	~144 ms	~238 ms	~427 ms

MLX holds a flat ~61 ms with virtually no tail: p99 is only about 8% above p50. The MLX row covers 500 consecutive frames during stable operation; the MPS row comes from a 102K-frame manifest history. Both sets of numbers were measured live in Perceptasia.
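
As a rough illustration of how such a percentile row and the p50-to-p99 spread can be derived from per-frame timings, here is a minimal sketch. The helper name `latency_summary` is hypothetical and not part of wilor-mlx or Perceptasia; it assumes latencies are collected as plain millisecond values.

```python
import numpy as np

def latency_summary(latencies_ms):
    """Summarise per-frame latencies (hypothetical helper, not a wilor-mlx API)."""
    samples = np.asarray(latencies_ms, dtype=np.float64)
    p50, p90, p95, p99 = np.percentile(samples, [50, 90, 95, 99])
    # Relative tail spread: how far p99 sits above the median.
    spread = (p99 - p50) / p50
    return {"p50": p50, "p90": p90, "p95": p95, "p99": p99, "spread": spread}

# Spread implied by the rounded MLX figures in the table: (66 - 61) / 61
reported_spread = (66 - 61) / 61
print(f"{reported_spread:.1%}")
```

Applied to the rounded MLX row, the spread comes out at roughly 8%, which matches the figure quoted above.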

Isolated model benchmark

Backend	p50	p90	min	FPS
MLX (wilor-mlx)	36 ms	36 ms	36 ms	28
PyTorch MPS (2.5.0)	50 ms	51 ms	49 ms	20

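MLX is about 1.4x faster in pure model compute (36 ms vs. 50 ms at p50). Both backends received the same deterministic input and were timed over 100 iterations after 30 warmup iterations, using batched timing.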
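
A warmup-then-measure loop of this kind might look like the sketch below. It is not the benchmark script used for the table: the `benchmark` function is hypothetical, and it times each call individually, whereas the exact "batched timing" procedure is not described here. With MLX, the callable passed in should end with `mx.eval(...)`, since MLX evaluates lazily and the work would otherwise not be included in the timing.

```python
import statistics
import time

def benchmark(fn, warmup=30, iterations=100):
    """Time a zero-argument callable (hypothetical harness, not the original script)."""
    # Warmup runs are discarded so one-time setup costs do not skew the result.
    for _ in range(warmup):
        fn()
    timings_ms = []
    for _ in range(iterations):
        start = time.perf_counter()
        fn()
        timings_ms.append((time.perf_counter() - start) * 1000.0)
    timings_ms.sort()
    p50 = statistics.median(timings_ms)
    # Simple index-based approximation of the 90th percentile.
    p90 = timings_ms[int(0.9 * (len(timings_ms) - 1))]
    fps = 1000.0 / p50 if p50 > 0 else float("inf")
    return {"p50_ms": p50, "p90_ms": p90, "min_ms": timings_ms[0], "fps": fps}

# Stand-in workload; replace with a call that runs the model and forces evaluation.
stats = benchmark(lambda: sum(range(1000)), warmup=5, iterations=20)
print(stats)
```

The FPS column in the table follows the same relation: 1000 / 36 ms is about 28, and 1000 / 50 ms is 20.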

The advantage was also reproduced on a lower-bandwidth M2 Pro validation box: across 80 archived hand-positive camera frames, MLX model-call p50/p90/p95 was 252/355/418 ms versus 358/490/571 ms for PyTorch MPS. A reversed-order audit (PyTorch MPS running first) confirmed the result.
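
For reference, the per-percentile speedup implied by the M2 Pro figures can be worked out directly. This is only arithmetic on the numbers quoted above; the variable names are illustrative.

```python
# Model-call latencies in milliseconds, copied from the M2 Pro validation run.
mlx_ms = {"p50": 252, "p90": 355, "p95": 418}
mps_ms = {"p50": 358, "p90": 490, "p95": 571}

# Speedup of MLX over PyTorch MPS at each percentile.
speedup = {name: mps_ms[name] / mlx_ms[name] for name in mlx_ms}

for name, ratio in speedup.items():
    print(f"{name}: {ratio:.2f}x")
```

All three ratios land close to 1.4x, in line with the isolated M4 Max benchmark.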

Quantization impact on speed

Variant	p50	FPS	Notes
float32	36 ms	28	Reference
float16	36 ms	28	Equal ALU throughput on M4 Max
int4	37 ms	27	Dequant overhead ≈ bandwidth savings

On Apple Silicon, float16 and int4 do not improve latency for this model size (210 tokens × 1280 dim). The GPU is compute-overhead-bound, not bandwidth-bound. The only benefit of int4 is the smaller download (2.4 GB → 490 MB).
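
The sizes involved can be sanity-checked with a few lines of arithmetic. The sketch below uses only figures stated in this card (about 610M parameters, 210 tokens, 1280 dimensions, and the two file sizes); the variable names are illustrative.

```python
params = 610e6                       # approximate total parameter count from this card
float32_gb = params * 4 / 1e9        # 4 bytes per float32 weight

download_ratio = 2.4e9 / 490e6       # reported float32 size / reported int4 size

tokens, dim = 210, 1280
activation_mb = tokens * dim * 4 / 1e6   # one float32 token matrix, in MB

print(f"float32 weights: ~{float32_gb:.2f} GB")
print(f"download ratio: ~{download_ratio:.1f}x")
print(f"token matrix: ~{activation_mb:.2f} MB")
```

The float32 estimate agrees with the 2.4 GB file, the download ratio is just under 5x, and a single token matrix is only about 1 MB, which gives a sense of how small the per-layer workload is for a single image.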

Numerical Accuracy

Compared against PyTorch WiLoR-mini on identical float32 inputs:

Variant	pred_vertices max diff	pred_keypoints_3d max diff
float32	0.006 (sub-mm)	0.006 (sub-mm)
int4	0.061 (< 1mm)	0.059 (< 1mm)

Both are within visual tolerance for real-time hand tracking.
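
A comparison of this kind reduces to a maximum absolute difference between two output arrays. The sketch below shows the metric on synthetic data; `max_abs_diff` is a hypothetical helper, and the random arrays merely stand in for the PyTorch and MLX outputs, which would first need converting to NumPy.

```python
import numpy as np

def max_abs_diff(reference, candidate):
    """Largest element-wise absolute difference between two same-shaped arrays."""
    reference = np.asarray(reference, dtype=np.float64)
    candidate = np.asarray(candidate, dtype=np.float64)
    if reference.shape != candidate.shape:
        raise ValueError(f"shape mismatch: {reference.shape} vs {candidate.shape}")
    return float(np.max(np.abs(reference - candidate)))

# Synthetic stand-ins shaped like pred_vertices: (1, 778, 3).
rng = np.random.default_rng(0)
reference = rng.standard_normal((1, 778, 3))
candidate = reference + 0.001        # constant offset, so the max diff is known
print(max_abs_diff(reference, candidate))
```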

Quick Start

from wilor_mlx import WiLoR
import mlx.core as mx

# Everything downloads and caches automatically
# First run requires torch for one-time MANO conversion; after that, torch is not used
model = WiLoR.from_pretrained()

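# Inference
# `your_256x256_hand_crop` is a placeholder for your own uint8 array of shape (1, 256, 256, 3)
image = mx.array(your_256x256_hand_crop)
result = model(image)
mx.eval(result)  # MLX is lazy; this forces the computation to run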

keypoints = result['pred_keypoints_3d']  # (1, 21, 3)
vertices = result['pred_vertices']        # (1, 778, 3)

See github.com/lyonsno/wilor-mlx for full documentation.
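
The quick start expects a `(1, 256, 256, 3)` uint8 array. One way to get an arbitrary hand crop into that shape is sketched below with plain NumPy. This is an assumption-laden illustration: `to_model_input` is a hypothetical helper, and the padding and nearest-neighbour resize are not necessarily the preprocessing that wilor-mlx or Perceptasia use.

```python
import numpy as np

def to_model_input(crop, size=256):
    """Pad an (H, W, 3) uint8 crop to a square, resize it, and add a batch axis.

    Hypothetical helper; only the output shape and dtype follow the quick start.
    """
    h, w, _ = crop.shape
    side = max(h, w)
    # Centre the crop on a black square canvas to preserve aspect ratio.
    square = np.zeros((side, side, 3), dtype=np.uint8)
    top, left = (side - h) // 2, (side - w) // 2
    square[top:top + h, left:left + w] = crop
    # Nearest-neighbour resize: pick a source row/column for each output pixel.
    idx = np.arange(size) * side // size
    resized = square[idx][:, idx]
    return resized[None]                 # shape (1, size, size, 3)

crop = np.random.default_rng(0).integers(0, 256, size=(300, 180, 3), dtype=np.uint8)
batch = to_model_input(crop)
print(batch.shape, batch.dtype)
```

The resulting array can then be wrapped with `mx.array(...)` as in the quick start. A production pipeline would more likely use a proper image library for interpolation.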

Architecture

ViT-H/16 backbone: 1280 embed dim, 32 layers, 16 heads, 210 tokens (192 patches + 18 learnable)
MANO hand model: 778 vertices, 16 joints, Linear Blend Skinning with kinematic chain
RefineNet: Multi-scale deconvolution + bilinear grid sampling + MANO parameter refinement
Total parameters: ~610M
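
To make the Linear Blend Skinning step concrete, here is a generic NumPy sketch of the blend itself. It is not the wilor-mlx implementation: `linear_blend_skinning` is a hypothetical function, the per-joint transforms are assumed to already map rest pose to posed space (that is, the kinematic chain has been composed beforehand), and MANO's shape and pose blend shapes are left out.

```python
import numpy as np

def linear_blend_skinning(vertices, weights, joint_transforms):
    """Generic LBS sketch (hypothetical, not the wilor-mlx code).

    vertices:         (V, 3) rest-pose positions
    weights:          (V, J) skinning weights, each row summing to 1
    joint_transforms: (J, 4, 4) rest-to-posed transforms per joint
    returns:          (V, 3) posed positions
    """
    # Blend the joint transforms per vertex: (V, 4, 4).
    blended = np.einsum("vj,jab->vab", weights, joint_transforms)
    # Homogeneous coordinates so translation is applied with the rotation.
    ones = np.ones((vertices.shape[0], 1), dtype=vertices.dtype)
    homogeneous = np.concatenate([vertices, ones], axis=1)        # (V, 4)
    posed = np.einsum("vab,vb->va", blended, homogeneous)         # (V, 4)
    return posed[:, :3]

# MANO-sized example: 778 vertices, 16 joints, identity transforms.
rng = np.random.default_rng(0)
rest = rng.standard_normal((778, 3))
w = rng.random((778, 16))
w /= w.sum(axis=1, keepdims=True)        # normalise rows to sum to 1
identity = np.tile(np.eye(4), (16, 1, 1))
posed = linear_blend_skinning(rest, w, identity)
print(posed.shape)
```

With identity transforms the mesh stays in its rest pose, which makes a convenient first check before plugging in real joint rotations.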

Citation

@article{potamias2024wilor,
  title={WiLoR: End-to-end 3D Hand Localization and Reconstruction in-the-wild},
  author={Potamias, Rolandos Alexandros and Zhang, Jinglei and Deng, Jiankang and Zafeiriou, Stefanos},
  year={2024}
}

License

The wilor-mlx code and these weight files are MIT licensed. The weights contain only ViT backbone, RefineNet, and learned embedding parameters — no MANO data is bundled or rehosted.

The MANO hand model is separately licensed by the Max Planck Institute. WiLoR.from_pretrained() fetches upstream WiLoR-mini assets and converts MANO data locally on your machine. You can also supply your own MANO data via mano_path=....
