WiLoR-MLX: Hand Pose Estimation on Apple Silicon

MLX port of WiLoR-mini for native Apple Silicon inference. Complete pipeline: ViT-H/16 backbone + MANO hand model + RefineNet refinement.

Code: github.com/lyonsno/wilor-mlx

Available Weights

| Variant | File | Size | Precision | Notes |
|---|---|---|---|---|
| float32 | wilor-mlx.safetensors | 2.4 GB | Full | Reference quality, recommended |
| int4 | wilor-mlx-int4.safetensors | 490 MB | 4-bit quantized | 5x smaller download, same speed |

Both variants produce near-identical inference speed on Apple Silicon (see benchmarks below). Choose based on download size and precision needs.

These weights contain only ViT backbone, RefineNet, and learned embedding parameters; no MANO data is bundled or rehosted. WiLoR.from_pretrained() handles MANO automatically by fetching upstream WiLoR-mini assets and converting them locally on your machine. The MANO hand model is separately licensed by the Max Planck Institute.

Performance

Apple M4 Max, single-image (1×256×256×3), float32:

Stable live sidecar window (embedded in Perceptasia hand tracking)

| Backend | Model p50 | Model p90 | Model p95 | Model p99 |
|---|---|---|---|---|
| MLX (wilor-mlx) | ~61 ms | ~62 ms | ~63 ms | ~66 ms |
| PyTorch MPS (2.5.0) | ~85 ms | ~144 ms | ~238 ms | ~427 ms |

MLX holds a flat ~61 ms with virtually no tail: the spread from p50 to p99 is only about 8%. The MLX figures cover 500 consecutive frames during stable operation; the MPS figures come from a 102K-frame manifest history. Both sets are live numbers from Perceptasia.
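The spread figure follows directly from the percentiles in the table above. A minimal Python sketch of that arithmetic, using the rounded table values (so the results are approximate); the function name `tail_spread` is illustrative and not part of wilor-mlx:

```python
def tail_spread(p50_ms: float, p99_ms: float) -> float:
    """Relative gap between the median and the 99th-percentile latency."""
    return (p99_ms - p50_ms) / p50_ms


# Rounded values from the live sidecar table above.
mlx_spread = tail_spread(61, 66)
mps_spread = tail_spread(85, 427)

print(f"MLX p50->p99 spread: {mlx_spread:.0%}")
print(f"MPS p50->p99 spread: {mps_spread:.0%}")
```

On these numbers the MPS p99 sits at roughly five times its median, while the MLX p99 stays within a few milliseconds of it.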

Isolated model benchmark

| Backend | p50 | p90 | min | FPS |
|---|---|---|---|---|
| MLX (wilor-mlx) | 36 ms | 36 ms | 36 ms | 28 |
| PyTorch MPS (2.5.0) | 50 ms | 51 ms | 49 ms | 20 |

MLX is about 1.4x faster in pure model compute. Both backends received the same deterministic input, with 100 timed iterations after 30 warmup iterations and batched timing.
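The protocol described above (fixed input, warmup, then timed iterations) can be sketched as a small harness. This is an illustration in plain Python, not the project's benchmark script: `benchmark`, `fn`, and `sync` are hypothetical names, and the sketch times each call individually rather than in batches. With MLX, which evaluates lazily, one would pass a `sync` that forces evaluation, for example by calling `mx.eval` on the result.

```python
import statistics
import time
from typing import Any, Callable


def benchmark(fn: Callable[[], Any],
              sync: Callable[[Any], None] = lambda out: None,
              warmup: int = 30,
              iters: int = 100) -> dict:
    """Time `fn` after a warmup phase and summarize latency in milliseconds."""
    for _ in range(warmup):
        sync(fn())  # untimed warmup calls

    samples_ms = []
    for _ in range(iters):
        start = time.perf_counter()
        sync(fn())  # force completion before stopping the clock
        samples_ms.append((time.perf_counter() - start) * 1e3)

    samples_ms.sort()
    p50 = statistics.median(samples_ms)
    # quantiles(n=10) returns nine cut points; the last one is the 90th percentile.
    p90 = statistics.quantiles(samples_ms, n=10)[8]
    return {"p50_ms": p50, "p90_ms": p90, "min_ms": samples_ms[0], "fps": 1e3 / p50}


# Stand-in workload so the sketch runs anywhere; replace with a model call.
stats = benchmark(lambda: sum(i * i for i in range(10_000)), warmup=3, iters=20)
print(stats)
```

Passing the synchronization step in as a callable keeps the harness backend-agnostic, so the same loop could time an MLX call and a PyTorch MPS call.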

The advantage was also reproduced on a lower-bandwidth M2 Pro validation box: across 80 archived hand-positive camera frames, MLX model-call p50/p90/p95 was 252/355/418 ms versus 358/490/571 ms for PyTorch MPS. A reversed-order audit (PyTorch MPS running first) confirmed the result.
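The speedup ratios implied by the reported numbers can be checked with simple division. The sketch below only restates figures already given above; the variable names are illustrative.

```python
# Isolated M4 Max benchmark, p50 in ms (PyTorch MPS / MLX).
m4_speedup = 50 / 36

# M2 Pro validation run, (p50, p90, p95) in ms.
mlx_m2 = (252, 355, 418)
mps_m2 = (358, 490, 571)
m2_speedups = [mps / mlx for mlx, mps in zip(mlx_m2, mps_m2)]

print(f"M4 Max p50 speedup: {m4_speedup:.2f}x")
print("M2 Pro p50/p90/p95 speedups:", [f"{s:.2f}x" for s in m2_speedups])
```

On these figures the M2 Pro ratios land close to the roughly 1.4x seen on the M4 Max.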

Quantization impact on speed

| Variant | p50 | FPS | Notes |
|---|---|---|---|
| float32 | 36 ms | 28 | Reference |
| float16 | 36 ms | 28 | Equal ALU throughput on M4 Max |
| int4 | 37 ms | 27 | Dequant overhead ≈ bandwidth savings |

On Apple Silicon, float16 and int4 do not improve latency for this model size (210 tokens × 1280 dim). The GPU is compute-overhead-bound, not bandwidth-bound. Int4's value is purely download size reduction (2.4 GB → 490 MB).
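A back-of-the-envelope check of the file sizes, assuming the ~610M parameter count listed under Architecture and decimal units (1 GB = 10^9 bytes). The 4-bit figure is only a lower bound: quantized files also carry quantization metadata such as per-group scales, plus any tensors left unquantized, none of which this sketch models.

```python
PARAMS = 610e6  # approximate total parameter count (from the Architecture section)

fp32_gb = PARAMS * 4 / 1e9          # 4 bytes per float32 weight
int4_floor_mb = PARAMS * 0.5 / 1e6  # 4 bits = 0.5 bytes per weight, no metadata

print(f"float32 estimate: {fp32_gb:.2f} GB (file: 2.4 GB)")
print(f"int4 lower bound: {int4_floor_mb:.0f} MB (file: 490 MB)")
print(f"download ratio:   {2400 / 490:.1f}x")
```

The float32 estimate lines up with the published file size, and the gap between the int4 lower bound and the actual 490 MB file is consistent with metadata and unquantized tensors, though the exact breakdown is not given here.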

Numerical Accuracy

Compared against PyTorch WiLoR-mini on identical float32 inputs:

| Variant | pred_vertices max diff | pred_keypoints_3d max diff |
|---|---|---|
| float32 | 0.006 (sub-mm) | 0.006 (sub-mm) |
| int4 | 0.061 (< 1mm) | 0.059 (< 1mm) |

Both are within visual tolerance for real-time hand tracking.
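The "max diff" metric is presumably the largest absolute element-wise difference between the two backends' outputs; that reading is an assumption, as is the helper name `max_abs_diff`. A minimal sketch over flat sequences of floats:

```python
from typing import Sequence


def max_abs_diff(a: Sequence[float], b: Sequence[float]) -> float:
    """Largest absolute element-wise difference between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same number of elements")
    return max(abs(x - y) for x, y in zip(a, b))


# Toy values standing in for flattened reference and MLX outputs.
reference = [0.10, -0.25, 0.40]
candidate = [0.10, -0.24, 0.43]
diff = max_abs_diff(reference, candidate)
print(diff)
```

With array outputs, one would flatten first, for example `result['pred_vertices'].reshape(-1).tolist()`.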

Quick Start

from wilor_mlx import WiLoR
import mlx.core as mx

# Everything downloads and caches automatically
# First run requires torch for one-time MANO conversion; after that, torch is not used
model = WiLoR.from_pretrained()

# Inference
image = mx.array(your_256x256_hand_crop)  # (1, 256, 256, 3) uint8
result = model(image)
mx.eval(result)

keypoints = result['pred_keypoints_3d']  # (1, 21, 3)
vertices = result['pred_vertices']        # (1, 778, 3)

See github.com/lyonsno/wilor-mlx for full documentation.

Architecture

  • ViT-H/16 backbone: 1280 embed dim, 32 layers, 16 heads, 210 tokens (192 patches + 18 learnable)
  • MANO hand model: 778 vertices, 16 joints, Linear Blend Skinning with kinematic chain
  • RefineNet: Multi-scale deconvolution + bilinear grid sampling + MANO parameter refinement
  • Total parameters: ~610M
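The token and head dimensions listed above can be sanity-checked with a little arithmetic. One assumption is labeled in the code: 192 patches at patch size 16 corresponds to a 16×12 grid, which would mean the backbone sees a 256×192 region rather than the full 256×256 crop. The source does not state this, so treat it as an inference.

```python
PATCH = 16
EMBED_DIM = 1280
HEADS = 16
LEARNABLE_TOKENS = 18

# Assumption: the backbone processes a 256x192 region (not stated in the source).
grid_h, grid_w = 256 // PATCH, 192 // PATCH
patches = grid_h * grid_w
tokens = patches + LEARNABLE_TOKENS
head_dim = EMBED_DIM // HEADS

print(f"patches={patches}, tokens={tokens}, head_dim={head_dim}")
```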

Citation

@article{zhan2024wilor,
  title={WiLoR: End-to-end 3D Hand Localization and Reconstruction in-the-wild},
  author={Potamias, Rolandos Alexandros and others},
  year={2024}
}

License

The wilor-mlx code and these weight files are MIT licensed. The weights contain only ViT backbone, RefineNet, and learned embedding parameters; no MANO data is bundled or rehosted.

The MANO hand model is separately licensed by the Max Planck Institute. WiLoR.from_pretrained() fetches upstream WiLoR-mini assets and converts MANO data locally on your machine. You can also supply your own MANO data via mano_path=....
