YuNet — LiteRT (on-device face detection, fully-GPU)

YuNet (ShiqiYu/libfacedetection) is a tiny, fast face detector that outputs face boxes plus 5 landmarks. This version is converted to LiteRT and runs entirely on the CompiledModel GPU backend (ML Drift) on Android. 0.076 M params / 0.3 MB in fp16.

On-device (Pixel 8a, Tensor G3 — verified)


nodes on GPU	146 / 146 LITERT_CL (full residency)
inference	~4 ms (640×640)
size	0.3 MB (fp16)
accuracy	device-vs-PyTorch corr 0.9999 (all 12 outputs)
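
The card does not spell out how the correlation figure is computed. Below is a minimal sketch, assuming "corr" means the Pearson correlation between each flattened device output and its PyTorch reference, with the worst of the 12 outputs reported. The helper names `pearson_corr` and `min_corr_over_outputs` are hypothetical.

```python
import math

def pearson_corr(a, b):
    """Pearson correlation between two equal-length flat sequences."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def min_corr_over_outputs(device_outputs, reference_outputs):
    """Worst-case correlation across matching (already flattened) output tensors.

    Assumption: outputs are paired in the same order on device and in PyTorch.
    """
    return min(pearson_corr(d, r) for d, r in zip(device_outputs, reference_outputs))
```

Reporting the minimum rather than the mean means a single badly diverging head (for example, one stride's landmark output) cannot hide behind 11 good ones.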

image[1,3,640,640] (BGR, 0-255) →[GPU: YuNet]→ 12 outputs: cls/obj/bbox/kps × strides {8,16,32}
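
At a 640×640 input, the three strides give grids of 80×80, 40×40 and 20×20 priors. The sketch below enumerates the 12 outputs under an assumed layout of one row per grid cell, with 1 value for cls and obj, 4 for bbox and 10 (x, y for 5 landmarks) for kps. The output names and exact tensor shapes are hypothetical; check the real signature of the .tflite file before relying on them.

```python
def yunet_output_shapes(input_size=640, strides=(8, 16, 32), num_landmarks=5):
    """Hypothetical output layout: one row per prior (grid cell) for each stride."""
    shapes = {}
    for s in strides:
        n = (input_size // s) ** 2  # priors on an (input_size/s) x (input_size/s) grid
        shapes[f"cls_{s}"] = (1, n, 1)                   # face-class score (post-sigmoid)
        shapes[f"obj_{s}"] = (1, n, 1)                   # objectness score (post-sigmoid)
        shapes[f"bbox_{s}"] = (1, n, 4)                  # dx, dy, dw, dh
        shapes[f"kps_{s}"] = (1, n, 2 * num_landmarks)   # x, y per landmark
    return shapes
```

Summed over strides this gives 6400 + 1600 + 400 = 8400 priors per image.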

How it converts (litert-torch) — clean, no re-authoring

The model is a pure CNN built from depthwise-separable ConvDPUnit blocks. Its neck upsamples with nearest-neighbor interpolation (F.interpolate(mode="nearest") lowers to RESIZE_NEAREST_NEIGHBOR, so no transposed conv is needed), and its MaxPool2d layers are non-padded (so no PADV2 op appears). The head's per-stride permute/reshape/sigmoid is baked into the graph, giving 12 decode-ready outputs. Result: no banned ops, every tensor ≤4D, tflite-vs-torch corr 1.0, device-vs-torch corr 0.9999.
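
Depthwise-separable blocks are a large part of why the whole detector fits in 0.076 M params. A rough parameter count, assuming a ConvDPUnit is a 1×1 pointwise conv followed by a 3×3 depthwise conv, both with bias, and ignoring BatchNorm parameters (this layout is our reading of upstream, not confirmed by this card):

```python
def conv_params(in_ch, out_ch, k, groups=1, bias=True):
    """Parameters of a 2D conv: out_ch * (in_ch / groups) * k * k weights, plus optional bias."""
    return out_ch * (in_ch // groups) * k * k + (out_ch if bias else 0)

def conv_dp_unit_params(in_ch, out_ch):
    # Assumed ConvDPUnit layout: 1x1 pointwise conv, then 3x3 depthwise conv (groups=out_ch).
    pointwise = conv_params(in_ch, out_ch, 1)
    depthwise = conv_params(out_ch, out_ch, 3, groups=out_ch)
    return pointwise + depthwise

def standard_conv3x3_params(in_ch, out_ch):
    # Dense 3x3 conv with the same input/output channels, for comparison.
    return conv_params(in_ch, out_ch, 3)
```

For 64 → 64 channels this gives 4,800 parameters for the separable unit versus 36,928 for a dense 3×3 conv, roughly 7.7× fewer.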

Decode (host-side) & preprocessing

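Preprocessing: letterbox the frame to 640×640, keep BGR channel order and the raw 0-255 range (no normalization).

Decode: anchor-free priors at px = col·s, py = row·s (offset 0) for each stride s. Score = cls·obj; box center = (dx·s + px, dy·s + py), box size = exp(dw)·s × exp(dh)·s; each of the 5 landmarks = kps·s + prior. Finish with NMS.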
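
A minimal pure-Python sketch of the host-side steps, under these assumptions: priors are indexed row-major within each stride, the letterbox pads symmetrically, cls/obj are already sigmoid-ed (the graph bakes that in), and the thresholds (`score_thr=0.5`, `iou_thr=0.3`) plus all function names are hypothetical choices, not values from upstream.

```python
import math

def letterbox_params(src_w, src_h, dst=640):
    """Scale and padding that fit (src_w, src_h) into a dst x dst square, aspect preserved."""
    scale = min(dst / src_w, dst / src_h)
    new_w, new_h = round(src_w * scale), round(src_h * scale)
    pad_x, pad_y = (dst - new_w) // 2, (dst - new_h) // 2  # centered padding (assumption)
    return scale, pad_x, pad_y

def decode_stride(cls, obj, bbox, kps, stride, input_size=640, score_thr=0.5):
    """Decode one stride. cls/obj: per-prior scores; bbox: [dx, dy, dw, dh]; kps: 10 values."""
    cols = input_size // stride
    dets = []
    for i in range(len(cls)):
        score = cls[i] * obj[i]  # outputs are already post-sigmoid
        if score < score_thr:
            continue
        px, py = (i % cols) * stride, (i // cols) * stride  # anchor-free prior, offset 0
        cx = bbox[i][0] * stride + px
        cy = bbox[i][1] * stride + py
        w = math.exp(bbox[i][2]) * stride
        h = math.exp(bbox[i][3]) * stride
        lms = [(kps[i][2 * k] * stride + px, kps[i][2 * k + 1] * stride + py) for k in range(5)]
        dets.append((score, [cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2], lms))
    return dets

def iou(a, b):
    """Intersection-over-union of two x1, y1, x2, y2 boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def nms(dets, iou_thr=0.3):
    """Greedy NMS: keep the highest-scoring boxes, drop any overlapping a kept box above iou_thr."""
    kept = []
    for d in sorted(dets, key=lambda d: d[0], reverse=True):
        if all(iou(d[1], k[1]) <= iou_thr for k in kept):
            kept.append(d)
    return kept

def unletterbox(box, scale, pad_x, pad_y):
    """Map an x1, y1, x2, y2 box from 640x640 letterbox space back to source-image pixels."""
    return [(box[0] - pad_x) / scale, (box[1] - pad_y) / scale,
            (box[2] - pad_x) / scale, (box[3] - pad_y) / scale]
```

In use, call `decode_stride` once per stride (8, 16, 32) on that stride's four outputs, concatenate the results, run `nms` once over all of them, then map each surviving box (and landmark) back with `unletterbox`. Running NMS after merging strides matters because the same face can fire on neighboring scales.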

License

BSD-3-Clause. Upstream: ShiqiYu/libfacedetection.
