Fast Neural Style Transfer — LiteRT (on-device, fully-GPU, 4 styles)

Fast neural style transfer (the TransformerNet from the PyTorch examples, after Johnson et al.), converted to LiteRT and running entirely on the GPU through CompiledModel (ML Drift) on Android. It applies an artistic style to a photo. Four styles are provided (candy, mosaic, rain_princess, udnie), each a 3.5 MB fp16 graph.

On-device (Pixel 8a, Tensor G3 — verified)


Metric          Value
nodes on GPU    350 / 350 LITERT_CL (full residency)
inference       ~9 ms (256×256)
size            3.5 MB per style (fp16)
accuracy        device-vs-PyTorch corr 0.9998–0.9999 (all 4 styles)

image[1,3,256,256] (RGB 0-255) →[GPU: TransformerNet]→ stylized[1,3,256,256] (RGB 0-255)

How it converts (litert-torch) — three numerically-exact re-authorings

ReflectionPad2d → zero-pad (the graph lowers to PAD instead of GATHER_ND; the output differs only near the image border).
Large conv activations → conv-weight scaling. The conv outputs reach magnitudes of ≈ 5000, where the Mali delegate's fp16 conv accumulation loses precision and the result is garbage (device corr 0.34 at full residency; residency ≠ correctness). Each conv is followed by an InstanceNorm, which is invariant to the scale of its input, so scaling those conv weights down until the outputs have magnitude ≈ 10 is exact (the InstanceNorm output is unchanged) and keeps the fp16 accumulation precise → corr 1.0.
InstanceNorm → SafeInstanceNorm (down-scaled-domain spatial reduction, fp16-safe; SafeLayerNorm class).
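
The scale invariance behind the conv-weight scaling step can be checked numerically. What follows is a minimal NumPy sketch, not the converter's actual code: the helper instance_norm, the random stand-in for conv activations, and the factor 1/512 are illustrative assumptions. The invariance is exact only when eps is zero or rescaled with the input; with a fixed small eps the difference is negligible at these magnitudes.

```python
import numpy as np

def instance_norm(x, eps=1e-5):
    """Normalize each (sample, channel) plane of an NCHW array over H and W."""
    mean = x.mean(axis=(2, 3), keepdims=True)
    var = x.var(axis=(2, 3), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

rng = np.random.default_rng(0)

# Stand-in for conv outputs with magnitudes in the thousands (assumed data).
conv_out = rng.normal(0.0, 5000.0, size=(1, 4, 16, 16))

# A convolution is linear in its weights, so scaling the weights by `scale`
# scales its bias-free output by the same factor. The factor is hypothetical.
scale = 1.0 / 512.0
scaled_conv_out = conv_out * scale

reference = instance_norm(conv_out)
rescaled = instance_norm(scaled_conv_out)

# Largest elementwise difference between the two normalized outputs.
max_abs_diff = float(np.abs(reference - rescaled).max())
print("max |IN(x) - IN(s*x)| =", max_abs_diff)
print("max |s*x| =", float(np.abs(scaled_conv_out).max()))
```

A per-channel conv bias does not change this picture, because the mean subtraction inside the norm removes any constant added to a whole channel.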

Upsampling uses interpolate(nearest) rather than a transposed conv, so no ZeroStuff op appears. Result: no banned ops, all tensors ≤4D, tflite-vs-torch corr 1.0, device-vs-torch corr 0.9999.
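
The document does not define how "corr" is computed. A plausible reading is the Pearson correlation between two flattened output tensors, and the sketch below assumes that; the function name pearson_corr is hypothetical.

```python
import numpy as np

def pearson_corr(a, b):
    """Pearson correlation between two arrays of equal size, flattened."""
    a = np.asarray(a, dtype=np.float64).ravel()
    b = np.asarray(b, dtype=np.float64).ravel()
    if a.size != b.size:
        raise ValueError("inputs must have the same number of elements")
    # Center both signals (new arrays, so the caller's data is untouched).
    a = a - a.mean()
    b = b - b.mean()
    return float(a @ b / np.sqrt((a @ a) * (b @ b)))
```

Under this assumption, a call such as pearson_corr(device_output, torch_output) on two [1,3,256,256] arrays would yield a single score per style. Correlation ignores a global gain or offset, so it is best paired with a visual check of the stylized image.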

Preprocessing

Center-crop to square, resize to 256×256, RGB 0–255 (no normalization), NCHW. Output is 0–255 RGB (clamp).
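
The preprocessing steps above can be sketched in NumPy as follows. This is an illustration under stated assumptions: the resize filter is not specified in this document, so nearest-neighbor sampling is used here as a stand-in, and the names preprocess and postprocess are hypothetical.

```python
import numpy as np

def preprocess(rgb_hwc, size=256):
    """HxWx3 RGB image (values 0-255) -> float32 [1, 3, size, size], NCHW."""
    h, w, _ = rgb_hwc.shape
    side = min(h, w)
    top, left = (h - side) // 2, (w - side) // 2
    crop = rgb_hwc[top:top + side, left:left + side]   # center crop to square

    # Nearest-neighbor resize (assumed filter): pick a source index per output pixel.
    idx = (np.arange(size) * side) // size
    resized = crop[idx][:, idx]

    # No normalization: keep 0-255, only reorder HWC -> CHW and add a batch axis.
    return resized.astype(np.float32).transpose(2, 0, 1)[None]

def postprocess(out_nchw):
    """[1, 3, H, W] model output -> HxWx3 uint8 image, clamped to 0-255."""
    chw = np.clip(out_nchw[0], 0.0, 255.0)
    return np.rint(chw).astype(np.uint8).transpose(1, 2, 0)
```

For example, preprocess(photo) on a 300×400 photo returns an array of shape (1, 3, 256, 256) taken from the central 300×300 region. The clamp in postprocess is harmless if the graph already clamps its output.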

License

BSD-3-Clause. Upstream: pytorch/examples.
