DINOHash: extra checkpoints

Additional DINOHash perceptual-hashing models not present in backslashh/DINOHash. Each model is provided both as the raw training/traced artifact (raw/) and as an exported ONNX graph (repo root, dynamic batch axis, opset 17).

| Model | ONNX | Raw | Notes |
| --- | --- | --- | --- |
| ViT-Small → ViT-Tiny (DINO distill) | ViT-Small-ViT-Tiny.onnx | raw/ViT-Small-ViT-Tiny.pth | student backbone (vit_tiny_patch16_224), 192-d embedding |
| XCiT-Small → XCiT-Tiny (DINO distill) | XCiT-Small-XCiT-Tiny.onnx | raw/XCiT-Small-XCiT-Tiny.pth | student backbone (xcit_tiny_12_p16_224), 192-d embedding |
| MAE-Lite mae_tiny_400e | mae_tiny_400e_traced.onnx | raw/mae_tiny_400e_traced.pt | 192-d |
| MAE-Lite mae_tiny_distill_400e | mae_tiny_distill_400e_traced.onnx | raw/mae_tiny_distill_400e_traced.pt | 192-d |
| MAE-Lite mae_tiny_distill_d2_400e | mae_tiny_distill_d2_400e_traced.onnx | raw/mae_tiny_distill_d2_400e_traced.pt | 192-d |
| MAE-Lite mocov3_tiny_400e | mocov3_tiny_400e_traced.onnx | raw/mocov3_tiny_400e_traced.pt | 192-d |

Notes on the raw files

  • MAE-Lite raw files are TorchScript (_traced.pt), self-contained and loadable directly.
  • ViT / XCiT raw files are full DINO training checkpoints (student/teacher/optimizer/...). The ONNX graphs were built by extracting the student.backbone.* weights into the matching timm architecture (strict-clean load) and exporting; XCiT required a pos_embeder → pos_embed rename and a qkv split/fuse between class-attention and XCA blocks.
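
The key handling described above could look roughly like the sketch below. It assumes the student state dict uses keys prefixed with backbone. (optionally preceded by a module. wrapper prefix); that layout is an assumption about the checkpoints, and extract_backbone is a hypothetical helper name that is not part of this repo. The XCiT qkv split/fuse step is not shown.

```python
from typing import Any, Dict


def extract_backbone(student_state: Dict[str, Any], prefix: str = "backbone.") -> Dict[str, Any]:
    """Keep only backbone entries and strip wrapper prefixes from their keys."""
    out: Dict[str, Any] = {}
    for key, value in student_state.items():
        # Assumption: a distributed-training wrapper may have added "module.".
        if key.startswith("module."):
            key = key[len("module."):]
        if not key.startswith(prefix):
            continue  # skip projection-head and any other non-backbone entries
        new_key = key[len(prefix):]
        # XCiT rename mentioned above; a no-op for ViT keys.
        new_key = new_key.replace("pos_embeder", "pos_embed")
        out[new_key] = value
    return out
```

The returned dict would then be passed to load_state_dict(..., strict=True) on the matching timm model, which raises an error on any missing or unexpected key.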

All inputs are (batch, 3, 224, 224).
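
A minimal inference sketch for the ONNX graphs follows. It assumes onnxruntime and NumPy are installed and that the first graph output is the embedding. check_input_shape and embed are hypothetical helper names. Pixel normalization is not specified here, so the caller is assumed to supply an already preprocessed float32 array.

```python
from typing import Sequence

EXPECTED_TAIL = (3, 224, 224)  # channels, height, width stated above


def check_input_shape(shape: Sequence[int]) -> None:
    """Raise ValueError unless shape is (batch, 3, 224, 224) with batch >= 1."""
    if len(shape) != 4 or tuple(shape[1:]) != EXPECTED_TAIL or shape[0] < 1:
        raise ValueError(f"expected (batch, 3, 224, 224), got {tuple(shape)}")


def embed(onnx_path: str, batch):
    """Run one ONNX graph on a float32 NumPy array and return its first output."""
    # Imported lazily so the shape check above works without onnxruntime installed.
    import onnxruntime as ort

    check_input_shape(batch.shape)
    session = ort.InferenceSession(onnx_path, providers=["CPUExecutionProvider"])
    input_name = session.get_inputs()[0].name  # read the name instead of guessing it
    # Assumption: the first (or only) output holds the embedding.
    return session.run(None, {input_name: batch})[0]
```

Because the batch axis is dynamic, the same session can be reused for batches of different sizes.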
