VehicleDINO

Unified multi-task vehicle recognition model: detection, type classification, make/model identification, re-identification, and license plate OCR in a single forward pass.

Architecture

  • Backbone: DINOv2 ViT-B/14 (frozen, with LoRA adapters)
  • Neck: SimpleFPN (768 -> 256) + HybridEncoder (AIFI + CCFM)
  • Decoder: RT-DETR-style with 300 detection queries + 1 global attribute query
  • Heads: 6 task-specific heads (det, type, make, model, Re-ID, OCR)
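The token arithmetic implied by the architecture above can be checked quickly. This is a sketch of the shapes only; the variable names are illustrative and not taken from the model's code.

```python
# DINOv2 ViT-B/14 backbone at the 560x560 input size
patch_size = 14
input_size = 560
grid = input_size // patch_size      # 40 patches per side
patch_tokens = grid * grid           # 1600 patch tokens fed to the neck
backbone_dim = 768                   # ViT-B hidden size (SimpleFPN input)
neck_dim = 256                       # SimpleFPN output channels

# Decoder query layout: detection heads read the 300 detection queries,
# while the attribute heads (type/make/model/Re-ID/OCR) read the single
# global attribute query.
det_queries, attr_queries = 300, 1
```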

Model Variants

| File | Format | Size | Notes |
|---|---|---|---|
| vehicledino_dinov2.onnx | FP32 | 450 MB | Full precision |
| vehicledino_dinov2_int8.onnx | INT8 | 139 MB | Quantized, 3.2x smaller |

Input / Output

Input: `images`, a float32 tensor of shape (1, 3, 560, 560), ImageNet-normalized RGB

Outputs:

| Tensor | Shape | Description |
|---|---|---|
| det_boxes | (1, 300, 4) | Detection boxes (cx, cy, w, h normalized) |
| det_classes | (1, 300, 5) | Detection class logits (car, suv, truck, bus, van) |
| vehicle_types | (1, 1, 8) | Vehicle type logits |
| makes | (1, 1, 42) | Make classification logits |
| models | (1, 1, 323) | Model classification logits |
| reid_embeds | (1, 1, 256) | L2-normalized Re-ID embedding |
| ocr_logits | (1, 1, 8, 37) | License plate OCR logits (8 positions, 37 chars) |
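The attribute outputs can be decoded with a softmax plus argmax, and the OCR head with a greedy per-position argmax. The sketch below uses synthetic arrays with the documented shapes; the 37-symbol OCR alphabet is not documented here, so the `CHARSET` shown is an assumption (a common convention is 0-9, A-Z plus one padding/blank symbol).

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the class axis
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# Synthetic stand-ins with the documented output shapes
rng = np.random.default_rng(0)
makes = rng.standard_normal((1, 1, 42)).astype(np.float32)
ocr_logits = rng.standard_normal((1, 1, 8, 37)).astype(np.float32)

make_probs = softmax(makes)
make_id = int(make_probs[0, 0].argmax())  # index into the make label list

# Greedy OCR decode; CHARSET below is hypothetical, not from the model card
CHARSET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ-"
plate = "".join(CHARSET[i] for i in ocr_logits[0, 0].argmax(axis=-1))
```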

Performance (Test Set)

| Task | Metric | Score |
|---|---|---|
| Type Classification | Top-1 Accuracy | 95.6% |
| Make Classification | Top-1 Accuracy | 98.4% |
| Model Classification | Top-1 Accuracy | 87.7% |
| Re-ID (VeRi-776) | mAP | 61.1% |
| Re-ID (VeRi-776) | Rank-1 | 86.1% |

Training Data

  • Detection + Type + Re-ID: VeRi-776 (776 vehicles, 49,360 images)
  • Make/Model: CompCars (42 makes, 323 models)
  • OCR: CCPD-Green (Chinese license plates)

Usage with ONNX Runtime (Python)

```python
import onnxruntime as ort
import numpy as np
from PIL import Image

session = ort.InferenceSession("vehicledino_dinov2_int8.onnx")

# Preprocess: force RGB, resize to 560x560, scale to [0, 1], ImageNet-normalize
img = Image.open("car.jpg").convert("RGB").resize((560, 560))
arr = np.asarray(img, dtype=np.float32) / 255.0
mean = np.array([0.485, 0.456, 0.406], dtype=np.float32)
std = np.array([0.229, 0.224, 0.225], dtype=np.float32)
arr = (arr - mean) / std

# HWC -> NCHW with a batch dimension
tensor = arr.transpose(2, 0, 1)[np.newaxis]  # (1, 3, 560, 560)

outputs = session.run(None, {"images": tensor})
```
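Detections can then be filtered by confidence and converted from normalized cxcywh to pixel xyxy. The sketch below uses synthetic arrays with the documented shapes in place of `outputs`; RT-DETR-style decoders typically score classes with a sigmoid, but verify that activation against the training code before relying on it.

```python
import numpy as np

# Synthetic stand-ins for outputs[0] (det_boxes) and outputs[1] (det_classes)
rng = np.random.default_rng(0)
det_boxes = rng.uniform(0.1, 0.9, (1, 300, 4)).astype(np.float32)
det_classes = rng.standard_normal((1, 300, 5)).astype(np.float32)

scores = 1.0 / (1.0 + np.exp(-det_classes[0]))  # per-class sigmoid scores
conf = scores.max(axis=1)
keep = conf > 0.5                                # confidence threshold
labels = scores.argmax(axis=1)[keep]             # indices into (car, suv, ...)

# Normalized cxcywh -> pixel xyxy at the 560x560 input size
cx, cy, w, h = det_boxes[0][keep].T
boxes_xyxy = np.stack([(cx - w / 2) * 560, (cy - h / 2) * 560,
                       (cx + w / 2) * 560, (cy + h / 2) * 560], axis=1)
```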

Usage in Browser (ONNX Runtime Web)

The INT8 model runs in the browser via ONNX Runtime Web with the WebGPU or WASM backend.

Live demo: https://yolov11-plate-recognition.swmengappdev.workers.dev

Citation

```bibtex
@article{vehicledino2026,
  title={VehicleDINO: Unified Multi-Task Vehicle Recognition via DINOv2 Features},
  author={Soh, Wei Meng},
  year={2026}
}
```

License

Apache 2.0
