Data-Intelligent ANPR: Scalable License Plate Recognition Under Real-World Data Constraints
Abstract
This release provides Awiros-ANPR-OCR, a 37M-parameter specialist model for Automatic Number Plate Recognition (ANPR) on Indian license plates. The model is built on the PP-OCRv5 encoder-decoder backbone (SVTR_HGNet with PPHGNetV2_B4) and fine-tuned on a curated 558,767-sample corpus spanning both standard single-row and non-standard dual-row Indian plate formats.
Starting from only 6,839 publicly available labeled samples, the training corpus was grown through a data engineering pipeline combining synthetic data generation, consensus pseudo-labeling, distribution-aware curation, VLM-assisted data cleanup, and state-balanced batch sampling. The resulting model achieves 98.42% accuracy with sub-6 ms on-device inference on an NVIDIA RTX 3090, a roughly 1,260x latency advantage over frontier multimodal models such as Gemini.
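The state-balanced batch sampling step above can be sketched as follows. This is a minimal illustration, not the release's exact sampler: the `(state_code, item)` grouping key, the uniform two-stage draw, and the batch size are assumptions.

```python
import random
from collections import defaultdict

def state_balanced_batches(samples, batch_size, seed=0):
    """Yield batches that draw state codes uniformly, so high-frequency
    states (e.g. MH, DL) cannot dominate training batches.
    `samples` is a list of (state_code, item) pairs; the keying scheme
    is an assumption about how the release's sampler is organized."""
    rng = random.Random(seed)
    by_state = defaultdict(list)
    for state, item in samples:
        by_state[state].append(item)
    states = sorted(by_state)
    while True:
        batch = []
        for _ in range(batch_size):
            state = rng.choice(states)                 # uniform over states,
            batch.append(rng.choice(by_state[state]))  # then over that state's items
        yield batch

# A skewed toy corpus: 90 Maharashtra plates vs. 10 Nagaland plates.
samples = [("MH", f"MH-{i}") for i in range(90)] + [("NL", f"NL-{i}") for i in range(10)]
gen = state_balanced_batches(samples, batch_size=8)
batch = next(gen)
print(len(batch))  # 8
```

Despite the 9:1 corpus skew, each state contributes roughly half the draws over many batches, which is the balancing effect the pipeline relies on.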
For the full data curation and training methodology, refer to our Technical Report.
Evaluation and Results
All systems were evaluated on a shared held-out validation set constructed using a distribution-aware split covering all Indian state codes, including both standard and non-standard plate formats.
| System | Params | Overall Acc. | 1-Row Acc. | 2-Row Acc. | Latency Avg (ms) | Throughput (img/s) |
|---|---|---|---|---|---|---|
| Awiros-ANPR-OCR (Ours) | 37.3M | 98.42% | 98.83% | 96.91% | 5.09 | 196.5 |
| Gemini-3-flash-preview | ~5-10B | 93.89% | 94.70% | 91.20% | 6,430 | 0.2 |
| Gemini-2.5-flash-preview | ~5B | 87.23% | 89.66% | 78.38% | --- | --- |
| Tencent HunyuanOCR | 996M | 67.62% | 76.65% | 34.78% | 309.15 | 3.2 |
| PP-OCRv5 Pretrained | 53.6M | 57.96% | 73.55% | 0.24% | 5.25 | 190.6 |
Latency was measured on a single NVIDIA RTX 3090 GPU (batch size 1); Gemini latency is end-to-end API round-trip time. PP-OCRv5 Pretrained shares the same architecture but uses the original pretrained weights without domain-specific fine-tuning, so the jump from 57.96% to 98.42% is entirely a data story.
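A batch-1 measurement protocol like the one behind the table can be sketched as below. The warmup and run counts are illustrative assumptions, and `infer` stands in for the model's forward pass; for a real GPU model you would also synchronize the device before reading the clock.

```python
import time
import statistics

def benchmark(infer, images, warmup=10, runs=100):
    """Batch-size-1 latency/throughput harness.

    `infer` is any callable taking one image; warmup iterations are
    discarded so steady-state timings are reported. Note: for GPU
    inference, insert a device synchronization before each clock read,
    otherwise asynchronous kernels make timings meaningless."""
    for i in range(warmup):
        infer(images[i % len(images)])
    times_ms = []
    for i in range(runs):
        t0 = time.perf_counter()
        infer(images[i % len(images)])
        times_ms.append((time.perf_counter() - t0) * 1000.0)
    avg = statistics.mean(times_ms)
    return {"latency_avg_ms": avg, "throughput_img_s": 1000.0 / avg}

# Usage with a stand-in "model":
stats = benchmark(lambda img: sum(img), [[1, 2, 3]] * 4)
print(sorted(stats))  # ['latency_avg_ms', 'throughput_img_s']
```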
Qualitative Comparison
Representative samples where Awiros-ANPR-OCR correctly transcribes the plate while all baselines produce errors. Recurring failure modes across baselines include confusion of visually similar characters (Q→0, V→Y, M→R/K/H, B→8, 8→0, 4→2) and truncation of dual-row plates; Tencent HunyuanOCR in particular truncates several dual-row and low-contrast plates.
Key Design Decisions
- End-to-end architecture: Eliminates brittle multi-stage pre-processing pipelines (perspective normalization, row segmentation, per-region recognition) that prior systems relied upon
- Consensus pseudo-labeling: Two independently trained models must agree on a transcription before it is accepted as a label, substantially reducing pseudo-label noise
- Distribution-aware curation: Non-linear bucket-wise train/val splits ensure rare state codes are not lost to validation
- State-balanced batch sampling: Uniform state-code sampling within each batch prevents training dynamics from being dominated by high-frequency states
- Negative sample training: Unreadable plates labeled with an abstention token suppress hallucination on degraded inputs
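The consensus pseudo-labeling rule above can be sketched as a simple agreement filter. This is a hedged illustration, not the release's implementation: the `(text, confidence)` model interface and the confidence floor are assumptions layered on top of the agreement requirement.

```python
def consensus_labels(images, model_a, model_b, min_conf=0.9):
    """Keep a pseudo-label only when two independently trained models
    agree on the exact transcription. Each model is assumed to return
    a (text, confidence) pair; the confidence floor is an illustrative
    extra guard, not a confirmed detail of the release's pipeline."""
    accepted = []
    for img in images:
        text_a, conf_a = model_a(img)
        text_b, conf_b = model_b(img)
        if text_a == text_b and min(conf_a, conf_b) >= min_conf:
            accepted.append((img, text_a))
    return accepted

# Stand-in models that disagree on one degraded sample:
m1 = lambda img: ("MH12AB1234", 0.97) if img != "blurry" else ("MH12AB1Z34", 0.55)
m2 = lambda img: ("MH12AB1234", 0.95)
kept = consensus_labels(["clean", "blurry"], m1, m2)
print(kept)  # [('clean', 'MH12AB1234')]
```

The disagreement on the blurry sample drops it from the pseudo-labeled pool, which is how the pipeline trades corpus size for label precision.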
Model Inference
Use the official PaddleOCR repository to run single-image inference with this release model.
- Clone PaddleOCR and move into the repository root.

```shell
git clone https://github.com/PaddlePaddle/PaddleOCR.git
cd PaddleOCR
```

- Install dependencies.

```shell
pip install paddlepaddle  # or paddlepaddle-gpu
pip install safetensors pillow opencv-python pyyaml
```

- Copy `test.py` and `en_dict.txt` from this release folder into the PaddleOCR repository root.
- Place `model.safetensors` in the PaddleOCR repository root (or specify the path via `--weights`).
- Run inference on a single image.

```shell
python test.py \
  --image_path path/to/plate_crop.jpg \
  --weights model.safetensors \
  --device gpu
```

- Run inference on a directory of images.

```shell
python test.py \
  --image_path path/to/plate_crops/ \
  --weights model.safetensors \
  --device gpu \
  --output_json results.json
```
Architecture Details
| Component | Value |
|---|---|
| Framework | PaddlePaddle / PP-OCRv5 |
| Backbone | PPHGNetV2_B4 |
| Head | MultiHead (CTCHead + NRTRHead) |
| Input shape | 3 x 48 x 320 |
| Character set | 0-9, A-Z, a-z, space (63 classes) |
| Max text length | 25 |
| Parameters | 37.3M |
| Export format | SafeTensors (from PaddlePaddle params) |
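To match the 3 x 48 x 320 input shape, a crop must be resized, normalized, and padded before inference. The sketch below assumes the common PP-OCR recipe (aspect-preserving resize to height 48, right-pad to width 320, normalize to [-1, 1]); treat the normalization constants and the nearest-neighbor resize as assumptions, not this release's confirmed preprocessing.

```python
import numpy as np

def preprocess_plate(img, target_h=48, target_w=320):
    """Resize (nearest-neighbor via index maps), normalize, and right-pad
    an HxWx3 uint8 crop to the model's CHW 3x48x320 input. The
    (x/255 - 0.5) / 0.5 normalization follows the usual PP-OCR
    convention and is an assumption here."""
    h, w = img.shape[:2]
    new_w = min(target_w, max(1, int(round(w * target_h / h))))
    rows = (np.arange(target_h) * h / target_h).astype(int)   # source row per output row
    cols = (np.arange(new_w) * w / new_w).astype(int)         # source col per output col
    resized = img[rows][:, cols]                              # (48, new_w, 3)
    norm = (resized.astype(np.float32) / 255.0 - 0.5) / 0.5   # values in [-1, 1]
    canvas = np.zeros((target_h, target_w, 3), dtype=np.float32)
    canvas[:, :new_w] = norm                                  # right-pad with zeros
    return canvas.transpose(2, 0, 1)                          # HWC -> CHW

dummy = np.random.randint(0, 256, (60, 200, 3), dtype=np.uint8)
x = preprocess_plate(dummy)
print(x.shape)  # (3, 48, 320)
```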
Summary
We present a practical, data-centric ANPR framework that achieves production-grade accuracy on Indian license plates without reliance on large manually annotated datasets or frontier-model scale. The same PP-OCRv5 architecture scores 57.96% out of the box and 98.42% after our data engineering pipeline, demonstrating that the data, not the model, is the primary driver of performance in domain-specific OCR.
Users who want to evaluate their own models on our validation set can do so in our Hugging Face Space. Support for submitting .bin files for testing through our internal systems is coming soon; the submission link will be added here once available.