NerGuard-0.3B is a multilingual transformer model for Personally Identifiable Information (PII) detection, built on mDeBERTa-v3-base. It performs token-level classification across 21 PII entity types using BIO tagging, covering names, addresses, government IDs, financial data, and contact information.
Trained on 500K+ samples from AI4Privacy, the model achieves 95.97% F1 on validation and roughly 2× the F1 of the strongest of three open-source alternatives tested (GLiNER, Presidio, spaCy) on a 3,000-sample benchmark. It supports cross-lingual transfer to 8 European languages without additional fine-tuning.
This is the standalone NER model. For the full hybrid system with entropy-based LLM routing, see the NerGuard GitHub repository.
## Supported Entities
| Category | Entity Types |
|---|---|
| Person | GIVENNAME, SURNAME, TITLE |
| Location | CITY, STREET, BUILDINGNUM, ZIPCODE |
| Government ID | IDCARDNUM, PASSPORTNUM, DRIVERLICENSENUM, SOCIALNUM, TAXNUM |
| Financial | CREDITCARDNUMBER |
| Contact | EMAIL, TELEPHONENUM |
| Temporal | DATE, TIME |
| Demographic | AGE, SEX, GENDER |
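Under BIO tagging, each entity type in the table contributes a `B-` (begin) and `I-` (inside) label, plus a single `O` label for non-PII tokens. A toy illustration of the label space and a tagged sentence (the exact id-to-label mapping lives in the model's config and may be ordered differently):

```python
# Entity types as listed in the table above.
ENTITY_TYPES = [
    "GIVENNAME", "SURNAME", "TITLE",
    "CITY", "STREET", "BUILDINGNUM", "ZIPCODE",
    "IDCARDNUM", "PASSPORTNUM", "DRIVERLICENSENUM", "SOCIALNUM", "TAXNUM",
    "CREDITCARDNUMBER",
    "EMAIL", "TELEPHONENUM",
    "DATE", "TIME",
    "AGE", "SEX", "GENDER",
]

# One O label, plus a B- and I- label per entity type.
LABELS = ["O"] + [f"{prefix}-{t}" for t in ENTITY_TYPES for prefix in ("B", "I")]

# Example BIO tagging: a multi-token span uses B- on the first token
# and I- on the continuation tokens.
tokens = ["John", "Smith", "lives", "on", "Baker", "Street"]
tags   = ["B-GIVENNAME", "B-SURNAME", "O", "O", "B-STREET", "I-STREET"]

print(len(LABELS))
```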
## Evaluation Results
| Dataset | Accuracy | F1 (macro) | F1 (weighted) |
|---|---|---|---|
| AI4Privacy (validation) | 99.26% | 99.63% | 99.33% |
| NVIDIA Nemotron-PII | — | 46.30% | 64.47% |
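The gap between macro and weighted F1 on Nemotron-PII is expected behavior of the two averages: macro F1 weights every class equally, so rare, poorly-predicted classes drag it down, while weighted F1 scales each class by its support. A pure-Python toy example of the two averages (the labels and predictions here are illustrative, not model output):

```python
from collections import Counter

def f1_per_class(y_true, y_pred, label):
    """Binary F1 for one class: 2PR / (P + R)."""
    tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
    fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
    fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# A rare class (DATE) is missed entirely; common classes are mostly right.
y_true = ["O", "O", "O", "O", "EMAIL", "DATE"]
y_pred = ["O", "O", "O", "O", "EMAIL", "O"]

labels = sorted(set(y_true))
support = Counter(y_true)
scores = {l: f1_per_class(y_true, y_pred, l) for l in labels}

macro_f1 = sum(scores.values()) / len(labels)
weighted_f1 = sum(scores[l] * support[l] for l in labels) / len(y_true)

# The missed rare class hurts macro F1 far more than weighted F1.
print(round(macro_f1, 3), round(weighted_f1, 3))
```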
## Usage
```python
from transformers import pipeline

ner = pipeline(
    "token-classification",
    model="exdsgift/NerGuard-0.3B",
    aggregation_strategy="simple",
)

results = ner("My name is John Smith and my email is john@gmail.com")
for entity in results:
    print(f"{entity['word']} -> {entity['entity_group']} ({entity['score']:.2f})")
```
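Since the aggregated pipeline output also carries `start`/`end` character offsets, detected spans can be masked directly in the input text. A minimal redaction sketch (the entity list below is hard-coded in the shape the pipeline returns, with illustrative offsets, so the example runs without loading the model):

```python
def redact(text, entities):
    """Replace each detected span with its entity label, working right
    to left so earlier character offsets stay valid."""
    for ent in sorted(entities, key=lambda e: e["start"], reverse=True):
        text = text[: ent["start"]] + f"[{ent['entity_group']}]" + text[ent["end"]:]
    return text

text = "My name is John Smith and my email is john@gmail.com"
# Shaped like the aggregated pipeline output above.
entities = [
    {"entity_group": "GIVENNAME", "start": 11, "end": 15},
    {"entity_group": "SURNAME", "start": 16, "end": 21},
    {"entity_group": "EMAIL", "start": 38, "end": 52},
]
print(redact(text, entities))
```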
## Training
| Parameter | Value |
|---|---|
| Base model | microsoft/mdeberta-v3-base |
| Dataset | AI4Privacy Open PII Masking 500K |
| Max sequence length | 512 (stride 382) |
| Learning rate | 2e-5 |
| Batch size | 32 |
| Epochs | 3 |
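With a max sequence length of 512 and a stride of 382, long inputs are split into overlapping windows that each share 382 tokens with their predecessor (i.e., each window advances by 130 tokens), matching the overlap semantics of the tokenizer's `stride` with `return_overflowing_tokens`. A plain-Python sketch of that windowing, assuming token ids have already been produced:

```python
def sliding_windows(token_ids, max_length=512, stride=382):
    """Split a long token sequence into overlapping windows.

    `stride` is the overlap in tokens, so each new window contributes
    max_length - stride (here 130) tokens of fresh content.
    """
    step = max_length - stride
    windows = []
    for start in range(0, len(token_ids), step):
        windows.append(token_ids[start : start + max_length])
        if start + max_length >= len(token_ids):
            break  # last window already covers the tail
    return windows

ids = list(range(1000))  # stand-in for a long tokenized document
chunks = sliding_windows(ids)
print(len(chunks), len(chunks[0]), chunks[1][0])
```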
## Citation
```bibtex
@mastersthesis{nerguard2026,
  title  = {Engineering a Scalable Multilingual PII Detection System with mDeBERTa-v3 and LLM-Based Validation},
  author = {Gabriele Durante},
  school = {University of Verona},
  year   = {2026}
}
```