NerGuard-0.3B is a multilingual transformer model for Personally Identifiable Information (PII) detection, built on mDeBERTa-v3-base. It performs token-level classification across 21 PII entity types using BIO tagging, covering names, addresses, government IDs, financial data, and contact information.
Trained on 500K+ samples from AI4Privacy, the model achieves 95.97% F1 on validation and roughly 2× the F1 of the strongest of three open-source alternatives tested (GLiNER, Presidio, spaCy) on a 3,000-sample benchmark. It supports cross-lingual transfer to 8 European languages without additional fine-tuning.
This is the standalone NER model. For the full hybrid system with entropy-based LLM routing, see the NerGuard GitHub repository.
## Supported Entities
| Category | Entity Types |
|---|---|
| Person | GIVENNAME, SURNAME, TITLE |
| Location | CITY, STREET, BUILDINGNUM, ZIPCODE |
| Government ID | IDCARDNUM, PASSPORTNUM, DRIVERLICENSENUM, SOCIALNUM, TAXNUM |
| Financial | CREDITCARDNUMBER |
| Contact | EMAIL, TELEPHONENUM |
| Temporal | DATE, TIME |
| Demographic | AGE, SEX, GENDER |
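Under BIO tagging, each entity type in the table contributes a `B-` (begin) and `I-` (inside) label, plus a single `O` label for non-PII tokens. A toy illustration of the label space and a tagged sentence (the exact id-to-label mapping lives in the model's config and may be ordered differently):

```python
# Entity types as listed in the table above.
ENTITY_TYPES = [
    "GIVENNAME", "SURNAME", "TITLE",
    "CITY", "STREET", "BUILDINGNUM", "ZIPCODE",
    "IDCARDNUM", "PASSPORTNUM", "DRIVERLICENSENUM", "SOCIALNUM", "TAXNUM",
    "CREDITCARDNUMBER",
    "EMAIL", "TELEPHONENUM",
    "DATE", "TIME",
    "AGE", "SEX", "GENDER",
]

# One O label, plus a B- and I- label per entity type.
LABELS = ["O"] + [f"{prefix}-{t}" for t in ENTITY_TYPES for prefix in ("B", "I")]

# Example BIO tagging: a multi-token span uses B- on the first token
# and I- on the continuation tokens.
tokens = ["John", "Smith", "lives", "on", "Baker", "Street"]
tags   = ["B-GIVENNAME", "B-SURNAME", "O", "O", "B-STREET", "I-STREET"]

print(len(LABELS))
```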
## Evaluation Results
| Dataset | Accuracy | F1 (macro) | F1 (weighted) |
|---|---|---|---|
| AI4Privacy (validation) | 99.26% | 99.63% | 99.33% |
| NVIDIA Nemotron-PII | — | 46.30% | 64.47% |
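The gap between macro and weighted F1 on Nemotron-PII is expected behavior of the two averages: macro F1 weights every class equally, so rare, poorly-predicted classes drag it down, while weighted F1 scales each class by its support. A pure-Python toy example of the two averages (the labels and predictions here are illustrative, not model output):

```python
from collections import Counter

def f1_per_class(y_true, y_pred, label):
    """Binary F1 for one class: 2PR / (P + R)."""
    tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
    fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
    fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# A rare class (DATE) is missed entirely; common classes are mostly right.
y_true = ["O", "O", "O", "O", "EMAIL", "DATE"]
y_pred = ["O", "O", "O", "O", "EMAIL", "O"]

labels = sorted(set(y_true))
support = Counter(y_true)
scores = {l: f1_per_class(y_true, y_pred, l) for l in labels}

macro_f1 = sum(scores.values()) / len(labels)
weighted_f1 = sum(scores[l] * support[l] for l in labels) / len(y_true)

# The missed rare class hurts macro F1 far more than weighted F1.
print(round(macro_f1, 3), round(weighted_f1, 3))
```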
## Usage
```python
from transformers import pipeline

ner = pipeline(
    "token-classification",
    model="exdsgift/NerGuard-0.3B",
    aggregation_strategy="simple",
)

results = ner("My name is John Smith and my email is john@gmail.com")
for entity in results:
    print(f"{entity['word']} -> {entity['entity_group']} ({entity['score']:.2f})")
```
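Since the aggregated pipeline output also carries `start`/`end` character offsets, detected spans can be masked directly in the input text. A minimal redaction sketch (the entity list below is hard-coded in the shape the pipeline returns, with illustrative offsets, so the example runs without loading the model):

```python
def redact(text, entities):
    """Replace each detected span with its entity label, working right
    to left so earlier character offsets stay valid."""
    for ent in sorted(entities, key=lambda e: e["start"], reverse=True):
        text = text[: ent["start"]] + f"[{ent['entity_group']}]" + text[ent["end"]:]
    return text

text = "My name is John Smith and my email is john@gmail.com"
# Shaped like the aggregated pipeline output above.
entities = [
    {"entity_group": "GIVENNAME", "start": 11, "end": 15},
    {"entity_group": "SURNAME", "start": 16, "end": 21},
    {"entity_group": "EMAIL", "start": 38, "end": 52},
]
print(redact(text, entities))
```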
## Training
| Parameter | Value |
|---|---|
| Base model | microsoft/mdeberta-v3-base |
| Dataset | AI4Privacy Open PII Masking 500K |
| Max sequence length | 512 (stride 382) |
| Learning rate | 2e-5 |
| Batch size | 32 |
| Epochs | 3 |
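With a max sequence length of 512 and a stride of 382, long inputs are split into overlapping windows that each share 382 tokens with their predecessor (i.e., each window advances by 130 tokens), matching the overlap semantics of the tokenizer's `stride` with `return_overflowing_tokens`. A plain-Python sketch of that windowing, assuming token ids have already been produced:

```python
def sliding_windows(token_ids, max_length=512, stride=382):
    """Split a long token sequence into overlapping windows.

    `stride` is the overlap in tokens, so each new window contributes
    max_length - stride (here 130) tokens of fresh content.
    """
    step = max_length - stride
    windows = []
    for start in range(0, len(token_ids), step):
        windows.append(token_ids[start : start + max_length])
        if start + max_length >= len(token_ids):
            break  # last window already covers the tail
    return windows

ids = list(range(1000))  # stand-in for a long tokenized document
chunks = sliding_windows(ids)
print(len(chunks), len(chunks[0]), chunks[1][0])
```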
## Citation
```bibtex
@mastersthesis{nerguard2026,
  title  = {Engineering a Scalable Multilingual PII Detection System with mDeBERTa-v3 and LLM-Based Validation},
  author = {Gabriele Durante},
  school = {University of Verona},
  year   = {2026}
}
```