
NerGuard-0.3B is a multilingual transformer model for Personally Identifiable Information (PII) detection, built on mDeBERTa-v3-base. It performs token-level classification across 21 PII entity types using BIO tagging, covering names, addresses, government IDs, financial data, and contact information.

Trained on more than 500K samples from AI4Privacy, the model achieves 95.97% F1 on validation and roughly twice the F1 of the best open-source alternative tested (GLiNER, Presidio, spaCy) on a 3,000-sample benchmark. It supports cross-lingual transfer to 8 European languages without additional fine-tuning.

This is the standalone NER model. For the full hybrid system with entropy-based LLM routing, see the NerGuard GitHub repository.

Supported Entities

| Category | Entity Types |
|---|---|
| Person | GIVENNAME, SURNAME, TITLE |
| Location | CITY, STREET, BUILDINGNUM, ZIPCODE |
| Government ID | IDCARDNUM, PASSPORTNUM, DRIVERLICENSENUM, SOCIALNUM, TAXNUM |
| Financial | CREDITCARDNUMBER |
| Contact | EMAIL, TELEPHONENUM |
| Temporal | DATE, TIME |
| Demographic | AGE, SEX, GENDER |
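Under BIO tagging, each entity type above contributes a `B-` (beginning) and an `I-` (inside) label, plus a single `O` label for non-PII tokens. A minimal sketch of how the label set is derived (using an illustrative subset of the entity types):

```python
# Sketch: building a BIO label set from entity types (illustrative subset).
ENTITY_TYPES = ["GIVENNAME", "SURNAME", "EMAIL", "TELEPHONENUM", "CREDITCARDNUMBER"]

labels = ["O"] + [f"{prefix}-{ent}" for ent in ENTITY_TYPES for prefix in ("B", "I")]

print(labels[:5])   # ['O', 'B-GIVENNAME', 'I-GIVENNAME', 'B-SURNAME', 'I-SURNAME']
print(len(labels))  # 2 * N + 1 labels for N entity types
```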

Evaluation Results

| Dataset | Accuracy | F1 (macro) | F1 (weighted) |
|---|---|---|---|
| AI4Privacy (validation) | 99.26% | 99.63% | 99.33% |
| NVIDIA Nemotron-PII | — | 46.30% | 64.47% |
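Macro F1 averages per-class scores equally, while weighted F1 scales each class by its support, so rare entity types pull the macro score down more. A small sketch with hypothetical per-class values (not taken from this model's evaluation):

```python
# Sketch: macro vs. weighted F1 from per-class scores (hypothetical numbers).
per_class_f1 = {"EMAIL": 0.99, "SURNAME": 0.95, "ZIPCODE": 0.80}
support = {"EMAIL": 500, "SURNAME": 300, "ZIPCODE": 50}

# Macro: every class counts equally.
macro_f1 = sum(per_class_f1.values()) / len(per_class_f1)

# Weighted: frequent classes dominate.
total = sum(support.values())
weighted_f1 = sum(per_class_f1[c] * support[c] for c in per_class_f1) / total

print(f"macro:    {macro_f1:.4f}")     # 0.9133
print(f"weighted: {weighted_f1:.4f}")  # 0.9647
```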

Usage

```python
from transformers import pipeline

ner = pipeline(
    "token-classification",
    model="exdsgift/NerGuard-0.3B",
    aggregation_strategy="simple",
)
results = ner("My name is John Smith and my email is john@gmail.com")

for entity in results:
    print(f"{entity['word']} -> {entity['entity_group']} ({entity['score']:.2f})")
```
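With `aggregation_strategy="simple"`, each result also carries `start`/`end` character offsets, which makes redaction straightforward. A minimal sketch, where the `results` list mimics the pipeline's aggregated output format rather than calling the model (offsets and labels are illustrative):

```python
# Sketch: masking detected PII spans using the pipeline's start/end offsets.
text = "My name is John Smith and my email is john@gmail.com"

# Example entities in the pipeline's aggregated output format (illustrative).
results = [
    {"entity_group": "GIVENNAME", "start": 11, "end": 15},
    {"entity_group": "SURNAME", "start": 16, "end": 21},
    {"entity_group": "EMAIL", "start": 38, "end": 52},
]

def redact(text, entities):
    # Replace spans right-to-left so earlier offsets stay valid.
    for ent in sorted(entities, key=lambda e: e["start"], reverse=True):
        text = text[:ent["start"]] + f"[{ent['entity_group']}]" + text[ent["end"]:]
    return text

print(redact(text, results))
# My name is [GIVENNAME] [SURNAME] and my email is [EMAIL]
```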

Training

| Parameter | Value |
|---|---|
| Base model | microsoft/mdeberta-v3-base |
| Dataset | AI4Privacy Open PII Masking 500K |
| Max sequence length | 512 (stride 382) |
| Learning rate | 2e-5 |
| Batch size | 32 |
| Epochs | 3 |
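Inputs longer than the maximum sequence length are split into overlapping windows; in the Hugging Face tokenizers API, `stride` is the number of tokens shared between consecutive windows. A pure-Python sketch of the window arithmetic, assuming that overlap convention:

```python
# Sketch: overlapping-window splitting, assuming `stride` is the token
# overlap between consecutive windows (Hugging Face tokenizer convention).
def window_bounds(n_tokens, max_length=512, stride=382):
    step = max_length - stride  # 130 new tokens per window
    bounds = []
    start = 0
    while True:
        bounds.append((start, min(start + max_length, n_tokens)))
        if start + max_length >= n_tokens:
            break
        start += step
    return bounds

print(window_bounds(1000))
# [(0, 512), (130, 642), (260, 772), (390, 902), (520, 1000)]
```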

Citation

```bibtex
@mastersthesis{nerguard2026,
  title={Engineering a Scalable Multilingual PII Detection System with mDeBERTa-v3 and LLM-Based Validation},
  author={Gabriele Durante},
  school={University of Verona},
  year={2026}
}
```
