ai4privacy/pii-masking-200k
How to use Mercy-62/bart-base-pii-masker-lora with PEFT:
from peft import PeftModel
from transformers import AutoModelForSeq2SeqLM
base_model = AutoModelForSeq2SeqLM.from_pretrained("facebook/bart-base")
model = PeftModel.from_pretrained(base_model, "Mercy-62/bart-base-pii-masker-lora")

A LoRA fine-tuned BART model for PII (Personally Identifiable Information) masking. It transforms sensitive text into a privacy-safe form while keeping the sentence meaning intact.
Mercy-62/bart-base-pii-masker-lora is built on top of facebook/bart-base and fine-tuned using the PEFT LoRA approach for lightweight adaptation.
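To give a rough sense of why LoRA counts as lightweight, the sketch below compares the number of trainable parameters for a full update of one weight matrix with a low-rank update of the same matrix. The hidden size of 768 is that of facebook/bart-base; the rank of 8 is an assumption chosen for illustration, since the rank and target modules of this adapter are not stated here. The helper names are hypothetical.

```python
def full_update_params(d_out: int, d_in: int) -> int:
    """Parameters trained when the whole d_out x d_in matrix is updated."""
    return d_out * d_in


def lora_update_params(d_out: int, d_in: int, rank: int) -> int:
    """Parameters trained by a low-rank update B @ A,
    where B is d_out x rank and A is rank x d_in."""
    return rank * (d_out + d_in)


hidden_size = 768   # hidden size of facebook/bart-base
assumed_rank = 8    # ASSUMPTION: illustrative rank, not this adapter's documented value

full = full_update_params(hidden_size, hidden_size)
lora = lora_update_params(hidden_size, hidden_size, assumed_rank)
print(f"full: {full}, lora: {lora}, ratio: {lora / full:.2%}")
```

Under these assumptions the low-rank update trains about 2% of the parameters of the full matrix, which is why the adapter can be shipped separately from the base model.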
It is designed to automatically detect and mask personally identifiable information, such as the name and date in this example:
Input:
"Customer John Smith applied for a credit card on May 10, 2023."
Output:
Customer [NAME] applied for a credit card on [DATE].
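The example above suggests that masked values are replaced with bracketed upper-case placeholders. A minimal sketch for listing those placeholders in a model output follows; the `find_mask_tags` helper and the assumption that every placeholder looks like `[NAME]` or `[DATE]` are illustrative only and are not part of the model or of PEFT.

```python
import re
from typing import List

# ASSUMPTION: placeholders are upper-case words in square brackets, e.g. [NAME].
TAG_PATTERN = re.compile(r"\[([A-Z][A-Z_]*)\]")


def find_mask_tags(masked_text: str) -> List[str]:
    """Return the placeholder names found in a masked string, in order."""
    return TAG_PATTERN.findall(masked_text)


example_output = "Customer [NAME] applied for a credit card on [DATE]."
print(find_mask_tags(example_output))
```

A check like this can help spot outputs where the model returned the text unchanged, because the list of tags would then be empty.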
| Component | Value |
|---|---|
| Base Model | facebook/bart-base |
| Adapter Type | LoRA (Low-Rank Adaptation) |
| Frameworks | PyTorch, Transformers, PEFT |
| Task | Text-to-Text Generation |
| Language | English |
from peft import PeftModel
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Load the base model, attach the LoRA adapter, and load the base tokenizer
base_model = AutoModelForSeq2SeqLM.from_pretrained("facebook/bart-base")
model = PeftModel.from_pretrained(base_model, "Mercy-62/bart-base-pii-masker-lora")
tokenizer = AutoTokenizer.from_pretrained("facebook/bart-base")


def test_pii_masking(model, tokenizer, samples):
    """Run each sample through the model and print the input and masked output."""
    device = model.device
    for text in samples:
        # Tokenize on the same device as the model
        inputs = tokenizer(text, return_tensors="pt", truncation=True).to(device)
        outputs = model.generate(**inputs, max_length=80)
        print(f"\nInput: {text}")
        print("Output:", tokenizer.decode(outputs[0], skip_special_tokens=True))


pii_samples = [
    "Customer John Smith applied for a credit card on May 10, 2023.",
    "Contact number: +92-300-1234567, Email: sara.khan@gmail.com",
]

test_pii_masking(model, tokenizer, pii_samples)
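When testing on samples whose sensitive values are known in advance, it can be useful to confirm that none of them survive in the output. The sketch below is one simple way to do that with plain string matching; `leaked_values` is a hypothetical helper, and the masked string is copied from the example earlier in this card rather than produced by running the model.

```python
from typing import List


def leaked_values(masked_text: str, known_pii: List[str]) -> List[str]:
    """Return the known PII strings that still appear in the masked text
    (case-insensitive substring match)."""
    lowered = masked_text.lower()
    return [value for value in known_pii if value.lower() in lowered]


known = ["John Smith", "May 10, 2023"]

# Fully masked output: nothing should be reported.
print(leaked_values("Customer [NAME] applied for a credit card on [DATE].", known))

# Partially masked output: the name is still present.
print(leaked_values("Customer John Smith applied for a credit card on [DATE].", known))
```

Substring matching only catches values that are reproduced verbatim, so it is a sanity check rather than a guarantee that no PII remains.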