Amicus NER (amicus-ner-v1)

This is a legal-domain Named Entity Recognition (NER) model, fine-tuned from nlpaueb/legal-bert-base-uncased on Nigerian court judgments and legal news reports.

Model Description

The model extracts key legal entities from unstructured legal text. It is designed to support legal document parsing, case-law summarization, and legal search applications.

Extracted Entities

The model is trained to recognize the following entities:

  • CASE_NAME: Names of lawsuits (e.g., Okonkwo v. State)
  • CITATION: Law report citations (e.g., [2021] LPELR-12345 (SC))
  • STATUTE: Sections and names of laws or statutes (e.g., Section 36 of the 1999 Constitution)
  • COURT: Judicial bodies (e.g., Supreme Court of Nigeria, Court of Appeal)
  • DATE: Judgment or incident dates (e.g., 14th day of May, 2021)
  • JUDGE: Judges presiding over cases (e.g., Justice Adebayo)
  • RATIO: Specific legal principles or rationes decidendi
  • HELD: Final holdings or decisions of the court
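
Token-classification models usually predict per-token tags rather than entity names directly. The sketch below shows how the eight entity types above could expand into a tag set, assuming a BIO scheme (B- for the first token of an entity, I- for continuation tokens, O for everything else). The card does not state which tagging scheme the model uses, so the label order and ids here are illustrative only; the authoritative mapping is whatever the published model config contains.

```python
# Entity types listed in this model card.
ENTITY_TYPES = [
    "CASE_NAME", "CITATION", "STATUTE", "COURT",
    "DATE", "JUDGE", "RATIO", "HELD",
]


def build_bio_labels(entity_types):
    """Return 'O' plus a B-/I- tag pair for each entity type (assumed BIO scheme)."""
    labels = ["O"]
    for name in entity_types:
        labels.append(f"B-{name}")
        labels.append(f"I-{name}")
    return labels


# Assumption: ids are assigned in list order; the real model may differ.
labels = build_bio_labels(ENTITY_TYPES)
label2id = {label: index for index, label in enumerate(labels)}
id2label = {index: label for label, index in label2id.items()}
```

With eight entity types this yields 17 tags. When the pipeline is run with an aggregation strategy, the B-/I- prefixes are merged away and only the entity type is reported.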

Training Data & Methodology

The training pipeline combines:

  1. Weak Supervision / Rules: Heuristics, regular expressions, and curated dictionaries targeting legal entities, used to bootstrap labels on raw text.
  2. Domain Sources:
    • Pre-existing Nigerian law case files (.txt & .pdf) loaded from Google Drive.
    • Nigerian legal news reports scraped at training time.
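
The rule-based bootstrapping described above can be sketched roughly as follows. The actual heuristics, patterns, and dictionaries used for this model are not published in this card, so every pattern and the `weak_label` helper below are hypothetical stand-ins chosen to match the example entities listed earlier.

```python
import re

# Hypothetical rules, for illustration only (not the model's real labeling rules).
RULES = [
    # e.g. "[2021] LPELR-12345 (SC)" or "(2021) LPELR-12345"
    ("CITATION", re.compile(r"[\[(]\d{4}[\])]\s+LPELR-\d+(?:\s*\([A-Z]{2,3}\))?")),
    # e.g. "Section 36 of the 1999 Constitution"
    ("STATUTE", re.compile(
        r"[Ss]ection\s+\d+(?:\(\d+\))*\s+of\s+the\s+[A-Z0-9]\w*(?:\s+[A-Z0-9]\w*)*"
    )),
    # A tiny stand-in for a curated dictionary of court names.
    ("COURT", re.compile(r"Supreme Court of Nigeria|Court of Appeal")),
]


def weak_label(text):
    """Return non-overlapping (start, end, label) character spans matched by RULES."""
    candidates = []
    for label, pattern in RULES:
        for match in pattern.finditer(text):
            candidates.append((match.start(), match.end(), label))
    # Prefer longer matches when two rules overlap.
    candidates.sort(key=lambda span: (-(span[1] - span[0]), span[0]))
    kept = []
    for start, end, label in candidates:
        if all(end <= k_start or start >= k_end for k_start, k_end, _ in kept):
            kept.append((start, end, label))
    return sorted(kept)


sample = (
    "In [2021] LPELR-12345 (SC), the Supreme Court of Nigeria "
    "applied Section 36 of the 1999 Constitution."
)
spans = weak_label(sample)
for start, end, label in spans:
    print(label, "->", sample[start:end])
```

Spans produced this way would still need to be aligned to tokens before training, and rules of this kind tend to miss unusual citation formats, which is one reason weakly supervised labels are noisy.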

Hyperparameters

The model was fine-tuned using the following hyperparameters:

  • Base Model: nlpaueb/legal-bert-base-uncased
  • Max Sequence Length: 256 tokens
  • Batch Size: 8 (Train) / 16 (Eval)
  • Learning Rate: 5e-5
  • Epochs: 5
  • Weight Decay: 0.01
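
For anyone reproducing the run, the values above map onto keyword arguments of transformers.TrainingArguments roughly as sketched below. This is an assumption about how the settings would be wired, not the author's training script: the output directory is a placeholder, and the card does not say whether the batch sizes are per device or global, so per-device is assumed here.

```python
BASE_MODEL = "nlpaueb/legal-bert-base-uncased"

# Applied when tokenizing (truncation / padding), not a TrainingArguments field.
MAX_SEQ_LENGTH = 256

# Hyperparameters from the card, keyed by TrainingArguments keyword names.
TRAINING_KWARGS = {
    "output_dir": "amicus-ner-output",   # placeholder path (assumption)
    "learning_rate": 5e-5,
    "per_device_train_batch_size": 8,    # assumed per device
    "per_device_eval_batch_size": 16,    # assumed per device
    "num_train_epochs": 5,
    "weight_decay": 0.01,
}


def build_training_arguments():
    """Create TrainingArguments from the settings above (requires transformers and torch)."""
    from transformers import TrainingArguments  # imported lazily to keep the module light

    return TrainingArguments(**TRAINING_KWARGS)
```

Settings not listed in the card, such as warmup, scheduler, and seed, are left at the library defaults in this sketch.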

Intended Use & Limitations

  • Intended Use: Highlighting case citations, statutes, court references, and holdings in Nigerian and other West African legal documents.
  • Limitations: The training labels were generated with weak supervision, so predictions can inherit the noise and blind spots of the labeling rules. Manual validation and correction are recommended before use in critical production environments.
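
One simple way to put the manual-validation advice into practice is to route low-confidence predictions to a human reviewer. The sketch below assumes entities in the dictionary format returned by the transformers NER pipeline (`entity_group`, `word`, `score`); the `split_for_review` helper, the 0.85 threshold, and the sample scores are all made up for illustration and should be tuned against hand-checked data.

```python
def split_for_review(entities, threshold=0.85):
    """Split predictions into auto-accepted ones and ones needing manual review."""
    accepted = [e for e in entities if e["score"] >= threshold]
    needs_review = [e for e in entities if e["score"] < threshold]
    return accepted, needs_review


# Hand-written example predictions (scores are invented, not model output).
predictions = [
    {"entity_group": "CASE_NAME", "word": "okonkwo v. state", "score": 0.97},
    {"entity_group": "JUDGE", "word": "justice adebayo", "score": 0.62},
    {"entity_group": "COURT", "word": "supreme court of nigeria", "score": 0.91},
]

accepted, needs_review = split_for_review(predictions)
print("accepted:", [e["word"] for e in accepted])
print("review:  ", [e["word"] for e in needs_review])
```

A confidence score is not a guarantee of correctness, so spot-checking a sample of accepted entities is still worthwhile.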

How to Use

You can load and query the model directly using Hugging Face's transformers library:

from transformers import AutoModelForTokenClassification, AutoTokenizer, pipeline

tokenizer = AutoTokenizer.from_pretrained("WhiteRoomProdigy/amicus-ner-v1")
model = AutoModelForTokenClassification.from_pretrained("WhiteRoomProdigy/amicus-ner-v1")

# Define the NER pipeline
nlp = pipeline(
    "ner",
    model=model,
    tokenizer=tokenizer,
    aggregation_strategy="simple"
)

text = "In the case of Okonkwo v. State (2021) LPELR-12345, Justice Adebayo presiding at the Supreme Court of Nigeria held that the appeal succeeded."
entities = nlp(text)

for entity in entities:
    print(f"{entity['entity_group']}: {entity['word']} (confidence: {entity['score']:.4f})")
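
Because the base model is uncased, the `word` field returned by the pipeline is typically lowercased. A minimal sketch of recovering the original casing and grouping results by entity type is shown below. It assumes each entity carries `start` and `end` character offsets, which the pipeline provides when a fast tokenizer is used; `group_entities` and `fake_entity` are hypothetical helpers, and the sample entities are hand-built rather than real model output.

```python
from collections import defaultdict


def group_entities(text, entities):
    """Group entity surface strings by type, slicing the source text to keep its casing."""
    grouped = defaultdict(list)
    for entity in entities:
        surface = text[entity["start"]:entity["end"]]
        grouped[entity["entity_group"]].append(surface)
    return dict(grouped)


def fake_entity(text, label, phrase, score):
    """Build a pipeline-style entity dict for a phrase known to occur in text."""
    start = text.index(phrase)
    return {
        "entity_group": label,
        "word": phrase.lower(),  # mimics the lowercased output of an uncased model
        "score": score,
        "start": start,
        "end": start + len(phrase),
    }


sample_text = "Okonkwo v. State was decided by the Supreme Court of Nigeria."
sample_entities = [
    fake_entity(sample_text, "CASE_NAME", "Okonkwo v. State", 0.97),
    fake_entity(sample_text, "COURT", "Supreme Court of Nigeria", 0.95),
]

grouped = group_entities(sample_text, sample_entities)
print(grouped)
```

In real use, `sample_entities` would be replaced by the list returned from `nlp(text)` above.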