Amicus NER (amicus-ner-v1)

This is a legal-domain Named Entity Recognition (NER) model, built by fine-tuning nlpaueb/legal-bert-base-uncased on Nigerian court judgments and legal texts.

Model Description

The model extracts key legal entities from unstructured legal text. It is designed to assist with legal document parsing, case-law summarization, and legal search.

Extracted Entities

The model is trained to recognize the following entities:

CASE_NAME: Names of lawsuits (e.g., Okonkwo v. State)
CITATION: Law report citations (e.g., [2021] LPELR-12345 (SC))
STATUTE: Sections and names of laws or statutes (e.g., Section 36 of the 1999 Constitution)
COURT: Judicial bodies (e.g., Supreme Court of Nigeria, Court of Appeal)
DATE: Judgment or incident dates (e.g., 14th day of May, 2021)
JUDGE: Judges presiding over cases (e.g., Justice Adebayo)
RATIO: Specific legal principles on which a decision rests (rationes decidendi)
HELD: Final holdings or decisions of the court
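
A token-classification model emits one tag per token, and those tags must be merged into entity spans. The card does not state the tagging scheme, so the sketch below assumes BIO-style labels such as `B-CASE_NAME`, `I-CASE_NAME`, and `O` (an assumption, not confirmed by the model's config). It is roughly analogous to what `aggregation_strategy="simple"` does inside the pipeline; `bio_to_spans` is a hypothetical helper.

```python
from typing import List, Optional, Tuple

def bio_to_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Merge BIO tags into (entity_type, text) spans.

    Assumes tags look like "B-TYPE", "I-TYPE", or "O" (hypothetical scheme).
    An "I-" tag that does not continue a span of the same type starts a new span.
    """
    spans: List[Tuple[str, List[str]]] = []
    current_type: Optional[str] = None
    for token, tag in zip(tokens, tags):
        if tag == "O":
            current_type = None  # outside any entity; close the open span
            continue
        prefix, _, ent_type = tag.partition("-")
        if prefix == "B" or ent_type != current_type:
            spans.append((ent_type, [token]))  # start a new span
            current_type = ent_type
        else:
            spans[-1][1].append(token)  # continue the current span
    return [(ent_type, " ".join(words)) for ent_type, words in spans]


tokens = ["Okonkwo", "v.", "State", "was", "decided", "by", "the", "Supreme", "Court"]
tags = ["B-CASE_NAME", "I-CASE_NAME", "I-CASE_NAME", "O", "O", "O", "O", "B-COURT", "I-COURT"]
print(bio_to_spans(tokens, tags))
# [('CASE_NAME', 'Okonkwo v. State'), ('COURT', 'Supreme Court')]
```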

Training Data & Methodology

The training pipeline uses:

Weak Supervision / Rules: Heuristics, regular expressions, and curated dictionaries targeting legal entities, used to bootstrap labels on raw text.
Domain Sources:
- Pre-existing Nigerian law case files (.txt & .pdf) uploaded from Google Drive.
- Nigerian legal news reports scraped directly during training.
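
The card does not publish its labeling rules, so the following only illustrates the general idea of rule-based weak labeling. The regular expressions below are hypothetical, written for this example rather than taken from the actual pipeline. They find candidate CITATION, CASE_NAME, STATUTE, and COURT spans and return character offsets that could later be aligned to token labels.

```python
import re
from typing import List, Tuple

# Hypothetical patterns for illustration only; the real pipeline's rules are not published.
PATTERNS = {
    "CITATION": re.compile(r"[\[(]\d{4}[\])]\s+LPELR-\d+(?:\s*\([A-Z]+\))?"),
    "CASE_NAME": re.compile(r"\b[A-Z][a-zA-Z]+\s+v\.\s+[A-Z][a-zA-Z]+\b"),
    "STATUTE": re.compile(
        r"\bSection\s+\d+[A-Za-z]?(?:\s*\(\d+\))?\s+of\s+the\s+[\w\s]+?(?:Constitution|Act|Law)\b"
    ),
    "COURT": re.compile(r"\b(?:Supreme Court of Nigeria|Court of Appeal|Federal High Court)\b"),
}

def weak_label(text: str) -> List[Tuple[int, int, str]]:
    """Return (start, end, label) character spans matched by the rule set, sorted by start."""
    spans = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            spans.append((match.start(), match.end(), label))
    return sorted(spans)


text = ("In Okonkwo v. State [2021] LPELR-12345 (SC), the Supreme Court of Nigeria "
        "applied Section 36 of the 1999 Constitution.")
for start, end, label in weak_label(text):
    print(label, "->", text[start:end])
```

A real pipeline would also need to resolve overlapping or conflicting matches and map character offsets onto tokenizer offsets before training.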

Hyperparameters

The model was fine-tuned using the following hyperparameters:

Base Model: nlpaueb/legal-bert-base-uncased
Max Sequence Length: 256 tokens
Batch Size: 8 (Train) / 16 (Eval)
Learning Rate: 5e-5
Epochs: 5
Weight Decay: 0.01
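
The card does not say which learning-rate scheduler or how many training examples were used. Assuming the Hugging Face `Trainer` defaults (linear decay to zero, no warmup), a single device, no gradient accumulation, and a hypothetical training-set size, this sketch shows how the batch size, epoch count, and learning rate above translate into optimizer steps and a per-step learning rate.

```python
import math

# Hyperparameters from the card.
BATCH_SIZE = 8
EPOCHS = 5
PEAK_LR = 5e-5

def total_steps(num_examples: int, batch_size: int = BATCH_SIZE, epochs: int = EPOCHS) -> int:
    """Optimizer steps for one device, no gradient accumulation, last partial batch kept."""
    return math.ceil(num_examples / batch_size) * epochs

def linear_lr(step: int, num_steps: int, peak_lr: float = PEAK_LR) -> float:
    """Linear decay from peak_lr to 0 with no warmup (assumed Trainer default)."""
    return peak_lr * max(0.0, (num_steps - step) / num_steps)


# Hypothetical dataset of 1,000 training sequences.
steps = total_steps(1000)
print(steps)                  # 625
print(linear_lr(0, steps))    # 5e-05
```

In an actual run these values would be passed to `TrainingArguments` (`per_device_train_batch_size=8`, `per_device_eval_batch_size=16`, `learning_rate=5e-5`, `num_train_epochs=5`, `weight_decay=0.01`), and the 256-token limit to the tokenizer via `truncation=True, max_length=256`.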

Intended Use & Limitations

Intended Use: Assisting with the highlighting of case citations, statutes, court references, and judgments in Nigerian and other West African legal documents.
Limitations: Training labels were generated with weak supervision and may contain noise. Manual validation and correction are recommended before the model is used in critical production environments.
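
One way to organize that manual validation is to route low-confidence predictions to a reviewer. This sketch assumes the list-of-dicts output of the `aggregation_strategy="simple"` pipeline shown in How to Use (keys `entity_group`, `word`, `score`). The 0.80 threshold and the example scores are placeholders, not calibrated or measured values, and `triage` is a hypothetical helper.

```python
from typing import Dict, List, Tuple

def triage(entities: List[Dict], threshold: float = 0.80) -> Tuple[List[Dict], List[Dict]]:
    """Split pipeline predictions into (accepted, needs_review) by confidence score.

    The threshold is a placeholder; tune it on a manually checked sample.
    """
    accepted = [e for e in entities if e["score"] >= threshold]
    needs_review = [e for e in entities if e["score"] < threshold]
    return accepted, needs_review


# Predictions shaped like the pipeline's output (scores made up for illustration).
predictions = [
    {"entity_group": "CASE_NAME", "word": "okonkwo v. state", "score": 0.97},
    {"entity_group": "JUDGE", "word": "justice adebayo", "score": 0.62},
]
accepted, needs_review = triage(predictions)
for e in needs_review:
    print(f"REVIEW {e['entity_group']}: {e['word']} ({e['score']:.2f})")
# REVIEW JUDGE: justice adebayo (0.62)
```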

How to Use

You can load and query the model directly using Hugging Face's transformers library:

```python
from transformers import AutoTokenizer, AutoModelForTokenClassification
from transformers import pipeline

tokenizer = AutoTokenizer.from_pretrained("WhiteRoomProdigy/amicus-ner-v1")
model = AutoModelForTokenClassification.from_pretrained("WhiteRoomProdigy/amicus-ner-v1")

# Define the NER pipeline; "simple" aggregation merges sub-word tokens into whole entities
nlp = pipeline(
    "ner",
    model=model,
    tokenizer=tokenizer,
    aggregation_strategy="simple"
)

text = "In the case of Okonkwo v. State (2021) LPELR-12345, Justice Adebayo presiding at the Supreme Court of Nigeria held that the appeal succeeded."
entities = nlp(text)

for entity in entities:
    print(f"{entity['entity_group']}: {entity['word']} (confidence: {entity['score']:.4f})")
```
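
Court judgments are usually much longer than the 256-token limit the model was fine-tuned with. One workaround is to split the text into overlapping word windows and run the pipeline on each. The window and overlap sizes here are assumptions (WordPiece typically yields more tokens than words, so the window is kept well under 256), and `chunk_text` is a hypothetical helper, not part of `transformers`. Each chunk carries its character offset so entity positions can be mapped back to the full document.

```python
import re
from typing import List, Tuple

def chunk_text(text: str, window: int = 150, overlap: int = 30) -> List[Tuple[int, str]]:
    """Split text into overlapping windows of whole words.

    Returns (char_offset, chunk) pairs so entity offsets from each chunk
    can be mapped back to the full document.
    """
    if not 0 <= overlap < window:
        raise ValueError("overlap must be in [0, window)")
    words = list(re.finditer(r"\S+", text))  # word matches with their character positions
    chunks = []
    step = window - overlap
    for start in range(0, max(len(words), 1), step):
        piece = words[start:start + window]
        if not piece:
            break
        begin, end = piece[0].start(), piece[-1].end()
        chunks.append((begin, text[begin:end]))
        if start + window >= len(words):  # last window reached the end of the text
            break
    return chunks


# Usage with the `nlp` pipeline defined above (requires the model to be loaded):
# for offset, chunk in chunk_text(long_judgment):
#     for ent in nlp(chunk):
#         ent["start"] += offset
#         ent["end"] += offset
```

Entities that fall inside an overlap region will be reported twice, so results should be de-duplicated by their shifted `start`/`end` offsets.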


Model Files

Format: Safetensors
Model Size: 0.1B params
Tensor Type: F32
Base Model: nlpaueb/legal-bert-base-uncased