Amicus NER (amicus-ner-v1)
This is a legal-domain Named Entity Recognition (NER) model built by fine-tuning legal-bert-base-uncased on court judgments and legal texts.
Model Description
The model extracts key legal entities from unstructured legal text. It is designed to support legal document parsing, case-law summarization, and legal search applications.
Extracted Entities
The model is trained to recognize the following entities:
- CASE_NAME: Names of lawsuits (e.g., Okonkwo v. State)
- CITATION: Law report citations (e.g., [2021] LPELR-12345 (SC))
- STATUTE: Sections and names of laws or statutes (e.g., Section 36 of the 1999 Constitution)
- COURT: Judicial bodies (e.g., Supreme Court of Nigeria, Court of Appeal)
- DATE: Judgment or incident dates (e.g., 14th day of May, 2021)
- JUDGE: Judges presiding over cases (e.g., Justice Adebayo)
- RATIO: Specific legal principles or rationes decidendi
- HELD: Final holdings or decisions of the court
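The entity types above can be turned into a token-classification label set. The sketch below assumes a BIO tagging scheme (`B-`/`I-` prefixes plus `O`), which this card does not confirm; the mapping actually used by the model should be read from the `id2label` field of its published config. The helper name `build_bio_labels` is hypothetical.

```python
# Entity types as listed in this model card.
ENTITY_TYPES = [
    "CASE_NAME", "CITATION", "STATUTE", "COURT",
    "DATE", "JUDGE", "RATIO", "HELD",
]


def build_bio_labels(entity_types):
    """Return ["O", "B-X", "I-X", ...] for the given entity types.

    Assumption: BIO tagging. The model's real label order may differ.
    """
    labels = ["O"]
    for entity_type in entity_types:
        labels.append(f"B-{entity_type}")
        labels.append(f"I-{entity_type}")
    return labels


labels = build_bio_labels(ENTITY_TYPES)
label2id = {label: idx for idx, label in enumerate(labels)}
id2label = {idx: label for label, idx in label2id.items()}

print(len(labels), "labels")
```

Under this assumption, eight entity types yield seventeen labels: one `O` label plus a `B-`/`I-` pair per type.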
Training Data & Methodology
The training pipeline utilizes:
- Weak Supervision / Rules: Heuristics, regular expressions, and curated dictionaries targeting legal entities to bootstrap labeling on raw text.
- Domain Sources:
  - Pre-existing Nigerian law case files (.txt & .pdf) uploaded from Google Drive.
  - Nigerian legal news reports scraped directly during training.
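To illustrate the kind of rule-based bootstrapping described above, here is a minimal sketch of regex and dictionary labeling that emits character spans. The patterns, the court list, and the function name `weak_label` are hypothetical and written only for illustration; the rules actually used to train this model are not published in this card.

```python
import re

# Hypothetical patterns for illustration; not the model's real labeling rules.
PATTERNS = {
    "CITATION": re.compile(r"[\[(]\d{4}[\])]\s+LPELR-\d+(?:\s*\([A-Z]+\))?"),
    "CASE_NAME": re.compile(r"[A-Z][A-Za-z]+ v\. [A-Z][A-Za-z]+"),
    "STATUTE": re.compile(r"Section \d+ of the [A-Z0-9][\w ]*?(?:Constitution|Act)"),
}

# Hypothetical curated dictionary of court names.
COURTS = ["Supreme Court of Nigeria", "Court of Appeal"]


def weak_label(text):
    """Return sorted (start, end, label, surface) tuples found by the rules."""
    spans = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            spans.append((match.start(), match.end(), label, match.group()))
    for name in COURTS:
        for match in re.finditer(re.escape(name), text):
            spans.append((match.start(), match.end(), "COURT", match.group()))
    return sorted(spans)


sample = (
    "In Okonkwo v. State [2021] LPELR-12345 (SC), the Supreme Court of Nigeria "
    "applied Section 36 of the 1999 Constitution."
)
for start, end, label, surface in weak_label(sample):
    print(f"{label:10s} [{start}:{end}] {surface}")
```

Spans like these would still need to be aligned to tokens before training, and rules this simple will miss or mislabel many real-world variants, which is one reason weakly supervised labels benefit from manual review.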
Hyperparameters
The model was fine-tuned using the following hyperparameters:
- Base Model: nlpaueb/legal-bert-base-uncased
- Max Sequence Length: 256 tokens
- Batch Size: 8 (Train) / 16 (Eval)
- Learning Rate: 5e-5
- Epochs: 5
- Weight Decay: 0.01
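For readers who want to reproduce a similar setup, the values above can be collected into a plain dictionary. The keys below are chosen to match parameter names of `transformers.TrainingArguments`, so the dictionary could be unpacked into it together with an output directory of your choosing. This is a sketch, not the author's training script, which is not shown in this card; the dataset size and the helper `total_optimizer_steps` are hypothetical.

```python
import math

# Values taken from the hyperparameter list in this card.
HYPERPARAMS = {
    "learning_rate": 5e-5,
    "per_device_train_batch_size": 8,
    "per_device_eval_batch_size": 16,
    "num_train_epochs": 5,
    "weight_decay": 0.01,
}

# Applied when tokenizing (truncation length), not a TrainingArguments field.
MAX_SEQ_LENGTH = 256


def total_optimizer_steps(num_examples, batch_size, epochs):
    """Optimizer steps on one device, assuming no gradient accumulation."""
    return math.ceil(num_examples / batch_size) * epochs


# Hypothetical training-set size, for illustration only.
steps = total_optimizer_steps(
    num_examples=10_000,
    batch_size=HYPERPARAMS["per_device_train_batch_size"],
    epochs=HYPERPARAMS["num_train_epochs"],
)
print(f"Total optimizer steps: {steps}")
```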
Intended Use & Limitations
- Intended Use: Assistance in highlighting case citations, statutes, court references, and judgments in West African/Nigerian legal documents.
- Limitations: The training labels were generated with weak supervision, so predictions may inherit errors from the labeling rules. Manual validation and correction are recommended before using the model in critical production environments.
How to Use
You can load and query the model directly using Hugging Face's transformers library:
from transformers import AutoModelForTokenClassification, AutoTokenizer, pipeline

tokenizer = AutoTokenizer.from_pretrained("WhiteRoomProdigy/amicus-ner-v1")
model = AutoModelForTokenClassification.from_pretrained("WhiteRoomProdigy/amicus-ner-v1")

# Define the NER pipeline; "simple" aggregation merges sub-word tokens into entity spans
nlp = pipeline(
    "ner",
    model=model,
    tokenizer=tokenizer,
    aggregation_strategy="simple",
)

text = "In the case of Okonkwo v. State (2021) LPELR-12345, Justice Adebayo presiding at the Supreme Court of Nigeria held that the appeal succeeded."
entities = nlp(text)

for entity in entities:
    print(f"{entity['entity_group']}: {entity['word']} (confidence: {entity['score']:.4f})")
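Because the labels come from weak supervision, it can help to filter low-confidence predictions and group the rest by type before review. The sketch below works on a list of dictionaries shaped like the aggregated pipeline output (`entity_group`, `word`, `score`). The function name `group_entities`, the 0.5 default threshold, and the sample predictions are all hypothetical and not produced by the model.

```python
from collections import defaultdict


def group_entities(entities, min_score=0.5):
    """Group entity surface forms by type, dropping low-confidence ones.

    min_score is an illustrative threshold, not a recommended value.
    """
    grouped = defaultdict(list)
    for entity in entities:
        if entity["score"] >= min_score:
            grouped[entity["entity_group"]].append(entity["word"])
    return dict(grouped)


# Hand-written sample shaped like pipeline output; not real model predictions.
sample_entities = [
    {"entity_group": "CASE_NAME", "word": "okonkwo v. state", "score": 0.97},
    {"entity_group": "JUDGE", "word": "justice adebayo", "score": 0.91},
    {"entity_group": "COURT", "word": "supreme court of nigeria", "score": 0.95},
    {"entity_group": "DATE", "word": "2021", "score": 0.42},
]

for label, words in group_entities(sample_entities, min_score=0.9).items():
    print(f"{label}: {words}")
```

With the real pipeline, `entities` from the snippet above could be passed in place of `sample_entities`.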