🛡️ Context-Aware Threat & Compliance Detection in Conversational Text

NLP Course Project: Individual Assignment
Model: Fine-tuned roberta-base | Task: 6-class Multi-label Classification
Dataset: google/jigsaw_toxicity_pred (~150K samples)


🌐 Live Demo

| Resource | URL |
| --- | --- |
| 🖥️ Interactive UI (HuggingFace Spaces) | https://huggingface.co/spaces/Pommu/threat-detection-jigsaw |
| 🤖 Fine-tuned Model | https://huggingface.co/Pommu/threat-detection-jigsaw |
| 🔌 Local API | http://localhost:8000/predict (POST) |

📌 Problem Statement

Modern organizations face growing risk from implicit threats and workplace harassment. Rather than simply matching slurs, this project focuses on intent-based detection. It uses the Jigsaw Toxic Comment dataset to identify professional and personal risk across six binary labels:

| Label | Description | Risk Level |
| --- | --- | --- |
| Threatening | Violent intent, intimidation, physical danger | 🔴 HIGH |
| Hate Speech | Identity-based attacks, protected groups | 🔴 HIGH |
| Highly Severe | Extreme toxicity, highly disruptive | 🔴 HIGH |
| Toxic | Rude, disrespectful, unprofessional | 🟡 MEDIUM |
| Insult | Personal attacks, non-violent harassment | 🟡 MEDIUM |
| Profanity | Obscene language, compliance violation | 🟡 LOW |
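
The label-to-risk mapping in the table can be written down as a small lookup. The sketch below is illustrative only: the `overall_risk` helper, the severity ordering, and the `"NONE"` fallback are assumptions of this example, not code taken from the project.

```python
# Risk levels copied from the label table above.
RISK_LEVELS = {
    "Threatening": "HIGH",
    "Hate Speech": "HIGH",
    "Highly Severe": "HIGH",
    "Toxic": "MEDIUM",
    "Insult": "MEDIUM",
    "Profanity": "LOW",
}

# Ordering used to compare levels (an assumption of this sketch).
_SEVERITY = {"LOW": 0, "MEDIUM": 1, "HIGH": 2}


def overall_risk(flagged_labels):
    """Return the highest risk level among the flagged labels, or "NONE"."""
    levels = [RISK_LEVELS[label] for label in flagged_labels]
    if not levels:
        return "NONE"  # hypothetical fallback when nothing is flagged
    return max(levels, key=_SEVERITY.__getitem__)


print(overall_risk(["Toxic", "Profanity"]))     # MEDIUM
print(overall_risk(["Insult", "Threatening"]))  # HIGH
```

Because the task is multi-label, several labels can fire on one text; taking the maximum is one simple way to reduce them to a single risk level.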

🏗️ Architecture: The 2-Layer Safety Net

[User Input: Text / Chat Transcript]
         │
         ▼
   [Gradio UI] ── HTTP POST /predict ──▶ [FastAPI Backend]
                                                │
                                   ┌────────────┴────────────┐
                                   │                         │
                               [Layer 1]                 [Layer 2]
                          Fine-tuned RoBERTa          Lexical Booster
                           (6 Jigsaw labels)      (keyword pattern match)
                                   │                         │
                                   └────────────┬────────────┘
                                                │
                                 {label, confidence, risk_level}

  1. Layer 1 (Weighted RoBERTa): Multi-label model fine-tuned with a 50x class weight on the rare threat class (0.3% of the data) to maximize recall.
  2. Layer 2 (Lexical Booster): Backend safety layer that flags ominous phrasing (e.g., "watch your back", "you'll regret") unconditionally, regardless of the model score.
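
How the two layers combine can be sketched in a few lines. This is a minimal illustration, not the code in app/main.py: the pattern list, the 0.5 threshold, and the function names are assumptions, and only the two quoted phrases come from the description above.

```python
import re

# Hypothetical pattern list; only these two phrases are named in the README.
THREAT_PATTERNS = [
    re.compile(r"\bwatch your back\b", re.IGNORECASE),
    re.compile(r"\byou(?:['’]ll| will) regret\b", re.IGNORECASE),
]

THREAT_THRESHOLD = 0.5  # assumed decision threshold for the Layer 1 score


def lexical_hit(text):
    """Layer 2: True if any ominous pattern occurs in the text."""
    return any(pattern.search(text) for pattern in THREAT_PATTERNS)


def is_threat(text, model_threat_score):
    """A pattern match flags the text unconditionally; otherwise the model decides."""
    return lexical_hit(text) or model_threat_score >= THREAT_THRESHOLD


print(is_threat("Better watch your back.", 0.02))  # True (Layer 2 override)
print(is_threat("See you at lunch.", 0.02))        # False
print(is_threat("See you at lunch.", 0.90))        # True (Layer 1 score)
```

The `or` makes the lexical layer a pure safety net: it can raise a flag the model missed but never suppresses one the model raised.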

📊 Evaluation Results

Design Choice: We prioritize Recall over Precision for threats. A false positive (flagging safe text) is far less costly than a false negative (missing a real threat).
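
The trade-off can be made concrete with a toy example. The scores below are made up for illustration and are not results from this project; lowering the decision threshold is just one way to trade precision for recall, shown here because it is easy to follow.

```python
def precision_recall(y_true, scores, threshold):
    """Compute (precision, recall) for binary labels at a given score threshold."""
    preds = [s >= threshold for s in scores]
    tp = sum(1 for t, p in zip(y_true, preds) if t and p)
    fp = sum(1 for t, p in zip(y_true, preds) if not t and p)
    fn = sum(1 for t, p in zip(y_true, preds) if t and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Made-up labels and scores, for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.6, 0.3, 0.4, 0.2, 0.1, 0.05, 0.35]

for threshold in (0.5, 0.25):
    p, r = precision_recall(y_true, scores, threshold)
    print(f"threshold={threshold}: precision={p:.2f} recall={r:.2f}")
# threshold=0.5: precision=1.00 recall=0.67
# threshold=0.25: precision=0.60 recall=1.00
```

At the lower threshold every real threat is caught, at the cost of two false alarms, which is the outcome the design choice above prefers.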


🚀 API Usage

# Test implicit threat detection locally
curl -X POST http://localhost:8000/predict \
     -H "Content-Type: application/json" \
     -d '{"text": "I will find you and you will regret every decision you have made."}'

JSON Response:

{
  "prediction": "Threatening",
  "confidence": 0.85,
  "is_threat": true,
  "risk_level": "HIGH",
  "all_scores": [
    {"label": "Toxic", "confidence": 0.0009},
    {"label": "Threatening", "confidence": 0.85},
    ...
  ]
}
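
The same request can be made from Python using only the standard library. This is a sketch under the assumption that the endpoint and response fields are exactly as shown above; the helper names `build_request`, `predict`, and `summarize` are hypothetical.

```python
import json
import urllib.request

API_URL = "http://localhost:8000/predict"  # local endpoint listed in this README


def build_request(text, url=API_URL):
    """Build the POST request with a JSON body of the form {"text": ...}."""
    payload = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(text, url=API_URL, timeout=10):
    """Send the text to the API and return the decoded JSON response."""
    with urllib.request.urlopen(build_request(text, url), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def summarize(result):
    """Format the top prediction from a response shaped like the example above."""
    return (
        f'{result["prediction"]} '
        f'({result["confidence"]:.2f}, risk={result["risk_level"]})'
    )
```

With the backend running, `print(summarize(predict("some text")))` prints the top label, its confidence, and the risk level on one line.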

📂 Project Structure

NLP Course Project/
├── notebook/
│   └── train_3.ipynb         ← Colab training notebook (WeightedTrainer)
├── app/
│   ├── main.py               ← FastAPI backend + Lexical Booster
│   ├── model_loader.py       ← Singleton model loader (thread-safe)
│   ├── schemas.py            ← Pydantic request/response models
│   └── demo.py               ← Gradio UI (standalone OR API-backed)
├── spaces/
│   ├── app.py                ← HuggingFace Spaces entry point
│   └── requirements.txt
├── requirements.txt          ← API/demo dependencies
├── requirements_training.txt ← Training dependencies (Colab)
├── setup_local.bat           ← Windows one-click environment setup
└── README.md
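
The contents of model_loader.py are not shown here. A thread-safe singleton loader is commonly written with double-checked locking, roughly as below; `get_model` and its `load_fn` argument are hypothetical names, and the real module may be structured differently.

```python
import threading

_model = None
_lock = threading.Lock()


def get_model(load_fn):
    """Return the shared model, calling load_fn() at most once across threads."""
    global _model
    if _model is None:          # fast path: no lock once the model is loaded
        with _lock:
            if _model is None:  # re-check: another thread may have loaded it
                _model = load_fn()
    return _model
```

The second check inside the lock is what prevents two concurrent first requests from each loading a copy of the model.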

⚙️ Local Setup (Windows)

# Option A: one-click setup
setup_local.bat

# Option B: manual
python -m venv .venv
.venv\Scripts\activate
pip install -r requirements.txt

Set your .env file:

HF_TOKEN=your_token_here
MODEL_NAME=Pommu/threat-detection-jigsaw
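
How the app reads these values is not shown in this README. As a minimal sketch, assuming the file contains only plain KEY=VALUE lines, a hypothetical `parse_env` helper could look like this:

```python
def parse_env(text):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


example = "HF_TOKEN=your_token_here\nMODEL_NAME=Pommu/threat-detection-jigsaw\n"
print(parse_env(example)["MODEL_NAME"])  # Pommu/threat-detection-jigsaw
```

This handles only the simple form used above; quoting and variable expansion are out of scope for the sketch.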

🏃 Running Locally

# Terminal 1: Start the FastAPI backend
uvicorn app.main:app --reload

# Terminal 2: Launch the Gradio demo
python app/demo.py

Open http://localhost:7860 to see the UI.


☁️ Training on Colab

  1. Open notebook/train_3.ipynb in Google Colab
  2. Enable GPU accelerator (Tesla T4)
  3. Add HF_Token to Colab Secrets (🔑 icon)
  4. Run all cells; the model is pushed to the HuggingFace Hub automatically
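
The notebook's custom WeightedTrainer is not reproduced in this README. As a rough illustration of what a 50x weight on the rare threat class does to the loss, here is a pure-Python sketch of class-weighted binary cross-entropy. Using the Jigsaw column order as the label order, a weight of 1.0 for the other labels, and this particular loss form are all assumptions of the example.

```python
import math

# Column order of the Jigsaw dataset; using it as the model's label order is an assumption.
LABELS = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]
# 50x weight on positive "threat" examples (from this README); 1.0 elsewhere is assumed.
POS_WEIGHTS = [50.0 if name == "threat" else 1.0 for name in LABELS]


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def weighted_bce(logits, targets, pos_weights=POS_WEIGHTS):
    """Mean binary cross-entropy over labels, with positive terms scaled per label."""
    losses = [
        w * y * softplus(-z) + (1.0 - y) * softplus(z)
        for z, y, w in zip(logits, targets, pos_weights)
    ]
    return sum(losses) / len(losses)


logits = [0.0] * 6                  # an undecided model: p = 0.5 for every label
missed_threat = [0, 0, 0, 1, 0, 0]
missed_insult = [0, 0, 0, 0, 1, 0]
print(weighted_bce(logits, missed_threat) > weighted_bce(logits, missed_insult))  # True
```

An equally uncertain prediction costs far more when the true label is a threat, which pushes training toward higher recall on that class.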

🌐 Deploy to HuggingFace Spaces

  1. Go to huggingface.co/new-space → SDK: Gradio
  2. In Space Settings → Variables: MODEL_NAME = Pommu/threat-detection-jigsaw
  3. Push spaces/ contents to the Space repo

📚 Technologies Used

| Component | Technology |
| --- | --- |
| Model | roberta-base (HuggingFace Transformers) |
| Training | HuggingFace Trainer API + WeightedTrainer (custom) |
| Dataset | google/jigsaw_toxicity_pred (~150K samples, 6 labels) |
| Backend API | FastAPI + Uvicorn |
| UI | Gradio |
| Deployment | HuggingFace Spaces |
| Evaluation | scikit-learn (Precision/Recall/F1/Confusion Matrix) |

Evaluation results (self-reported)

  • Weighted F1 on Jigsaw Toxic Comment Classification: 0.790