	---
	language: en
	license: mit
	library_name: sklearn
	tags:
	- content-moderation
	- text-classification
	- safety
	- dual-mode
	- pii-detection
	- child-safety
	---

	# moderat - Dual-Mode Content Moderation + PII Filter

	A text classification model for content moderation with age-appropriate filtering and PII detection.

	## Features

	- Dual-mode filtering: <13 (strict) vs 13+ (relaxed)
	- 6 content categories: Safe, Harassment, Swearing (reaction), Swearing (aggressive), Hate Speech, Spam
	- PII Detection: Emails, phones, addresses, credit cards, SSN
	- Unicode Deobfuscation: Detects circled letters (ⓕ), double-struck (ℂ), fullwidth, mathematical symbols
	- Social Media Protection:
	  - <13: Block all social media sharing
	  - 13+: Allow, block only if grooming detected
	- Grooming Detection: Keywords like "dm me", "don't tell parents", "our secret"

	## Quick Start

	```python
	from pii_extension import CombinedModerationFilter

	# Named moderation_filter to avoid shadowing Python's built-in filter()
	moderation_filter = CombinedModerationFilter("darwinkernelpanic/moderat")

	# Content moderation
	result = moderation_filter.check("damn that's crazy", age=15)
	# -> ALLOWED (reaction swearing for 13+)

	# PII blocking (all ages)
	result = moderation_filter.check("My email is test@gmail.com", age=15)
	# -> BLOCKED (PII detected)

	# Social media (13+ allowed)
	result = moderation_filter.check("Follow me on instagram @user", age=15)
	# -> ALLOWED

	# Grooming detection
	result = moderation_filter.check("DM me privately, don't tell parents", age=14)
	# -> BLOCKED (grooming detected)
	```
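
The PII check in the Quick Start is described under Model Details as regex-based. The following is a minimal sketch of that idea, not the contents of `pii_extension.py`: the pattern set, the `PII_PATTERNS` name, and the `find_pii` helper are all assumptions made for illustration, and the patterns are deliberately simple (US-style phone and SSN formats, no address matching).

```python
import re

# Hypothetical patterns for illustration; the real pii_extension.py may differ.
PII_PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "phone": re.compile(
        r"(?<!\d)(?:\+?1[\s.-]?)?\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}(?!\d)"
    ),
    "ssn": re.compile(r"(?<!\d)\d{3}-\d{2}-\d{4}(?!\d)"),
    # 13 to 19 digits, optionally separated by single spaces or hyphens.
    "credit_card": re.compile(r"(?<!\d)(?:\d[ -]?){12,18}\d(?!\d)"),
}


def find_pii(text):
    """Return the sorted names of every PII pattern that matches the text."""
    return sorted(
        name for name, pattern in PII_PATTERNS.items() if pattern.search(text)
    )


print(find_pii("My email is test@gmail.com"))
print(find_pii("damn that's crazy"))
```

A filter built on this would block the message whenever `find_pii` returns a non-empty list, regardless of age.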

	## Unicode Deobfuscation

	Automatically detects and normalizes unicode bypass attempts:

	| Technique | Example | Normalized |
	|-----------|---------|------------|
	| Circled letters | `ⓕⓤⓒⓚ` | `fuck` |
	| Double-struck | `ℂℍ` | `CH` |
	| Fullwidth | `Ｆ` | `F` |
	| Mathematical | `𝐟` | `f` |

	All obfuscated text is normalized before moderation checks.
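
One way to get this behavior is Unicode NFKC normalization from Python's standard library, which folds circled, double-struck, fullwidth, and mathematical letters to their plain equivalents. This is a sketch of the technique only. Whether moderat uses NFKC or a custom mapping is an assumption, and `normalize_obfuscated` is a hypothetical name.

```python
import unicodedata


def normalize_obfuscated(text):
    """Fold compatibility characters (circled, fullwidth, math) to plain ones."""
    # Assumption: NFKC is a stand-in for whatever mapping moderat applies.
    return unicodedata.normalize("NFKC", text)


for sample in ["ⓕⓤⓒⓚ", "ℂℍ", "Ｆ", "𝐟"]:
    print(sample, "->", normalize_obfuscated(sample))
```

NFKC does not cover look-alike letters from other scripts (for example Cyrillic `а` in place of Latin `a`), so a production filter would likely need an extra homoglyph table.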

	## Social Media Rules

	| Age | Plain Share | With Grooming Context |
	|-----|-------------|----------------------|
	| <13 | ❌ Blocked | ❌ Blocked |
	| 13+ | ✅ Allowed | ❌ Blocked |

	Grooming keywords: "dm me", "don't tell", "secret", "send pics", "meet up", etc.
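
The rule table can be sketched as a small decision function. This is an illustration under stated assumptions, not the logic in `pii_extension.py`: the keyword tuple only contains the examples listed above, `SOCIAL_PLATFORMS` is a made-up platform list, and plain substring matching is a simplification.

```python
# Keywords taken from the list above; the real list is longer ("etc.").
GROOMING_KEYWORDS = ("dm me", "don't tell", "secret", "send pics", "meet up")
# Hypothetical platform list, for illustration only.
SOCIAL_PLATFORMS = ("instagram", "snapchat", "tiktok", "discord")


def social_media_decision(text, age):
    """Apply the social media rules: grooming always blocks, shares block under 13."""
    lowered = text.lower()
    has_grooming = any(keyword in lowered for keyword in GROOMING_KEYWORDS)
    shares_social = any(platform in lowered for platform in SOCIAL_PLATFORMS)
    if has_grooming:
        return "BLOCKED"
    if shares_social and age < 13:
        return "BLOCKED"
    return "ALLOWED"


print(social_media_decision("Follow me on instagram @user", age=15))
print(social_media_decision("Follow me on instagram @user", age=10))
print(social_media_decision("DM me privately, don't tell parents", age=14))
```

Substring matching is cheap but over-triggers (for instance, "secret" also matches "secretary"), so word-boundary regexes may be a better fit in practice.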

	## Content Labels

	| Text | <13 | 13+ |
	|------|-----|-----|
	| "damn that's crazy" | ❌ Blocked | ✅ Allowed |
	| "shit that sucks" | ❌ Blocked | ✅ Allowed |
	| "you're trash" | ❌ Blocked | ❌ Blocked |
	| "kill yourself" | ❌ Blocked | ❌ Blocked |
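
The table suggests a per-age allow-list over the six categories. The sketch below encodes one reading of it: only reaction swearing differs between the two modes. The treatment of Spam, Hate Speech, and aggressive swearing as blocked for both ages is an assumption (the table does not show them), and `is_allowed` is a hypothetical helper.

```python
# Assumption: only "Safe" passes for under-13 users.
ALLOWED_UNDER_13 = {"Safe"}
# Assumption: 13+ additionally allows reaction swearing; all else is blocked.
ALLOWED_13_PLUS = {"Safe", "Swearing (reaction)"}


def is_allowed(label, age):
    """Return True if a predicted category is allowed for the given age."""
    allowed = ALLOWED_13_PLUS if age >= 13 else ALLOWED_UNDER_13
    return label in allowed


print(is_allowed("Swearing (reaction)", age=15))
print(is_allowed("Swearing (reaction)", age=10))
print(is_allowed("Harassment", age=15))
```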

	## Model Details

	- Algorithm: Multinomial Naive Bayes + TF-IDF + Regex PII
	- Content accuracy: 77%
	- PII detection: Regex-based (fast, no ML)
	- Features: up to 10,000 TF-IDF features over 1- to 3-grams
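
For orientation, a scikit-learn pipeline matching these settings could look like the sketch below. It is not the training script behind `moderation_model.pkl`: the `build_pipeline` helper is hypothetical, the four training strings are made-up toy examples, and any preprocessing beyond TF-IDF is unknown.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline


def build_pipeline():
    """TF-IDF (max 10,000 features, 1-3 grams) feeding Multinomial Naive Bayes."""
    return make_pipeline(
        TfidfVectorizer(max_features=10_000, ngram_range=(1, 3)),
        MultinomialNB(),
    )


# Toy data for illustration only; not the dataset used to train moderat.
texts = [
    "have a nice day",
    "thanks for the help",
    "win a free prize now",
    "click here for free money",
]
labels = ["Safe", "Safe", "Spam", "Spam"]

model = build_pipeline()
model.fit(texts, labels)
print(model.predict(["free prize inside"])[0])
```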

	## Files

	- `moderation_model.pkl` - Content moderation model
	- `pii_extension.py` - PII + grooming detection
	- `inference.py` - Basic inference
	- `moderat_speed_test.ipynb` - Colab notebook

	## Colab

	Test it: [Open in Colab](https://colab.research.google.com/github/darwinkernelpanic/moderat/blob/main/moderat_speed_test.ipynb)

	## Speed

	- Single inference: ~2-5ms
	- With PII check: ~3-7ms
	- Throughput: ~300-500 texts/sec
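
Numbers like these depend on hardware and text length, so it is worth measuring locally. The helper below is a generic timing sketch using only the standard library. `benchmark` is a hypothetical name, and `check` stands for any callable that takes one text, such as a wrapper around the filter's `check` method.

```python
import time


def benchmark(check, texts, repeats=5):
    """Time check(text) over all texts and report latency and throughput."""
    start = time.perf_counter()
    for _ in range(repeats):
        for text in texts:
            check(text)
    # Guard against a zero reading on very coarse clocks.
    elapsed = max(time.perf_counter() - start, 1e-9)
    calls = repeats * len(texts)
    return {
        "ms_per_text": 1000 * elapsed / calls,
        "texts_per_sec": calls / elapsed,
    }


# str.lower is a placeholder workload; swap in the real moderation call.
stats = benchmark(str.lower, ["damn that's crazy", "My email is test@gmail.com"])
print(stats)
```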