---
language: en
license: mit
library_name: sklearn
tags:
- content-moderation
- text-classification
- safety
- dual-mode
- pii-detection
- child-safety
---

# moderat - Dual-Mode Content Moderation + PII Filter

A text classification model for content moderation with age-appropriate filtering and PII detection.

## Features

- **Dual-mode filtering:** <13 (strict) vs. 13+ (relaxed)
- **6 content categories:** Safe, Harassment, Swearing (reaction), Swearing (aggressive), Hate Speech, Spam
- **PII detection:** emails, phone numbers, addresses, credit card numbers, SSNs
- **Unicode deobfuscation:** detects circled letters (ⓕ), double-struck letters (ℂ), fullwidth characters, and mathematical symbols
- **Social media protection:**
  - <13: all social media sharing is blocked
  - 13+: sharing is allowed, and blocked only if grooming is detected
- **Grooming detection:** keywords such as "dm me", "don't tell parents", "our secret"
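
The regex-based PII check can be sketched with Python's `re` module. This is a minimal illustration, not the patterns that `pii_extension.py` actually uses. The pattern set, the `detect_pii` name, and the US-style phone and SSN formats are all assumptions. Street-address detection is left out because it is hard to express reliably as a single regex.

```python
import re

# Hypothetical patterns for illustration; NOT taken from pii_extension.py.
PII_PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    # US-style numbers: 555-123-4567, (555) 123-4567, +1 555 123 4567
    "phone": re.compile(r"(?:\+?1[-.\s]?)?\(?\b\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}\b"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    # 13-16 digits, optionally separated by single spaces or dashes
    "credit_card": re.compile(r"\b(?:\d[ -]?){12,15}\d\b"),
}


def detect_pii(text: str) -> list[str]:
    """Return the PII kinds whose pattern matches anywhere in `text`."""
    return [kind for kind, pattern in PII_PATTERNS.items() if pattern.search(text)]


print(detect_pii("My email is test@gmail.com"))
```

The credit card pattern flags any long run of digits. Adding a Luhn checksum on the matched digits is a common way to cut false positives.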

## Quick Start

```python
# pii_extension.py ships with this repository (see Files below)
from pii_extension import CombinedModerationFilter

# Named `moderator` to avoid shadowing Python's built-in `filter`
moderator = CombinedModerationFilter("darwinkernelpanic/moderat")

# Content moderation
result = moderator.check("damn that's crazy", age=15)
# -> ALLOWED (reaction swearing for 13+)

# PII blocking (all ages)
result = moderator.check("My email is test@gmail.com", age=15)
# -> BLOCKED (PII detected)

# Social media (13+ allowed)
result = moderator.check("Follow me on instagram @user", age=15)
# -> ALLOWED

# Grooming detection
result = moderator.check("DM me privately, don't tell parents", age=14)
# -> BLOCKED (grooming detected)
```

## Unicode Deobfuscation

Automatically detects and normalizes Unicode bypass attempts:

| Technique | Example | Normalized |
|-----------|---------|------------|
| Circled letters | `ⓕⓤⓒⓚ` | `fuck` |
| Double-struck | `ℂℍ` | `CH` |
| Fullwidth | `Ｆ` | `F` |
| Mathematical | `𝐟` | `f` |

**All obfuscated text is normalized before moderation checks.**
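
One standard way to fold these characters back into plain letters is Unicode NFKC normalization, which Python's `unicodedata` module provides. The card does not say whether moderat uses NFKC or its own mapping table, so treat this as an assumption. The `deobfuscate` name is hypothetical.

```python
import unicodedata


def deobfuscate(text: str) -> str:
    """Fold compatibility characters to their plain equivalents.

    NFKC maps circled letters, double-struck letters, fullwidth forms,
    and mathematical alphanumeric symbols to ordinary letters.
    (Assumed approach; moderat's own normalizer may differ.)
    """
    return unicodedata.normalize("NFKC", text)


for sample in ["ⓓⓐⓜⓝ", "ℂℍ", "Ｆ", "𝐟"]:
    print(sample, "->", deobfuscate(sample))
```

NFKC does not catch cross-script homoglyphs such as Cyrillic "а" (U+0430) or leetspeak like "sh1t". Those need an explicit confusables or substitution table on top.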

## Social Media Rules

| Age | Plain Share | With Grooming Context |
|-----|-------------|-----------------------|
| <13 | ❌ Blocked | ❌ Blocked |
| 13+ | ✅ Allowed | ❌ Blocked |

**Grooming keywords:** "dm me", "don't tell", "secret", "send pics", "meet up", etc.
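
The table reduces to a small decision rule. The sketch below assumes plain lowercase substring matching. The social-platform keyword list and the `social_share_allowed` name are hypothetical, and neither list is necessarily the one shipped in `pii_extension.py`. The grooming terms are taken from the list above.

```python
# Grooming terms from the card; the social list is a hypothetical example.
GROOMING_KEYWORDS = ("dm me", "don't tell", "secret", "send pics", "meet up")
SOCIAL_KEYWORDS = ("instagram", "snapchat", "tiktok", "discord", "follow me")


def social_share_allowed(text: str, age: int) -> bool:
    """Apply the age-dependent social media rules from the table."""
    lowered = text.lower()
    if any(k in lowered for k in GROOMING_KEYWORDS):
        return False  # grooming context is blocked at every age
    shares_social = any(k in lowered for k in SOCIAL_KEYWORDS)
    return not (shares_social and age < 13)  # <13: block any plain share


print(social_share_allowed("Follow me on instagram @user", 15))
```

Bare substring checks over-block. For example, "secret" also matches "secretary". Word-boundary regexes reduce this kind of false positive.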

## Content Labels

| Text | <13 | 13+ |
|------|-----|-----|
| "damn that's crazy" | ❌ Blocked | ✅ Allowed |
| "shit that sucks" | ❌ Blocked | ✅ Allowed |
| "you're trash" | ❌ Blocked | ❌ Blocked |
| "kill yourself" | ❌ Blocked | ❌ Blocked |
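
The dual-mode behaviour in this table amounts to mapping the classifier's label plus the user's age to a decision. Here is a minimal sketch. It assumes the six category names from the Features list are the model's output labels, and that only Safe and reaction swearing can ever pass. The `is_allowed` name is hypothetical.

```python
# Assumed label set, taken from the Features list.
LABELS = {
    "Safe", "Harassment", "Swearing (reaction)",
    "Swearing (aggressive)", "Hate Speech", "Spam",
}


def is_allowed(label: str, age: int) -> bool:
    """Map a predicted content label plus the user's age to allow/block."""
    if label not in LABELS:
        raise ValueError(f"unknown label: {label!r}")
    if label == "Safe":
        return True
    if label == "Swearing (reaction)":
        return age >= 13  # e.g. "damn that's crazy": allowed for 13+ only
    return False  # harassment, aggressive swearing, hate speech, spam


print(is_allowed("Swearing (reaction)", 15), is_allowed("Swearing (reaction)", 12))
```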

## Model Details

- **Algorithm:** Multinomial Naive Bayes on TF-IDF features, plus regex-based PII detection
- **Content classification accuracy:** 77%
- **PII detection:** regex-based (fast, no ML)
- **TF-IDF features:** up to 10,000, n-gram range 1-3
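
The listed configuration corresponds to a standard scikit-learn pipeline. The sketch below reproduces only the stated hyperparameters (`max_features=10000`, `ngram_range=(1, 3)`). The remaining settings, such as the Naive Bayes smoothing `alpha`, are left at scikit-learn defaults as an assumption. The toy sentences are invented for illustration and are not the model's training data.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# TF-IDF over 1-3 grams capped at 10,000 features, then Multinomial NB.
pipeline = Pipeline([
    ("tfidf", TfidfVectorizer(max_features=10000, ngram_range=(1, 3))),
    ("nb", MultinomialNB()),
])

# Toy examples for illustration only (not the real training set).
texts = [
    "have a nice day", "great game today",
    "you are trash", "you are worthless trash",
    "buy cheap followers now", "click here free followers",
]
labels = ["Safe", "Safe", "Harassment", "Harassment", "Spam", "Spam"]

pipeline.fit(texts, labels)
print(pipeline.predict(["free followers click now"]))
```

`pipeline.predict_proba(...)` returns per-class probabilities if you want to apply your own thresholds instead of taking the top label.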

## Files

- `moderation_model.pkl` - content moderation model
- `pii_extension.py` - PII + grooming detection
- `inference.py` - basic inference
- `moderat_speed_test.ipynb` - Colab notebook (speed test)

## Colab

Test it: [Open in Colab](https://colab.research.google.com/github/darwinkernelpanic/moderat/blob/main/moderat_speed_test.ipynb)

## Speed

- Single inference: ~2-5 ms
- With PII check: ~3-7 ms
- Throughput: ~300-500 texts/sec
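
Latency depends on hardware and text length, so it is worth measuring on your own machine. Below is a minimal timing harness built on `time.perf_counter`. `measure` is a hypothetical helper that accepts any single-argument callable. For example, you could pass `lambda t: moderator.check(t, age=15)`, assuming the Quick Start setup.

```python
import time
from typing import Callable, Sequence


def measure(fn: Callable[[str], object], texts: Sequence[str], repeats: int = 3) -> dict:
    """Time fn over texts; return mean ms per text and texts per second."""
    fn(texts[0])  # warm-up call so one-time setup cost is excluded
    start = time.perf_counter()
    for _ in range(repeats):
        for text in texts:
            fn(text)
    elapsed = time.perf_counter() - start
    n = repeats * len(texts)
    return {"ms_per_text": 1000 * elapsed / n, "texts_per_sec": n / elapsed}


# Stand-in callable so the sketch runs without the model installed.
stats = measure(str.lower, ["damn that's crazy", "My email is test@gmail.com"] * 100)
print(f"{stats['ms_per_text']:.4f} ms/text, {stats['texts_per_sec']:.0f} texts/sec")
```

For strictly sequential calls the two numbers are reciprocals: texts per second equals 1000 divided by milliseconds per text.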