---
language:
- en
tags:
- text-detoxification
- text2text-generation
- detoxification
- content-moderation
- toxicity-reduction
- llama
- gguf
- minibase
license: apache-2.0
datasets:
- paradetox
metrics:
- toxicity-reduction
- semantic-similarity
- fluency
- latency
model-index:
- name: Detoxify-Small
  results:
  - task:
      type: text-detoxification
      name: Toxicity Reduction
    dataset:
      type: paradetox
      name: ParaDetox
      config: toxic-neutral
      split: test
    metrics:
    - type: toxicity-reduction
      value: 0.032
      name: Average Toxicity Reduction
    - type: semantic-similarity
      value: 0.471
      name: Semantic to Expected
    - type: fluency
      value: 0.919
      name: Text Fluency
    - type: latency
      value: 66.4
      name: Average Latency (ms)
---

# Detoxify-Small 🤖

<div align="center">

**A highly compact (~138 MB) and efficient text detoxification model for removing toxicity while preserving meaning.**

[![Model Size](https://img.shields.io/badge/Model_Size-138MB-blue)](https://huggingface.co/)
[![Architecture](https://img.shields.io/badge/Architecture-LlamaForCausalLM-green)](https://huggingface.co/)
[![License](https://img.shields.io/badge/License-Apache_2.0-yellow)](LICENSE)
[![Discord](https://img.shields.io/badge/Discord-Join_Community-5865F2)](https://discord.com/invite/BrJn4D2Guh)

*Built by [Minibase](https://minibase.ai) - Train and deploy small AI models from your browser.*
*Browse all of the models and datasets available on the [Minibase Marketplace](https://minibase.ai/wiki/Special:Marketplace).*

</div>

## 📋 Model Summary

**Minibase-Detoxify-Small** is a compact language model fine-tuned specifically for text detoxification tasks. It takes toxic or inappropriate text as input and generates cleaned, non-toxic versions while preserving the original meaning and intent as much as possible.

### Key Features
- ⚑ **Fast Inference**: ~66ms average response time
- 🎯 **High Fluency**: 91.9% well-formed output text
- 🧹 **Effective Detoxification**: 3.2% average toxicity reduction
- 💾 **Compact Size**: Only 138MB (GGUF quantized)
- 🔒 **Privacy-First**: Runs locally, no data sent to external servers

## 🚀 Quick Start

### Local Inference (Recommended)

1. **Install llama.cpp** (if not already installed):
   ```bash
   git clone https://github.com/ggerganov/llama.cpp
   cd llama.cpp
   cmake -B build
   cmake --build build --config Release
   ```

2. **Download and run the model**:
   ```bash
   # Download model files
   wget https://huggingface.co/minibase/detoxify-small/resolve/main/model.gguf
   wget https://huggingface.co/minibase/detoxify-small/resolve/main/run_server.sh

   # Make executable and run
   chmod +x run_server.sh
   ./run_server.sh
   ```

3. **Make API calls**:
   ```python
   import requests

   # Detoxify text
   response = requests.post("http://127.0.0.1:8000/completion", json={
       "prompt": "Instruction: Rewrite the provided text to remove the toxicity.\n\nInput: This is fucking terrible!\n\nResponse: ",
       "max_tokens": 200,
       "temperature": 0.7
   })

   result = response.json()
   print(result["content"])  # "This is really terrible!"
   ```

### Python Client

```python
from detoxify_inference import DetoxifyClient

# Initialize client
client = DetoxifyClient()

# Detoxify text
toxic_text = "This product is fucking amazing, no bullshit!"
clean_text = client.detoxify_text(toxic_text)

print(clean_text)  # "This product is really amazing, no kidding!"
```
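
The `detoxify_inference` module is not reproduced in this README. As a rough sketch of what a client with the same interface might look like, assuming it simply wraps the `/completion` endpoint and the prompt format used in the Quick Start (the class body, the injectable `post` parameter, and the `_http_post` helper are illustrative assumptions, not the shipped implementation):

```python
import json
import urllib.request
from typing import Callable, Optional

# Prompt format taken from the Quick Start example.
PROMPT_TEMPLATE = (
    "Instruction: Rewrite the provided text to remove the toxicity.\n\n"
    "Input: {text}\n\nResponse: "
)


def _http_post(url: str, payload: dict) -> dict:
    """POST a JSON payload with the standard library and decode the JSON reply."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=30) as reply:
        return json.loads(reply.read().decode("utf-8"))


class DetoxifyClient:
    """Hypothetical stand-in for the client provided by detoxify_inference."""

    def __init__(
        self,
        base_url: str = "http://127.0.0.1:8000",
        post: Optional[Callable[[str, dict], dict]] = None,
    ):
        self.base_url = base_url.rstrip("/")
        # `post` is injectable so the client can be exercised without a server.
        self._post = post or _http_post

    def detoxify_text(self, text: str, temperature: float = 0.7) -> str:
        """Send one text to the server and return the rewritten version."""
        payload = {
            "prompt": PROMPT_TEMPLATE.format(text=text),
            "max_tokens": 200,
            "temperature": temperature,
        }
        result = self._post(f"{self.base_url}/completion", payload)
        return result["content"].strip()
```

Passing a custom `post` callable makes it possible to unit-test calling code without starting `llama-server`.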

## 📊 Benchmarks & Performance

### ParaDetox Dataset Results (1,008 samples)

- Original Toxicity: 0.051 (5.1%)
- Final Toxicity: 0.020 (2.0%)

| Metric | Score | Description |
|--------|-------|-------------|
| **Toxicity Reduction** | 0.051 (ParaDetox) → 0.020 | Reduced toxicity scores by more than 50% |
| **Semantic to Expected** | 0.471 (47.1%) | Similarity to human expert rewrites |
| **Semantic to Original** | 0.625 (62.5%) | How much original meaning is preserved |
| **Fluency** | 0.919 (91.9%) | Quality of generated text structure |
| **Latency** | 66.4ms | Average response time |
| **Throughput** | ~15 req/sec | Estimated requests per second |
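
The derived figures follow from the raw scores. A short check of the arithmetic is sketched below; the variable names are illustrative, and the throughput estimate assumes requests are served one at a time at the average latency.

```python
original_toxicity = 0.051  # mean toxicity score of the ParaDetox inputs
final_toxicity = 0.020     # mean toxicity score of the model outputs
latency_ms = 66.4          # average latency per request

# Absolute drop in toxicity score, and the same drop relative to the starting score.
absolute_reduction = original_toxicity - final_toxicity
relative_reduction = absolute_reduction / original_toxicity

# Sequential throughput: how many 66.4 ms requests fit into one second.
sequential_throughput = 1000.0 / latency_ms

print(f"absolute reduction: {absolute_reduction:.3f}")        # 0.031
print(f"relative reduction: {relative_reduction:.1%}")        # 60.8%
print(f"throughput: {sequential_throughput:.1f} req/sec")     # 15.1 req/sec
```

The relative reduction of about 61% is what the "more than 50%" description refers to.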

### Dataset Breakdown

#### General Toxic Content (1,000 samples)
- **Semantic Preservation**: 62.7%
- **Fluency**: 91.9%

### Comparison with Baselines

| Model | Semantic Similarity | Toxicity Reduction | Fluency |
|-------|-------------------|-------------------|---------|
| **Detoxify-Small** | **0.471** | **0.032** | **0.919** |
| BART-base (ParaDetox) | 0.750 | ~0.15 | ~0.85 |
| Human Performance | 0.850 | ~0.25 | ~0.95 |

## 🏗️ Technical Details

### Model Architecture
- **Architecture**: LlamaForCausalLM
- **Parameters**: 49,152 (extremely compact)
- **Context Window**: 1,024 tokens
- **Quantization**: GGUF (4-bit quantization)
- **File Size**: 138MB
- **Memory Requirements**: 8GB RAM minimum, 16GB recommended

### Training Details
- **Base Model**: Custom-trained Llama architecture
- **Fine-tuning Dataset**: Curated toxic-neutral parallel pairs
- **Training Objective**: Instruction-following for detoxification
- **Optimization**: Quantized for edge deployment

### System Requirements
- **OS**: Linux, macOS, Windows
- **RAM**: 8GB minimum, 16GB recommended
- **Storage**: 200MB free space
- **Dependencies**: llama.cpp, Python 3.7+

## 📖 Usage Examples

### Basic Detoxification
```python
# Input: "This is fucking awesome!"
# Output: "This is really awesome!"

# Input: "You stupid idiot, get out of my way!"
# Output: "You silly person, please move aside!"
```

### API Integration
```python
import requests

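def detoxify_text(text: str) -> str:
    """Detoxify text using the Detoxify-Small API."""
    prompt = f"Instruction: Rewrite the provided text to remove the toxicity.\n\nInput: {text}\n\nResponse: "

    response = requests.post(
        "http://127.0.0.1:8000/completion",
        json={"prompt": prompt, "max_tokens": 200, "temperature": 0.7},
        timeout=30,  # Avoid hanging forever if the server is unreachable
    )
    # Surface HTTP errors here rather than failing later on a missing key
    response.raise_for_status()

    # Trim surrounding whitespace from the generated text
    return response.json()["content"].strip()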

# Usage
toxic_comment = "This product sucks donkey balls!"
clean_comment = detoxify_text(toxic_comment)
print(clean_comment)  # "This product is not very good!"
```

### Batch Processing
```python
import asyncio
import aiohttp

async def detoxify_batch(texts: list) -> list:
    """Process multiple texts concurrently"""
    async with aiohttp.ClientSession() as session:
        tasks = []
        for text in texts:
            prompt = f"Instruction: Rewrite the provided text to remove the toxicity.\n\nInput: {text}\n\nResponse: "
            payload = {
                "prompt": prompt,
                "max_tokens": 200,
                "temperature": 0.7
            }
            tasks.append(session.post("http://127.0.0.1:8000/completion", json=payload))

        responses = await asyncio.gather(*tasks)
        # Return only the generated text from each response body
        return [(await resp.json())["content"] for resp in responses]

# Process multiple comments
comments = [
    "This is fucking brilliant!",
    "You stupid moron!",
    "What the hell is wrong with you?"
]

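# In a script, start the event loop explicitly. Inside an already-running
# loop (e.g. a notebook), use `await detoxify_batch(comments)` instead.
clean_comments = asyncio.run(detoxify_batch(comments))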
```

## 🔧 Advanced Configuration

### Server Configuration
```bash
# GPU acceleration (macOS; Metal is enabled by default in macOS builds of llama.cpp)
llama-server \
  -m model.gguf \
  --host 127.0.0.1 \
  --port 8000 \
  --n-gpu-layers 35

# CPU-only (lower memory usage)
llama-server \
  -m model.gguf \
  --host 127.0.0.1 \
  --port 8000 \
  --n-gpu-layers 0 \
  --threads 8

# Custom context window
llama-server \
  -m model.gguf \
  --ctx-size 2048 \
  --host 127.0.0.1 \
  --port 8000
```

### Temperature Settings
- **Low (0.1-0.3)**: Conservative detoxification, minimal changes
- **Medium (0.4-0.7)**: Balanced approach (recommended)
- **High (0.8-1.0)**: Creative detoxification, more aggressive changes
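
One way to apply these ranges in client code is to map a named preset to a temperature before building the request. The sketch below is illustrative only: the preset names, the specific value picked from each range, and the `build_payload` helper are assumptions rather than part of the model's tooling.

```python
# Prompt format used throughout this README.
PROMPT_TEMPLATE = (
    "Instruction: Rewrite the provided text to remove the toxicity.\n\n"
    "Input: {text}\n\nResponse: "
)

# Hypothetical presets: one representative value from each range above.
TEMPERATURE_PRESETS = {
    "low": 0.2,     # conservative, minimal changes
    "medium": 0.7,  # balanced (the value used in the other examples)
    "high": 0.9,    # more aggressive rewrites
}


def build_payload(text: str, preset: str = "medium") -> dict:
    """Build a /completion request body for the given temperature preset."""
    if preset not in TEMPERATURE_PRESETS:
        raise ValueError(
            f"unknown preset {preset!r}; expected one of {sorted(TEMPERATURE_PRESETS)}"
        )
    return {
        "prompt": PROMPT_TEMPLATE.format(text=text),
        "max_tokens": 200,
        "temperature": TEMPERATURE_PRESETS[preset],
    }


payload = build_payload("What the hell is wrong with you?", preset="low")
print(payload["temperature"])  # 0.2
```

The resulting dictionary can be passed as the `json` argument of the `requests.post` calls shown earlier.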

## 📚 Limitations & Biases

### Current Limitations
- **Vocabulary Scope**: Trained primarily on English toxic content
- **Context Awareness**: May not detect sarcasm or cultural context
- **Length Constraints**: Limited to 1024 token context window
- **Domain Specificity**: Optimized for general web content
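
Because of the 1,024-token limit, longer inputs need to be split before they are sent to the model. The sketch below splits on sentence boundaries and uses a word-count budget as a rough stand-in for real tokenization. The function name, the regular expression, and the 150-word default are all assumptions; an exact solution would count tokens with the model's own tokenizer.

```python
import re
from typing import List


def split_for_context(text: str, max_words: int = 150) -> List[str]:
    """Group sentences into chunks of at most roughly `max_words` words.

    A single sentence longer than `max_words` is kept whole as its own chunk.
    """
    # Split after sentence-ending punctuation followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

    chunks: List[str] = []
    current: List[str] = []
    current_words = 0
    for sentence in sentences:
        n_words = len(sentence.split())
        # Start a new chunk when adding this sentence would exceed the budget.
        if current and current_words + n_words > max_words:
            chunks.append(" ".join(current))
            current, current_words = [], 0
        current.append(sentence)
        current_words += n_words
    if current:
        chunks.append(" ".join(current))
    return chunks


print(split_for_context("One two. Three four. Five six.", max_words=4))
# ['One two. Three four.', 'Five six.']
```

Each chunk can then be detoxified separately and the results joined, at the cost of losing context across chunk boundaries.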

### Potential Biases
- **Cultural Context**: May not handle culture-specific expressions
- **Dialect Variations**: Limited exposure to regional dialects
- **Emerging Slang**: May not recognize newest internet slang

## 🀝 Contributing

We welcome contributions! Please see our [Contributing Guide](CONTRIBUTING.md) for details.

### Development Setup
```bash
# Clone the repository
git clone https://github.com/minibase-ai/detoxify-small
cd detoxify-small

# Install dependencies
pip install -r requirements.txt

# Run tests
python -m pytest tests/
```

## 📜 Citation

If you use Detoxify-Small in your research, please cite:

```bibtex
@misc{detoxify-small-2025,
  title={Detoxify-Small: A Compact Text Detoxification Model},
  author={Minibase AI Team},
  year={2025},
  publisher={Hugging Face},
  url={https://huggingface.co/minibase/detoxify-small}
}
```

## 📞 Contact & Community

- **Website**: [minibase.ai](https://minibase.ai)
- **Discord Community**: [Join our Discord](https://discord.com/invite/BrJn4D2Guh)
- **Bug Reports & Feature Requests**: [Report bugs or request features on Discord](https://discord.com/invite/BrJn4D2Guh)
- **Email**: hello@minibase.ai

### Support
- 📖 **Documentation**: [help.minibase.ai](https://help.minibase.ai)
- 💬 **Community Forum**: [Join our Discord Community](https://discord.com/invite/BrJn4D2Guh)

## 📋 License

This model is released under the [Apache License 2.0](https://www.apache.org/licenses/LICENSE-2.0).

## 🙏 Acknowledgments

- **ParaDetox Dataset**: Used for benchmarking and evaluation
- **llama.cpp**: For efficient local inference
- **Hugging Face**: For model hosting and community
- **Our amazing community**: For feedback and contributions

---

<div align="center">

**Built with ❤️ by the Minibase team**

*Making AI more accessible for everyone*

[📖 Minibase Help Center](https://help.minibase.ai) • [💬 Join our Discord](https://discord.com/invite/BrJn4D2Guh)

</div>