How to use zeltera/bert-cyberbullying-bahasa-classifier with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline
pipe = pipeline("text-classification", model="zeltera/bert-cyberbullying-bahasa-classifier")
# Load model directly
from transformers import AutoTokenizer, AutoModelForSequenceClassification
tokenizer = AutoTokenizer.from_pretrained("zeltera/bert-cyberbullying-bahasa-classifier")
model = AutoModelForSequenceClassification.from_pretrained("zeltera/bert-cyberbullying-bahasa-classifier")
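A text-classification pipeline returns a list of dicts with `label` and `score` keys. The sketch below shows one way to turn that output into the label names used in this card. It assumes the checkpoint keeps the default `LABEL_0`/`LABEL_1` naming, which is not confirmed here; `LABEL_NAMES` and `readable` are hypothetical helpers, and the sample result is hand-written for illustration rather than real model output.

```python
# Hypothetical mapping based on the label definitions in this card.
LABEL_NAMES = {0: "non-bullying", 1: "bullying"}

def readable(results):
    """Convert pipeline-style results into (name, score) pairs.

    Assumes each label string ends with its class index, e.g. "LABEL_1".
    """
    out = []
    for r in results:
        idx = int(r["label"].rsplit("_", 1)[-1])
        out.append((LABEL_NAMES[idx], r["score"]))
    return out

# Hand-written example in the shape a text-classification pipeline returns.
# With the real model this would be: results = pipe("bodoh banget sih")
results = [{"label": "LABEL_1", "score": 0.97}]
print(readable(results))
```

If the model's config defines its own label strings, the parsing step would need to be adapted accordingly.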
A fine-tuned multilingual BERT classifier for detecting cyberbullying in Bahasa Indonesia. The model performs binary classification: bullying versus non-bullying.
| Property | Value |
|---|---|
| Model Type | BERT (base multilingual) |
| Task | Cyberbullying Detection (Text Classification) |
| Language | Bahasa Indonesia |
| Labels | 0 — non-bullying, 1 — bullying |
| Framework | Hugging Face Transformers |
| Files | model.safetensors, config.json, tokenizer files |
This model was trained using a combined dataset, consisting of:
Preprocessing steps:
The dataset was balanced to reduce bias.
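The card does not say how the balancing was done. As an illustration only, random undersampling of the majority class is one common approach; the sketch below is an assumption, not the author's method, and `undersample` and the toy rows are hypothetical.

```python
import random
from collections import Counter, defaultdict

def undersample(rows, seed=0):
    """Return a shuffled copy of rows with an equal number of examples per label.

    rows is a list of (text, label) pairs; larger classes are randomly
    reduced to the size of the smallest class.
    """
    by_label = defaultdict(list)
    for text, label in rows:
        by_label[label].append((text, label))
    n = min(len(items) for items in by_label.values())
    rng = random.Random(seed)  # fixed seed for reproducibility
    balanced = []
    for items in by_label.values():
        balanced.extend(rng.sample(items, n))
    rng.shuffle(balanced)
    return balanced

# Toy data: three non-bullying rows and one bullying row.
rows = [
    ("kamu lagi dimana?", 0),
    ("nice job bro", 0),
    ("apa kabar", 0),
    ("bodoh banget sih", 1),
]
balanced = undersample(rows)
print(Counter(label for _, label in balanced))
```

Undersampling discards data, so oversampling or class weights may be preferable when the minority class is small.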
bert-base-multilingual-cased
Training was done on a 6 GB GPU and optimized for low VRAM.
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load the tokenizer and the fine-tuned classifier from the Hub
model_name = "zeltera/bert-cyberbullying-bahasa-classifier"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

text = "anjing lu jelek banget"
inputs = tokenizer(text, return_tensors="pt")

# Run the forward pass without tracking gradients
with torch.no_grad():
    logits = model(**inputs).logits

# Pick the class with the highest logit
label = torch.argmax(logits, dim=1).item()
print("Prediction:", label)  # 1 = bullying, 0 = non-bullying
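Taking the argmax discards the model's confidence. The sketch below, written in plain Python so it runs without downloading the model, converts a pair of logits into probabilities and applies a decision threshold. The `softmax` and `classify` helpers, the 0.5 threshold, and the example logits are all illustrative assumptions.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def classify(logits, threshold=0.5):
    """Return (label, p_bullying); label is 1 when p_bullying >= threshold."""
    p_bullying = softmax(logits)[1]  # index 1 = bullying, per the label table
    return (1 if p_bullying >= threshold else 0), p_bullying

# Made-up logits in the order [non-bullying, bullying]
label, prob = classify([-1.2, 2.3])
print(label, round(prob, 3))
```

With the tensors from the snippet above, `torch.softmax(logits, dim=1)` gives the same probabilities. Raising the threshold trades recall for precision, which may matter for moderation use cases.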
| Text | Output |
|---|---|
| "mampus lu biarin aja" | 1 (bullying) |
| "kamu lagi dimana?" | 0 (non-bullying) |
| "bodoh banget sih" | 1 (bullying) |
| "nice job bro" | 0 (non-bullying) |
| Metric | Score |
|---|---|
| Accuracy | ~0.90 |
| F1 (macro) | ~0.88 |
| Precision | ~0.89 |
| Recall | ~0.87 |
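For reference, the sketch below shows how accuracy and macro-averaged F1 are computed from predictions. The card does not state which evaluation set was used or how precision and recall were averaged, so the helper names and toy labels here are purely illustrative and do not reproduce the scores above.

```python
def prf(y_true, y_pred, positive):
    """Precision, recall, and F1 for one class treated as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

def macro_f1(y_true, y_pred, labels=(0, 1)):
    """Unweighted mean of the per-class F1 scores."""
    return sum(prf(y_true, y_pred, c)[2] for c in labels) / len(labels)

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Toy labels: 1 = bullying, 0 = non-bullying
y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1]
print(round(accuracy(y_true, y_pred), 3), round(macro_f1(y_true, y_pred), 3))
```

Macro averaging weights both classes equally, which is why it can sit below accuracy when one class is harder to predict.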
config.json
model.safetensors
tokenizer.json
tokenizer_config.json
special_tokens_map.json
vocab.txt
README.md
MIT License
Model trained and published by @zeltera. Built using Hugging Face Transformers and PyTorch. Contact: Instagram @gnwnadiwjy
Base model
indobenchmark/indobert-base-p1