emotion-tr - Turkish Emotion Classification Model
This model classifies the emotional sentiment of Turkish text as negative, neutral, or positive.
Developed by SiriusAI Tech Brain Team
Mission
To provide advanced sentiment analysis capabilities for Turkish text, empowering businesses and researchers to understand emotional tones effectively.
The emotion-tr model builds on the BERT architecture to deliver high-performance text classification tailored to Turkish. By labeling text as negative, neutral, or positive, it supports a deeper understanding of customer feedback, social media interactions, and other textual data, making it useful for sentiment-driven applications across domains.
Why This Model Matters
- High Accuracy: Achieves 97.6% accuracy on the held-out test set (2,332 samples).
- Robust Performance: Delivers balanced results across all three sentiment classes (per-class recall of 97.0-98.0%).
- Enterprise-Ready: Designed to meet the demands of production environments with efficient response times.
- Customizable: Can be fine-tuned for specific applications beyond emotion classification.
- Comprehensive Documentation: Provides extensive guidance for integration and usage.
Model Overview
| Property | Value |
|---|---|
| Architecture | BertForSequenceClassification |
| Base Model | dbmdz/bert-base-turkish-uncased |
| Task | Text Classification |
| Language | Turkish (tr) |
| Categories | 3 labels |
| Model Size | ~110M parameters |
| Inference Time | ~10-15ms (GPU) / ~40-50ms (CPU) |
Performance Metrics
Final Evaluation Results
| Metric | Score | Description |
|---|---|---|
| Macro F1 | 0.9745 | Unweighted mean of the per-class F1 scores |
| MCC | 0.9610 | Matthews Correlation Coefficient |
| Accuracy | 97.56% | Share of test samples classified correctly |
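These figures can be recomputed with scikit-learn (install it separately; it is not pulled in by the pip command under Usage). A minimal sketch with placeholder arrays, since the actual test predictions are not published:

```python
from sklearn.metrics import accuracy_score, f1_score, matthews_corrcoef

# Placeholder label IDs (negatif=0, notr=1, pozitif=2), not the real test set.
y_true = [0, 1, 2, 1, 0]
y_pred = [0, 1, 2, 1, 1]

print("Accuracy:", accuracy_score(y_true, y_pred))
print("Macro F1:", f1_score(y_true, y_pred, average="macro"))
print("MCC:     ", matthews_corrcoef(y_true, y_pred))
```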
Per-Class Performance
| Category | Recall | Correct | Total |
|---|---|---|---|
| negatif | 97.0% | 700 | 722 |
| notr | 98.0% | 1,069 | 1,091 |
| pozitif | 97.5% | 506 | 519 |
Dataset
Dataset Statistics
| Split | Samples | Purpose |
|---|---|---|
| Train | 9,322 | Model training |
| Test | 2,332 | Model evaluation |
| Total | 11,654 | Complete dataset |
Category Distribution
| Category | Samples | Percentage | Description |
|---|---|---|---|
| sentiment_3class | 11,654 | 100.0% | Three-way sentiment labels (pozitif, negatif, notr) |
Subcategory Breakdown
| Category | Subcategories |
|---|---|
| sentiment_3class | pozitif, negatif, notr |
Label Definitions
| Label | ID | Description | Turkish Examples |
|---|---|---|---|
| negatif | 0 | Indicates negative sentiment | "Bu çok kötü bir film." "Hizmet berbattı." |
| notr | 1 | Indicates neutral sentiment | "Bugün hava güzel." "Toplantı yapıldı." |
| pozitif | 2 | Indicates positive sentiment | "Harika bir deneyim!" "Çok memnun kaldım." |
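If the published checkpoint's config carries a meaningful id2label mapping, the label order can be read from it instead of being hard-coded. A minimal sketch (the fallback order matches the table above):

```python
from transformers import AutoConfig

config = AutoConfig.from_pretrained("hayatiali/emotion-tr")
# Use the config's mapping unless it only contains generic LABEL_i names.
if all(not name.startswith("LABEL_") for name in config.id2label.values()):
    LABELS = [config.id2label[i] for i in sorted(config.id2label)]
else:
    LABELS = ["negatif", "notr", "pozitif"]  # order documented above
print(LABELS)
```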
Important: Category Boundaries
When classifying sentiments, the distinction between notr and negatif can be subtle; for instance, "Bu film sıradan" might be interpreted as neutral, while "Bu film kötü" is clearly negative.
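To see how the model resolves such boundary cases, the two sentences can be run through the predict() helper defined in the Quick Start section below; the resulting labels are model outputs, not guarantees:

```python
# Illustrative boundary check; predict() comes from Quick Start below.
for text in ["Bu film sıradan.", "Bu film kötü."]:
    print(text, "->", predict(text)["category"])
```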
Training Procedure
Hyperparameters
| Parameter | Value |
|---|---|
| Base Model | dbmdz/bert-base-turkish-uncased |
| Max Sequence Length | 128 tokens |
| Batch Size | 16 |
| Learning Rate | 2e-5 |
| Epochs | 3 |
| Optimizer | AdamW |
| Weight Decay | 0.01 |
| Loss Function | CrossEntropyLoss / Focal Loss |
| Problem Type | Single-label Classification |
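These hyperparameters map directly onto the Hugging Face Trainer API. The following is a minimal fine-tuning sketch, not the team's actual training script; `train_dataset` and `eval_dataset` are assumed to be tokenized datasets (max length 128) with a labels column, since the data itself is not published:

```python
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

base = "dbmdz/bert-base-turkish-uncased"
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForSequenceClassification.from_pretrained(base, num_labels=3)

args = TrainingArguments(
    output_dir="emotion-tr",
    per_device_train_batch_size=16,
    learning_rate=2e-5,
    num_train_epochs=3,
    weight_decay=0.01,  # Trainer uses AdamW by default
)

# CrossEntropyLoss is the Trainer default; the focal-loss variant listed
# above would require overriding Trainer.compute_loss.
trainer = Trainer(model=model, args=args,
                  train_dataset=train_dataset, eval_dataset=eval_dataset)
trainer.train()
```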
Training Environment
| Resource | Specification |
|---|---|
| Hardware | Apple Silicon (MPS) / CUDA GPU |
| Framework | PyTorch + Transformers |
| Training Time | Varies based on dataset size |
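Since the model was trained on both Apple Silicon (MPS) and CUDA, a portable device-selection snippet may be useful; a minimal sketch:

```python
import torch

# Prefer CUDA, then Apple Silicon's MPS backend, then CPU.
if torch.cuda.is_available():
    device = torch.device("cuda")
elif torch.backends.mps.is_available():
    device = torch.device("mps")
else:
    device = torch.device("cpu")
print(device)
```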
Usage
Installation
```bash
pip install transformers torch
```
Quick Start
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_name = "hayatiali/emotion-tr"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()

LABELS = ["negatif", "notr", "pozitif"]

def predict(text):
    # Tokenize, run a forward pass, and turn logits into probabilities.
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=128)
    with torch.no_grad():
        outputs = model(**inputs)
    probs = torch.softmax(outputs.logits, dim=-1)[0]
    scores = {label: float(prob) for label, prob in zip(LABELS, probs)}
    primary = max(scores, key=scores.get)
    return {"category": primary, "confidence": scores[primary], "all_scores": scores}

# Examples
print(predict("Bu film harika!"))
```
Production Class
```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

class EmotionClassifier:
    LABELS = ["negatif", "notr", "pozitif"]

    def __init__(self, model_path="hayatiali/emotion-tr"):
        self.tokenizer = AutoTokenizer.from_pretrained(model_path)
        self.model = AutoModelForSequenceClassification.from_pretrained(model_path)
        self.device = "cuda" if torch.cuda.is_available() else "cpu"
        self.model.to(self.device).eval()

    def predict(self, text: str) -> dict:
        inputs = self.tokenizer(text, return_tensors="pt", truncation=True, max_length=128)
        inputs = {k: v.to(self.device) for k, v in inputs.items()}
        with torch.no_grad():
            logits = self.model(**inputs).logits
        probs = torch.softmax(logits, dim=-1)[0].cpu().numpy()
        scores = dict(zip(self.LABELS, probs))
        primary = max(scores, key=scores.get)
        # Cast the confidence to a plain float (it is a numpy scalar otherwise).
        return {"category": primary, "confidence": float(scores[primary]), "scores": scores}
```
Batch Inference
```python
# Reuses tokenizer, model, and LABELS from the Quick Start snippet above.
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)

def predict_batch(texts: list, batch_size: int = 32) -> list:
    results = []
    for i in range(0, len(texts), batch_size):
        batch = texts[i:i + batch_size]
        inputs = tokenizer(batch, return_tensors="pt", truncation=True,
                           max_length=128, padding=True)
        inputs = {k: v.to(device) for k, v in inputs.items()}
        with torch.no_grad():
            probs = torch.softmax(model(**inputs).logits, dim=-1).cpu().numpy()
        for prob in probs:
            results.append(dict(zip(LABELS, prob)))
    return results
```
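Example call, assuming the Quick Start objects above are in scope:

```python
texts = ["Harika bir ürün!", "Kargo gecikti.", "Toplantı saat 10'da."]
for text, scores in zip(texts, predict_batch(texts)):
    print(text, "->", max(scores, key=scores.get))
```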
Limitations & Known Issues
⚠️ Model Limitations
| Limitation | Details | Impact |
|---|---|---|
| Context Sensitivity | The model may misclassify sentiments in ambiguous contexts | Potentially inaccurate predictions |
| Domain Adaptability | Performance may vary across different domains (e.g., social media vs. formal texts) | Requires further fine-tuning for specific applications |
| Language Nuances | Subtle linguistic features unique to Turkish may not be perfectly captured | May lead to classification errors in nuanced cases |
⚠️ Production Deployment Considerations
| Consideration | Details | Recommendation |
|---|---|---|
| Model Size | The model is approximately 110M parameters | Ensure adequate resources for deployment |
| Latency | Inference time may vary with input length and server load | Optimize batch sizes for improved performance |
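To check the latency figures from the Model Overview on your own hardware, here is a rough single-text timing sketch (warm model; results vary with hardware and input length):

```python
import time

clf = EmotionClassifier()  # class from the Production Class section above
clf.predict("ısınma")      # warm-up pass

t0 = time.perf_counter()
for _ in range(100):
    clf.predict("Bu ürünü çok beğendim, kesinlikle tavsiye ederim.")
print(f"avg latency: {(time.perf_counter() - t0) / 100 * 1000:.1f} ms")
```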
Not Suitable For
- Legal document analysis
- Medical diagnosis based on text
- Any critical decision-making without human oversight
Ethical Considerations
Intended Use
- Sentiment analysis in customer feedback
- Emotional tone detection in social media posts
- Market research and analysis
Risks
- Bias in Data: The model may reflect biases present in the training data, leading to skewed results.
- Misinterpretation of Sentiments: Incorrect sentiment classification could misguide businesses in decision-making.
Recommendations
- Human Oversight: Always accompany model predictions with human judgment.
- Monitoring: Regularly assess model performance and retrain as necessary.
- Updates: Stay informed about updates to the model and fine-tune based on new data.
Technical Specifications
Model Architecture
```
BertForSequenceClassification(
  (bert): BertModel(
    (embeddings): BertEmbeddings
    (encoder): BertEncoder (12 layers)
    (pooler): BertPooler
  )
  (dropout): Dropout(p=0.1)
  (classifier): Linear(in_features=768, out_features=3)
)
```
Total Parameters: ~110M
Input/Output
- Input: Turkish text (max 128 tokens)
- Output: 3-dimensional probability vector
- Tokenizer: BERTurk WordPiece (32k vocab)
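A quick sanity check of this contract, reusing the tokenizer and model loaded in Quick Start:

```python
enc = tokenizer("Çok güzel bir gün!", return_tensors="pt",
                truncation=True, max_length=128)
print(enc["input_ids"].shape)  # torch.Size([1, seq_len]) with seq_len <= 128
with torch.no_grad():
    logits = model(**enc).logits
print(logits.shape)  # torch.Size([1, 3]): one logit per label
```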
Citation
```bibtex
@misc{emotion-tr-2025,
  title={emotion-tr - Turkish Text Classification Model},
  author={SiriusAI Tech Brain Team},
  year={2025},
  publisher={Hugging Face},
  howpublished={\url{https://huggingface.co/hayatiali/emotion-tr}},
  note={Fine-tuned from dbmdz/bert-base-turkish-uncased}
}
```
Model Card Authors
SiriusAI Tech Brain Team
Contact
- Email: info@siriusaitech.com
- Repository: GitHub
Changelog
v1.0 (Current)
- Initial release
- 3-category text classification
- Macro F1: 0.9745, MCC: 0.9610
License: SiriusAI Tech Premium License v1.0
Commercial Use: Requires Premium License. Contact: info@siriusaitech.com
Free Use Allowed For:
- Academic research and education
- Non-profit organizations (with approval)
- Evaluation (30 days)
Disclaimer: This model is designed for text classification applications. Always implement with appropriate safeguards and human oversight. Model predictions should inform decisions, not replace human judgment.