emotion-tr - Turkish Emotion Classification Model


This model classifies the sentiment of Turkish text as negative, neutral, or positive.

Developed by SiriusAI Tech Brain Team


Mission

To provide advanced sentiment analysis capabilities for Turkish text, empowering businesses and researchers to understand emotional tones effectively.

The emotion-tr model fine-tunes a Turkish BERT checkpoint (dbmdz/bert-base-turkish-uncased) for three-class sentiment classification. It labels each input as negative, neutral, or positive, which supports the analysis of customer feedback, social media interactions, and other Turkish text in sentiment-driven applications.

Why This Model Matters

  • High Accuracy: Achieves 97.56% accuracy on the 2,332-sample test set.
  • Consistent Performance: Per-class accuracy on the test set ranges from 97.0% to 98.0% across the three sentiment categories.
  • Enterprise-Ready: Designed to meet the demands of production environments with efficient response times.
  • Customizable: Can be fine-tuned for specific applications beyond emotion classification.
  • Comprehensive Documentation: Provides extensive guidance for integration and usage.

Model Overview

Property Value
Architecture BertForSequenceClassification
Base Model dbmdz/bert-base-turkish-uncased
Task Text Classification
Language Turkish (tr)
Categories 3 labels
Model Size ~110M parameters
Inference Time ~10-15ms (GPU) / ~40-50ms (CPU)

Performance Metrics

Final Evaluation Results

Metric Score Description
Macro F1 0.9744976471619214 Unweighted mean of the per-class F1 scores (each F1 is the harmonic mean of precision and recall)
MCC 0.9610214790438847 Matthews Correlation Coefficient
Accuracy 97.5557461406518% Overall accuracy of the model
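
As a minimal sketch of how a macro F1 score is computed, the pure-Python function below averages the per-class F1 scores with equal weight. The label lists are made-up toy data for illustration only; they are not drawn from the model's test set, and the reported score above was not necessarily produced by this code.

```python
def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of per-class F1 scores."""
    f1_scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never appears.
        denom = 2 * tp + fp + fn
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / len(labels)


# Hypothetical toy labels, not real evaluation data.
LABELS = ["negatif", "notr", "pozitif"]
y_true = ["negatif", "negatif", "notr", "notr", "pozitif", "pozitif"]
y_pred = ["negatif", "notr", "notr", "notr", "pozitif", "negatif"]

toy_score = macro_f1(y_true, y_pred, LABELS)
print(f"toy macro F1: {toy_score:.4f}")
```

Because every class contributes equally, a weak minority class lowers macro F1 more than it lowers overall accuracy.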

Per-Class Performance

Category Accuracy Correct Total
negatif 97.0% 700 722
notr 98.0% 1,069 1,091
pozitif 97.5% 506 519
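
The per-class figures above follow directly from the Correct and Total columns, and summing them reproduces the overall accuracy. The short sketch below recomputes both from the counts in the table; the function names are illustrative.

```python
# (correct, total) per class, copied from the Per-Class Performance table.
PER_CLASS_COUNTS = {
    "negatif": (700, 722),
    "notr": (1069, 1091),
    "pozitif": (506, 519),
}


def per_class_accuracy(counts):
    """Fraction of each class's test samples that were labeled correctly."""
    return {label: correct / total for label, (correct, total) in counts.items()}


def overall_accuracy(counts):
    """Total correct predictions divided by total test samples."""
    correct = sum(c for c, _ in counts.values())
    total = sum(t for _, t in counts.values())
    return correct / total


for label, acc in per_class_accuracy(PER_CLASS_COUNTS).items():
    print(f"{label}: {acc * 100:.1f}%")
print(f"overall: {overall_accuracy(PER_CLASS_COUNTS) * 100:.4f}%")
```

The totals sum to 2,332, which matches the size of the test split.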

Dataset

Dataset Statistics

Split Samples Purpose
Train 9,322 Model training
Test 2,332 Model evaluation
Total 11,654 Complete dataset

Category Distribution

Category Samples Percentage Description
sentiment_3class 11,654 100.0% Three-class sentiment task (all samples)

Subcategory Breakdown

Category Subcategories
sentiment_3class pozitif, negatif, notr

Label Definitions

Label    ID  Description                   Turkish Examples
negatif  0   Indicates negative sentiment  "Bu çok kötü bir film.", "Hizmet berbattı."
notr     1   Indicates neutral sentiment   "Bugün hava güzel.", "Toplantı yapıldı."
pozitif  2   Indicates positive sentiment  "Harika bir deneyim!", "Çok memnun kaldım."

Important: Category Boundaries

When classifying sentiments, the distinction between notr and negatif can be subtle; for instance, "Bu film sıradan" might be interpreted as neutral, while "Bu film kötü" is clearly negative.


Training Procedure

Hyperparameters

Parameter Value
Base Model dbmdz/bert-base-turkish-uncased
Max Sequence Length 128 tokens
Batch Size 16
Learning Rate 2e-5
Epochs 3
Optimizer AdamW
Weight Decay 0.01
Loss Function CrossEntropyLoss / Focal Loss
Problem Type Single-label Classification
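
The table lists both cross-entropy and focal loss without saying which was used for the released weights or with what settings. As a rough illustration of how the two differ for a single example, here is a pure-Python sketch. The `gamma` value and the example logits are assumptions chosen for illustration, not values taken from the training run.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target):
    """Standard cross-entropy: -log(p_t) for the true class index."""
    return -math.log(softmax(logits)[target])


def focal_loss(logits, target, gamma=2.0):
    """Focal loss: -(1 - p_t)^gamma * log(p_t). gamma is an assumed value."""
    p_t = softmax(logits)[target]
    return -((1.0 - p_t) ** gamma) * math.log(p_t)


# Hypothetical logits for a confidently correct "negatif" (index 0) example.
example_logits = [2.0, 0.5, -1.0]
print("cross-entropy:", cross_entropy(example_logits, 0))
print("focal (gamma=2):", focal_loss(example_logits, 0))
```

With `gamma=0` the focal loss reduces to cross-entropy; larger values shrink the loss on examples the model already classifies confidently, so training focuses on harder ones.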

Training Environment

Resource Specification
Hardware Apple Silicon (MPS) / CUDA GPU
Framework PyTorch + Transformers
Training Time Varies based on dataset size

Usage

Installation

pip install transformers torch

Quick Start

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_name = "hayatiali/emotion-tr"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()

LABELS = ["negatif", "notr", "pozitif"]

def predict(text):
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=128)
    with torch.no_grad():
        outputs = model(**inputs)
        probs = torch.softmax(outputs.logits, dim=-1)[0]

    scores = {label: float(prob) for label, prob in zip(LABELS, probs)}
    primary = max(scores, key=scores.get)
    return {"category": primary, "confidence": scores[primary], "all_scores": scores}

# Examples
print(predict("Bu film harika!"))

Production Class

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

class EmotionClassifier:
    LABELS = ["negatif", "notr", "pozitif"]

    def __init__(self, model_path="hayatiali/emotion-tr"):
        self.tokenizer = AutoTokenizer.from_pretrained(model_path)
        self.model = AutoModelForSequenceClassification.from_pretrained(model_path)
        self.device = "cuda" if torch.cuda.is_available() else "cpu"
        self.model.to(self.device).eval()

    def predict(self, text: str) -> dict:
        inputs = self.tokenizer(text, return_tensors="pt", truncation=True, max_length=128)
        inputs = {k: v.to(self.device) for k, v in inputs.items()}

        with torch.no_grad():
            logits = self.model(**inputs).logits
            # .tolist() yields plain Python floats, so the result is JSON-serializable
            probs = torch.softmax(logits, dim=-1)[0].cpu().tolist()

        scores = dict(zip(self.LABELS, probs))
        primary = max(scores, key=scores.get)
        return {"category": primary, "confidence": scores[primary], "scores": scores}

Batch Inference

def predict_batch(texts: list, batch_size: int = 32) -> list:
    # Reuses `tokenizer`, `model`, and `LABELS` from Quick Start
    device = next(model.parameters()).device  # run on whichever device the model is on
    results = []
    for i in range(0, len(texts), batch_size):
        batch = texts[i:i + batch_size]
        inputs = tokenizer(batch, return_tensors="pt", truncation=True, max_length=128, padding=True)
        inputs = {k: v.to(device) for k, v in inputs.items()}

        with torch.no_grad():
            probs = torch.softmax(model(**inputs).logits, dim=-1).cpu().tolist()

        # One {label: probability} dict per input text, in input order
        for prob in probs:
            scores = dict(zip(LABELS, prob))
            results.append(scores)
    return results

Limitations & Known Issues

⚠️ Model Limitations

Limitation Details Impact
Context Sensitivity The model may misclassify sentiments in ambiguous contexts Potentially inaccurate predictions
Domain Adaptability Performance may vary across different domains (e.g., social media vs. formal texts) Requires further fine-tuning for specific applications
Language Nuances Subtle linguistic features unique to Turkish may not be perfectly captured May lead to classification errors in nuanced cases

⚠️ Production Deployment Considerations

Consideration Details Recommendation
Model Size The model is approximately 110M parameters Ensure adequate resources for deployment
Latency Inference time may vary with input length and server load Optimize batch sizes for improved performance
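
Since latency depends on input length and load, it is worth measuring on your own hardware rather than relying on the figures in Model Overview. The helper below is a minimal sketch: it times any callable that takes a string, so it can wrap the `predict` function from Quick Start. The function name, warm-up count, and run count are illustrative assumptions, and the p95 value is a simple nearest-rank approximation.

```python
import statistics
import time


def measure_latency_ms(fn, text, warmup=3, runs=20):
    """Time fn(text) over several runs and summarize in milliseconds."""
    # Warm-up calls are excluded so one-time setup costs do not skew results.
    for _ in range(warmup):
        fn(text)

    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        fn(text)
        timings.append((time.perf_counter() - start) * 1000.0)

    timings.sort()
    p95_index = min(len(timings) - 1, int(0.95 * len(timings)))
    return {
        "median_ms": statistics.median(timings),
        "p95_ms": timings[p95_index],
        "max_ms": timings[-1],
    }


# Stand-in callable so the sketch runs without loading the model;
# in practice pass the real predict function instead.
stats = measure_latency_ms(lambda t: t.upper(), "Bu film harika!")
print(stats)
```

When timing on a GPU, keep in mind that CUDA calls are asynchronous, so the measured function should return values on the CPU (as `predict` does) for the timing to be meaningful.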

Not Suitable For

  • Legal document analysis
  • Medical diagnosis based on text
  • Any critical decision-making without human oversight

Ethical Considerations

Intended Use

  • Sentiment analysis in customer feedback
  • Emotional tone detection in social media posts
  • Market research and analysis

Risks

  • Bias in Data: The model may reflect biases present in the training data, leading to skewed results.
  • Misinterpretation of Sentiments: Incorrect sentiment classification could misguide businesses in decision-making.

Recommendations

  1. Human Oversight: Always accompany model predictions with human judgment.
  2. Monitoring: Regularly assess model performance and retrain as necessary.
  3. Updates: Stay informed about updates to the model and fine-tune based on new data.
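
One way to put the human-oversight recommendation into practice is to route low-confidence or closely contested predictions to a reviewer. The sketch below works on the score dictionaries returned by the usage examples. The function name and both thresholds are hypothetical and should be tuned on your own validation data.

```python
def needs_review(scores, min_confidence=0.80, min_margin=0.20):
    """Flag a prediction for human review.

    scores: dict mapping label -> probability, e.g. the "all_scores"
    value returned by predict(). Thresholds are illustrative assumptions.
    """
    ranked = sorted(scores.values(), reverse=True)
    top, runner_up = ranked[0], ranked[1]
    # Review when the winner is weak or barely ahead of the second choice.
    return top < min_confidence or (top - runner_up) < min_margin


# Hypothetical score dictionaries for illustration.
borderline = {"negatif": 0.48, "notr": 0.45, "pozitif": 0.07}
confident = {"negatif": 0.02, "notr": 0.03, "pozitif": 0.95}

print(needs_review(borderline))
print(needs_review(confident))
```

The margin check targets the notr/negatif boundary cases described under Category Boundaries, where two labels can receive similar probabilities.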

Technical Specifications

Model Architecture

BertForSequenceClassification(
  (bert): BertModel(
    (embeddings): BertEmbeddings
    (encoder): BertEncoder (12 layers)
    (pooler): BertPooler
  )
  (dropout): Dropout(p=0.1)
  (classifier): Linear(in_features=768, out_features=3)
)

Total Parameters: ~110M

Input/Output

  • Input: Turkish text (max 128 tokens)
  • Output: 3-dimensional probability vector
  • Tokenizer: BERTurk WordPiece (32k vocab)

Citation

@misc{emotion-tr-2025,
  title={emotion-tr - Turkish Text Classification Model},
  author={SiriusAI Tech Brain Team},
  year={2025},
  publisher={Hugging Face},
  howpublished={\url{https://huggingface.co/hayatiali/emotion-tr}},
  note={Fine-tuned from dbmdz/bert-base-turkish-uncased}
}

Model Card Authors

SiriusAI Tech Brain Team

Contact


Changelog

v1.0 (Current)

  • Initial release
  • 3-category text classification
  • Macro F1: 0.9744976471619214, MCC: 0.9610214790438847

License: SiriusAI Tech Premium License v1.0

Commercial Use: Requires Premium License. Contact: info@siriusaitech.com

Free Use Allowed For:

  • Academic research and education
  • Non-profit organizations (with approval)
  • Evaluation (30 days)

Disclaimer: This model is designed for text classification applications. Always implement with appropriate safeguards and human oversight. Model predictions should inform decisions, not replace human judgment.
