emotion-tr - Turkish Emotion Classification Model
This model classifies the emotional sentiment of Turkish text as negative, neutral, or positive.
Developed by SiriusAI Tech Brain Team
Mission
To provide advanced sentiment analysis capabilities for Turkish text, empowering businesses and researchers to understand emotional tones effectively.
The emotion-tr model builds on the BERT architecture to deliver high-performance text classification tailored to Turkish. By labeling text as negative, neutral, or positive, it supports a deeper understanding of customer feedback, social media interactions, and other textual data, making it useful for sentiment-driven applications across domains.
Why This Model Matters
- High Accuracy: Achieves 97.6% accuracy on the held-out test set (2,332 samples).
- Robust Performance: Delivers balanced results across all three sentiment classes (per-class recall of 97.0-98.0%).
- Enterprise-Ready: Designed to meet the demands of production environments with efficient response times.
- Customizable: Can be fine-tuned for specific applications beyond emotion classification.
- Comprehensive Documentation: Provides extensive guidance for integration and usage.
Model Overview
| Property | Value |
|---|---|
| Architecture | BertForSequenceClassification |
| Base Model | dbmdz/bert-base-turkish-uncased |
| Task | Text Classification |
| Language | Turkish (tr) |
| Categories | 3 labels |
| Model Size | ~110M parameters |
| Inference Time | ~10-15ms (GPU) / ~40-50ms (CPU) |
Performance Metrics
Final Evaluation Results
| Metric | Score | Description |
|---|---|---|
| Macro F1 | 0.9745 | Unweighted mean of the per-class F1 scores |
| MCC | 0.9610 | Matthews Correlation Coefficient |
| Accuracy | 97.56% | Share of test samples classified correctly |
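These figures can be recomputed with scikit-learn (install it separately; it is not pulled in by the pip command under Usage). A minimal sketch with placeholder arrays, since the actual test predictions are not published:

```python
from sklearn.metrics import accuracy_score, f1_score, matthews_corrcoef

# Placeholder label IDs (negatif=0, notr=1, pozitif=2), not the real test set.
y_true = [0, 1, 2, 1, 0]
y_pred = [0, 1, 2, 1, 1]

print("Accuracy:", accuracy_score(y_true, y_pred))
print("Macro F1:", f1_score(y_true, y_pred, average="macro"))
print("MCC:     ", matthews_corrcoef(y_true, y_pred))
```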
Per-Class Performance
| Category | Recall | Correct | Total |
|---|---|---|---|
| negatif | 97.0% | 700 | 722 |
| notr | 98.0% | 1,069 | 1,091 |
| pozitif | 97.5% | 506 | 519 |
Dataset
Dataset Statistics
| Split | Samples | Purpose |
|---|---|---|
| Train | 9,322 | Model training |
| Test | 2,332 | Model evaluation |
| Total | 11,654 | Complete dataset |
Category Distribution
| Category | Samples | Percentage | Description |
|---|---|---|---|
| sentiment_3class | 11,654 | 100.0% | Three-way sentiment labels (pozitif, negatif, notr) |
Subcategory Breakdown
| Category | Subcategories |
|---|---|
| sentiment_3class | pozitif, negatif, notr |
Label Definitions
| Label | ID | Description | Turkish Examples |
|---|---|---|---|
| negatif | 0 | Indicates negative sentiment | "Bu çok kötü bir film." "Hizmet berbattı." |
| notr | 1 | Indicates neutral sentiment | "Bugün hava güzel." "Toplantı yapıldı." |
| pozitif | 2 | Indicates positive sentiment | "Harika bir deneyim!" "Çok memnun kaldım." |
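If the published checkpoint's config carries a meaningful id2label mapping, the label order can be read from it instead of being hard-coded. A minimal sketch (the fallback order matches the table above):

```python
from transformers import AutoConfig

config = AutoConfig.from_pretrained("hayatiali/emotion-tr")
# Use the config's mapping unless it only contains generic LABEL_i names.
if all(not name.startswith("LABEL_") for name in config.id2label.values()):
    LABELS = [config.id2label[i] for i in sorted(config.id2label)]
else:
    LABELS = ["negatif", "notr", "pozitif"]  # order documented above
print(LABELS)
```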
Important: Category Boundaries
When classifying sentiments, the distinction between notr and negatif can be subtle; for instance, "Bu film sıradan" might be interpreted as neutral, while "Bu film kötü" is clearly negative.
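To see how the model resolves such boundary cases, the two sentences can be run through the predict() helper defined in the Quick Start section below; the resulting labels are model outputs, not guarantees:

```python
# Illustrative boundary check; predict() comes from Quick Start below.
for text in ["Bu film sıradan.", "Bu film kötü."]:
    print(text, "->", predict(text)["category"])
```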
Training Procedure
Hyperparameters
| Parameter | Value |
|---|---|
| Base Model | dbmdz/bert-base-turkish-uncased |
| Max Sequence Length | 128 tokens |
| Batch Size | 16 |
| Learning Rate | 2e-5 |
| Epochs | 3 |
| Optimizer | AdamW |
| Weight Decay | 0.01 |
| Loss Function | CrossEntropyLoss / Focal Loss |
| Problem Type | Single-label Classification |
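These hyperparameters map directly onto the Hugging Face Trainer API. The following is a minimal fine-tuning sketch, not the team's actual training script; `train_dataset` and `eval_dataset` are assumed to be tokenized datasets (max length 128) with a labels column, since the data itself is not published:

```python
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

base = "dbmdz/bert-base-turkish-uncased"
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForSequenceClassification.from_pretrained(base, num_labels=3)

args = TrainingArguments(
    output_dir="emotion-tr",
    per_device_train_batch_size=16,
    learning_rate=2e-5,
    num_train_epochs=3,
    weight_decay=0.01,  # Trainer uses AdamW by default
)

# CrossEntropyLoss is the Trainer default; the focal-loss variant listed
# above would require overriding Trainer.compute_loss.
trainer = Trainer(model=model, args=args,
                  train_dataset=train_dataset, eval_dataset=eval_dataset)
trainer.train()
```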
Training Environment
| Resource | Specification |
|---|---|
| Hardware | Apple Silicon (MPS) / CUDA GPU |
| Framework | PyTorch + Transformers |
| Training Time | Varies based on dataset size |
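Since the model was trained on both Apple Silicon (MPS) and CUDA, a portable device-selection snippet may be useful; a minimal sketch:

```python
import torch

# Prefer CUDA, then Apple Silicon's MPS backend, then CPU.
if torch.cuda.is_available():
    device = torch.device("cuda")
elif torch.backends.mps.is_available():
    device = torch.device("mps")
else:
    device = torch.device("cpu")
print(device)
```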
Usage
Installation
```bash
pip install transformers torch
```
Quick Start
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_name = "hayatiali/emotion-tr"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()

LABELS = ["negatif", "notr", "pozitif"]

def predict(text):
    # Tokenize, run a forward pass, and turn logits into probabilities.
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=128)
    with torch.no_grad():
        outputs = model(**inputs)
    probs = torch.softmax(outputs.logits, dim=-1)[0]
    scores = {label: float(prob) for label, prob in zip(LABELS, probs)}
    primary = max(scores, key=scores.get)
    return {"category": primary, "confidence": scores[primary], "all_scores": scores}

# Examples
print(predict("Bu film harika!"))
```
Production Class
```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

class EmotionClassifier:
    LABELS = ["negatif", "notr", "pozitif"]

    def __init__(self, model_path="hayatiali/emotion-tr"):
        self.tokenizer = AutoTokenizer.from_pretrained(model_path)
        self.model = AutoModelForSequenceClassification.from_pretrained(model_path)
        self.device = "cuda" if torch.cuda.is_available() else "cpu"
        self.model.to(self.device).eval()

    def predict(self, text: str) -> dict:
        inputs = self.tokenizer(text, return_tensors="pt", truncation=True, max_length=128)
        inputs = {k: v.to(self.device) for k, v in inputs.items()}
        with torch.no_grad():
            logits = self.model(**inputs).logits
        probs = torch.softmax(logits, dim=-1)[0].cpu().numpy()
        scores = dict(zip(self.LABELS, probs))
        primary = max(scores, key=scores.get)
        # Cast the confidence to a plain float (it is a numpy scalar otherwise).
        return {"category": primary, "confidence": float(scores[primary]), "scores": scores}
```
Batch Inference
```python
# Reuses tokenizer, model, and LABELS from the Quick Start snippet above.
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)

def predict_batch(texts: list, batch_size: int = 32) -> list:
    results = []
    for i in range(0, len(texts), batch_size):
        batch = texts[i:i + batch_size]
        inputs = tokenizer(batch, return_tensors="pt", truncation=True,
                           max_length=128, padding=True)
        inputs = {k: v.to(device) for k, v in inputs.items()}
        with torch.no_grad():
            probs = torch.softmax(model(**inputs).logits, dim=-1).cpu().numpy()
        for prob in probs:
            results.append(dict(zip(LABELS, prob)))
    return results
```
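Example call, assuming the Quick Start objects above are in scope:

```python
texts = ["Harika bir ürün!", "Kargo gecikti.", "Toplantı saat 10'da."]
for text, scores in zip(texts, predict_batch(texts)):
    print(text, "->", max(scores, key=scores.get))
```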
Limitations & Known Issues
⚠️ Model Limitations
| Limitation | Details | Impact |
|---|---|---|
| Context Sensitivity | The model may misclassify sentiments in ambiguous contexts | Potentially inaccurate predictions |
| Domain Adaptability | Performance may vary across different domains (e.g., social media vs. formal texts) | Requires further fine-tuning for specific applications |
| Language Nuances | Subtle linguistic features unique to Turkish may not be perfectly captured | May lead to classification errors in nuanced cases |
⚠️ Production Deployment Considerations
| Consideration | Details | Recommendation |
|---|---|---|
| Model Size | The model is approximately 110M parameters | Ensure adequate resources for deployment |
| Latency | Inference time may vary with input length and server load | Optimize batch sizes for improved performance |
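To check the latency figures from the Model Overview on your own hardware, here is a rough single-text timing sketch (warm model; results vary with hardware and input length):

```python
import time

clf = EmotionClassifier()  # class from the Production Class section above
clf.predict("ısınma")      # warm-up pass

t0 = time.perf_counter()
for _ in range(100):
    clf.predict("Bu ürünü çok beğendim, kesinlikle tavsiye ederim.")
print(f"avg latency: {(time.perf_counter() - t0) / 100 * 1000:.1f} ms")
```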
Not Suitable For
- Legal document analysis
- Medical diagnosis based on text
- Any critical decision-making without human oversight
Ethical Considerations
Intended Use
- Sentiment analysis in customer feedback
- Emotional tone detection in social media posts
- Market research and analysis
Risks
- Bias in Data: The model may reflect biases present in the training data, leading to skewed results.
- Misinterpretation of Sentiments: Incorrect sentiment classification could misguide businesses in decision-making.
Recommendations
- Human Oversight: Always accompany model predictions with human judgment.
- Monitoring: Regularly assess model performance and retrain as necessary.
- Updates: Stay informed about updates to the model and fine-tune based on new data.
Technical Specifications
Model Architecture
```
BertForSequenceClassification(
  (bert): BertModel(
    (embeddings): BertEmbeddings
    (encoder): BertEncoder (12 layers)
    (pooler): BertPooler
  )
  (dropout): Dropout(p=0.1)
  (classifier): Linear(in_features=768, out_features=3)
)
```
Total Parameters: ~110M
Input/Output
- Input: Turkish text (max 128 tokens)
- Output: 3-dimensional probability vector
- Tokenizer: BERTurk WordPiece (32k vocab)
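A quick sanity check of this contract, reusing the tokenizer and model loaded in Quick Start:

```python
enc = tokenizer("Çok güzel bir gün!", return_tensors="pt",
                truncation=True, max_length=128)
print(enc["input_ids"].shape)  # torch.Size([1, seq_len]) with seq_len <= 128
with torch.no_grad():
    logits = model(**enc).logits
print(logits.shape)  # torch.Size([1, 3]): one logit per label
```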
Citation
```bibtex
@misc{emotion-tr-2025,
  title={emotion-tr - Turkish Text Classification Model},
  author={SiriusAI Tech Brain Team},
  year={2025},
  publisher={Hugging Face},
  howpublished={\url{https://huggingface.co/hayatiali/emotion-tr}},
  note={Fine-tuned from dbmdz/bert-base-turkish-uncased}
}
```
Model Card Authors
SiriusAI Tech Brain Team
Contact
- Email: info@siriusaitech.com
- Repository: GitHub
Changelog
v1.0 (Current)
- Initial release
- 3-category text classification
- Macro F1: 0.9745, MCC: 0.9610
License: SiriusAI Tech Premium License v1.0
Commercial Use: Requires Premium License. Contact: info@siriusaitech.com
Free Use Allowed For:
- Academic research and education
- Non-profit organizations (with approval)
- Evaluation (30 days)
Disclaimer: This model is designed for text classification applications. Always implement with appropriate safeguards and human oversight. Model predictions should inform decisions, not replace human judgment.