xlm-mbti

This model is a fine-tuned version of xlm-roberta-base for MBTI (16 types) personality classification. It is specifically optimized to analyze lyrical structures and emotional prose, particularly within the context of Midwest Emo and Math Rock lyrics.

The model was trained on a balanced version of the anggars/mbti-emotion dataset, where each class combination was capped to ensure fair distribution and reduce bias towards majority classes (e.g., ESTP/ESFP).

Model description

  • Model Type: Multilingual RoBERTa
  • Language(s): English, Indonesian
  • License: MIT
  • Finetuned from model: xlm-roberta-base
  • Task: Multi-class Text Classification (16 MBTI Labels)

Intended uses & limitations

This model is intended for academic research in Natural Language Processing (NLP) and psychology. It predicts MBTI personality types from lyrical patterns. Limitations: personality is complex; the model infers a label from linguistic patterns in specific musical subgenres and should not be used as a definitive psychological diagnostic tool.
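As a minimal sketch, the model can be loaded for inference with the transformers text-classification pipeline. The repository id anggars/xlm-mbti matches this model card; the sample lyric is illustrative only.

```python
from transformers import pipeline

# Load the fine-tuned checkpoint as a text-classification pipeline.
# The pipeline maps logits to one of the 16 MBTI labels.
classifier = pipeline("text-classification", model="anggars/xlm-mbti")

# Lyrics are expected in a line-broken format, matching the training data.
lyrics = "I counted the streetlights home again\nAnd none of them were on for me"
result = classifier(lyrics)
print(result)
```

The pipeline returns a list of dicts with `label` (an MBTI type such as INFP) and `score` fields; pass `top_k=None` to get scores for all 16 classes.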

Training and evaluation data

The dataset used is anggars/mbti-emotion, which has been pre-processed into a lyrical format (using line breaks) and undersampled to a maximum of 500 samples per class combination to mitigate stereotyping bias.
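The per-class cap described above can be sketched in plain Python. The `undersample` helper, field names, and toy data below are illustrative, not the actual preprocessing code.

```python
import random
from collections import defaultdict

def undersample(examples, label_key="label", cap=500, seed=42):
    """Cap each class at `cap` examples to balance the label distribution."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for ex in examples:
        by_label[ex[label_key]].append(ex)
    balanced = []
    for group in by_label.values():
        rng.shuffle(group)          # sample randomly, not by position
        balanced.extend(group[:cap])
    return balanced

# Toy example: a 700-sample majority class is reduced to the 500 cap,
# while a 300-sample minority class is kept whole.
data = ([{"label": "ESTP", "text": f"line {i}"} for i in range(700)]
        + [{"label": "INFJ", "text": f"line {i}"} for i in range(300)])
balanced = undersample(data)
```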

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 2e-05
  • train_batch_size: 16
  • eval_batch_size: 16
  • seed: 42
  • optimizer: AdamW with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 3

Training results

Training Loss  Epoch  Step   Validation Loss  Accuracy
1.7663         1.0    5618   1.7531           0.4009
1.5260         2.0    11236  1.6300           0.4436
1.3024         3.0    16854  1.6108           0.4630

Framework versions

  • Transformers 4.44.2
  • Pytorch 2.5.1+cu124
  • Datasets 3.1.0
  • Tokenizers 0.20.3