---
tags:
- tf-keras
- bert
- alberto
- multi-task-learning
- text-classification
- italian
- gender-classification
- ideology-detection
library_name: tf-keras
language:
- it
datasets:
- custom
---
# PIDIT: Political Ideology Detection in Italian Texts

A Multi-Task BERT + ALBERTO Model for Gender and Ideology Prediction 🇮🇹

This `tf.keras` model combines two pre-trained encoders, `BERT` and `ALBERTO`, to perform multi-task classification on Italian-language texts.
It is designed to predict:

- **Author gender** (binary classification)
- **Binary ideology** (e.g., progressive vs conservative)
- **Multiclass ideology** (4 ideological classes)

## ✨ Architecture

- `TFBertModel` from `bert-base-italian-uncased` (frozen)
- `TFAutoModel` from `alberto-base-uncased` (frozen)
- Concatenated outputs + dense layers
- Three output heads:
  - `gender`: `Dense(1, activation="sigmoid")`
  - `ideology_binary`: `Dense(1, activation="sigmoid")`
  - `ideology_multiclass`: `Dense(4, activation="softmax")`
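
To make the head layout concrete, the following is a minimal pure-Python sketch of what happens after the two encoders: their output vectors are concatenated and passed to two sigmoid heads and one softmax head. The vector size (`HIDDEN`), the random weights, and the helper names (`dense`, `make_layer`, `forward_heads`) are illustrative assumptions, not the trained model, and the intermediate dense layers are omitted for brevity.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def softmax(xs):
    # Subtract the maximum for numerical stability.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dense(features, weights, bias):
    # weights holds one row of coefficients per output unit.
    return [sum(w * f for w, f in zip(row, features)) + b
            for row, b in zip(weights, bias)]


def make_layer(rng, n_in, n_out):
    # Random placeholder weights (assumption: stands in for trained weights).
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def forward_heads(bert_vec, alberto_vec, layers):
    # Concatenate the two encoder outputs, then apply the three heads.
    features = list(bert_vec) + list(alberto_vec)
    return {
        "gender": sigmoid(dense(features, *layers["gender"])[0]),
        "ideology_binary": sigmoid(dense(features, *layers["ideology_binary"])[0]),
        "ideology_multiclass": softmax(dense(features, *layers["ideology_multiclass"])),
    }


HIDDEN = 8  # hypothetical size; real encoder outputs are much larger
rng = random.Random(0)
layers = {
    "gender": make_layer(rng, 2 * HIDDEN, 1),
    "ideology_binary": make_layer(rng, 2 * HIDDEN, 1),
    "ideology_multiclass": make_layer(rng, 2 * HIDDEN, 4),
}
bert_vec = [rng.uniform(-1, 1) for _ in range(HIDDEN)]
alberto_vec = [rng.uniform(-1, 1) for _ in range(HIDDEN)]
heads = forward_heads(bert_vec, alberto_vec, layers)
```

Each sigmoid head yields a single probability, while the softmax head yields four probabilities that sum to 1.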

## 📥 Input

The model takes **6 input tensors**:

- `bert_input_ids`, `bert_token_type_ids`, `bert_attention_mask`
- `alberto_input_ids`, `alberto_token_type_ids`, `alberto_attention_mask`

All tensors have shape `(batch_size, max_length)`.
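
A quick sanity check before calling the model can catch a missing key or a mismatched shape early. The sketch below is one way to do that; `check_inputs` and `tensor_shape` are hypothetical helpers, not part of the model repository, and they accept either objects with a `.shape` attribute or plain nested lists.

```python
EXPECTED_KEYS = (
    "bert_input_ids", "bert_token_type_ids", "bert_attention_mask",
    "alberto_input_ids", "alberto_token_type_ids", "alberto_attention_mask",
)


def tensor_shape(t):
    """Return (batch_size, length) for a tensor-like object or a nested list."""
    if hasattr(t, "shape"):
        return tuple(t.shape)
    return (len(t), len(t[0]))


def check_inputs(inputs, max_length):
    """Raise ValueError if a key is missing or a shape is not (batch_size, max_length)."""
    missing = [k for k in EXPECTED_KEYS if k not in inputs]
    if missing:
        raise ValueError(f"missing inputs: {missing}")
    shapes = {k: tensor_shape(inputs[k]) for k in EXPECTED_KEYS}
    batch_size = shapes[EXPECTED_KEYS[0]][0]
    for name, shape in shapes.items():
        if shape != (batch_size, max_length):
            raise ValueError(
                f"{name} has shape {shape}, expected {(batch_size, max_length)}"
            )
    return batch_size


# Toy batch of 2 sequences of length 4, using nested lists instead of tensors.
toy_inputs = {k: [[0] * 4, [0] * 4] for k in EXPECTED_KEYS}
batch_size = check_inputs(toy_inputs, max_length=4)
```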

---

## 🚀 Usage

### Load model and tokenizers
```python
# Note: 'keras<3.x' or 'tf_keras' must be installed (legacy).
# See https://github.com/keras-team/tf-keras for more details.
from huggingface_hub import snapshot_download
from transformers import AutoTokenizer, TFAutoModel, TFBertModel
import tensorflow as tf

# Download the model repository to the local cache
model_path = snapshot_download("leeeov4/PIDIT")

# Load the Keras model, registering the Hugging Face classes it references
model = tf.keras.models.load_model(model_path, custom_objects={
    "TFBertModel": TFBertModel,
    "TFAutoModel": TFAutoModel,
})

# Load the tokenizers from their subfolders in the repository
bert_tokenizer = AutoTokenizer.from_pretrained("leeeov4/PIDIT", subfolder="bert_tokenizer")
alberto_tokenizer = AutoTokenizer.from_pretrained("leeeov4/PIDIT", subfolder="alberto_tokenizer")
```

### Preprocessing Example

```python
def preprocess_text(text, max_length=250):
    """Tokenize the text with both tokenizers and build the six model inputs."""
    bert_tokens = bert_tokenizer(text, max_length=max_length, padding='max_length', truncation=True, return_tensors='tf')
    alberto_tokens = alberto_tokenizer(text, max_length=max_length, padding='max_length', truncation=True, return_tensors='tf')
    return {
        'bert_input_ids': bert_tokens['input_ids'],
        'bert_token_type_ids': bert_tokens['token_type_ids'],
        'bert_attention_mask': bert_tokens['attention_mask'],
        'alberto_input_ids': alberto_tokens['input_ids'],
        'alberto_token_type_ids': alberto_tokens['token_type_ids'],
        'alberto_attention_mask': alberto_tokens['attention_mask']
    }
```
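
For intuition about why every tensor ends up with length `max_length`, here is a simplified pure-Python sketch of what `padding='max_length'` and `truncation=True` do to a sequence of token ids and its attention mask. `pad_or_truncate` is a hypothetical helper, the ids are made up, and a padding id of 0 is an assumption; real tokenizers additionally preserve special tokens such as `[SEP]` when truncating.

```python
def pad_or_truncate(token_ids, max_length, pad_id=0):
    """Cut the sequence to max_length, then right-pad it with pad_id."""
    ids = list(token_ids)[:max_length]
    attention_mask = [1] * len(ids)        # 1 marks a real token
    n_pad = max_length - len(ids)
    return ids + [pad_id] * n_pad, attention_mask + [0] * n_pad  # 0 marks padding


short_ids, short_mask = pad_or_truncate([5, 6, 7], max_length=5)
long_ids, long_mask = pad_or_truncate([1, 2, 3, 4, 5, 6], max_length=4)
```

The short sequence is padded up to the requested length and the long one is cut down to it, so both can be stacked into a single `(batch_size, max_length)` tensor.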

### Inference

```python
text = "Oggi, sabato 31 dicembre, alle ore 9.34, nel Monastero Mater Ecclesiae in Vaticano, il Signore ha chiamato a Sé il Santo Padre Emerito Benedetto XVI."
inputs = preprocess_text(text)

# predict returns one array per output head: gender, binary ideology, multiclass ideology
outputs = model.predict(inputs)
gender_prob = outputs[0][0][0]
ideology_binary_prob = outputs[1][0][0]
ideology_multiclass_probs = outputs[2][0]

print("Predicted gender (male probability):", gender_prob)
print("Predicted binary ideology (left probability):", ideology_binary_prob)
print("Multiclass ideology distribution (left, right, moderate left, moderate right):", ideology_multiclass_probs)
```