# Arabic Semantic / Sentiment Classification using BiLSTM
This repository contains a TensorFlow/Keras-based Bidirectional LSTM (BiLSTM) model for Arabic text classification.
The model is designed for binary classification tasks such as sentiment or semantic polarity detection.
## Overview
- Language: Arabic
- Task: Binary text classification
- Model: BiLSTM neural network
- Framework: TensorFlow / Keras
- Focus: Emoji-aware preprocessing and Arabic stemming
This project combines classical NLP preprocessing with deep learning to handle informal Arabic text, including emojis.
## Model Architecture
The neural network consists of the following layers (a Keras sketch follows the list):
- Embedding layer (vocabulary size = 10,000, embedding dim = 128)
- Bidirectional LSTM (128 units, return sequences)
- Dropout (0.5)
- Bidirectional LSTM (64 units)
- Dense layer (32 units, ReLU)
- Output layer (1 unit, Sigmoid)
Loss function: Binary Crossentropy
Optimizer: Adam (lr = 0.001)
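The snippet below is a minimal Keras sketch of this stack. It reflects only the layers and hyperparameters listed above; anything beyond that (such as the accuracy metric) is an assumption rather than the exact training script.

```python
# Sketch of the described BiLSTM architecture. Only the layers and
# hyperparameters listed above come from this model card; anything else
# (e.g. the accuracy metric) is an assumption.
import tensorflow as tf
from tensorflow.keras import layers, models

VOCAB_SIZE = 10_000
EMBEDDING_DIM = 128

model = models.Sequential([
    layers.Embedding(VOCAB_SIZE, EMBEDDING_DIM),
    layers.Bidirectional(layers.LSTM(128, return_sequences=True)),
    layers.Dropout(0.5),
    layers.Bidirectional(layers.LSTM(64)),
    layers.Dense(32, activation="relu"),
    layers.Dense(1, activation="sigmoid"),
])

model.compile(
    loss="binary_crossentropy",
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
    metrics=["accuracy"],
)
```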
## Preprocessing Pipeline
The preprocessing steps are critical and must match the training pipeline exactly at inference time (see the sketch after this list):
- Emoji conversion using `demoji`
- Whitespace and regex normalization
- Tokenization using NLTK
- Arabic stemming using ISRIStemmer
- Keras tokenization and padding (max length = 100)
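A rough sketch of this pipeline is shown below. The exact regexes, call order, and NLTK resources used at training time are not documented here, so the helper names (`preprocess`, `vectorize`) and those details are assumptions.

```python
# Hedged sketch of the preprocessing pipeline described above; the exact
# regexes and call order used during training may differ.
import re
import pickle

import demoji
import nltk
from nltk.tokenize import word_tokenize
from nltk.stem.isri import ISRIStemmer
from tensorflow.keras.preprocessing.sequence import pad_sequences

nltk.download("punkt", quiet=True)  # tokenizer data needed by word_tokenize

MAX_LEN = 100
stemmer = ISRIStemmer()

def preprocess(text: str) -> str:
    # Replace emojis with textual descriptions, then normalize whitespace.
    text = demoji.replace_with_desc(text, sep=" ")
    text = re.sub(r"\s+", " ", text).strip()
    # Tokenize with NLTK and apply the ISRI Arabic stemmer to each token.
    tokens = word_tokenize(text)
    return " ".join(stemmer.stem(tok) for tok in tokens)

# The saved Keras tokenizer must be the one produced at training time.
with open("tokenizer.pkl", "rb") as f:
    tokenizer = pickle.load(f)

def vectorize(texts):
    cleaned = [preprocess(t) for t in texts]
    seqs = tokenizer.texts_to_sequences(cleaned)
    return pad_sequences(seqs, maxlen=MAX_LEN)
```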
This pipeline allows the model to better handle:
- Informal Arabic
- Social media text
- Emoji-heavy content
## Files in This Repository
| File | Description |
|---|---|
| `lstm_text_model.h5` | Trained BiLSTM model |
| `tokenizer.pkl` | Keras tokenizer (must match training) |
| `label_encoder.pkl` | Label encoder for output mapping |
| `requirements.txt` | Python dependencies |
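Putting the pieces together, a minimal inference sketch might look like the following, assuming the files above sit in the working directory and reusing the hypothetical `vectorize` helper from the preprocessing sketch. The sample text and the 0.5 decision threshold are illustrative only.

```python
# Minimal loading-and-prediction sketch; file names come from the table above,
# while the sample input and 0.5 threshold are illustrative assumptions.
import pickle

from tensorflow.keras.models import load_model

model = load_model("lstm_text_model.h5")

with open("label_encoder.pkl", "rb") as f:
    label_encoder = pickle.load(f)

samples = ["النص العربي هنا"]              # example Arabic input
probs = model.predict(vectorize(samples))   # vectorize() from the preprocessing sketch
preds = (probs > 0.5).astype(int).ravel()
labels = label_encoder.inverse_transform(preds)
print(list(zip(samples, labels, probs.ravel())))
```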