Arabic Semantic / Sentiment Classification using BiLSTM

This repository contains a TensorFlow/Keras-based Bidirectional LSTM (BiLSTM) model for Arabic text classification.
The model is designed for binary classification tasks such as sentiment or semantic polarity detection.

Overview

  • Language: Arabic
  • Task: Binary text classification
  • Model: BiLSTM neural network
  • Framework: TensorFlow / Keras
  • Focus: Emoji-aware preprocessing and Arabic stemming

This project combines classical NLP preprocessing with deep learning to handle informal Arabic text, including emojis.


Model Architecture

The neural network architecture consists of:

  • Embedding layer (vocabulary size = 10,000, embedding dim = 128)
  • Bidirectional LSTM (128 units, return sequences)
  • Dropout (0.5)
  • Bidirectional LSTM (64 units)
  • Dense layer (32 units, ReLU)
  • Output layer (1 unit, Sigmoid)

Loss function: Binary Crossentropy
Optimizer: Adam (lr = 0.001)
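
The layer list above can be sketched in Keras roughly as follows. This is a minimal reconstruction from the listed hyperparameters, not the original training script: the input length of 100 is taken from the padding step described under Preprocessing Pipeline, and the function name build_model is hypothetical.

```python
import tensorflow as tf
from tensorflow.keras import layers

VOCAB_SIZE = 10_000   # vocabulary size from the architecture list
EMBED_DIM = 128       # embedding dimension from the architecture list
MAX_LEN = 100         # assumption: matches the padding length used in preprocessing


def build_model() -> tf.keras.Model:
    """Build a BiLSTM binary classifier following the listed layers (hypothetical helper)."""
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(MAX_LEN,)),
        layers.Embedding(VOCAB_SIZE, EMBED_DIM),
        # First BiLSTM returns the full sequence so the second one can consume it.
        layers.Bidirectional(layers.LSTM(128, return_sequences=True)),
        layers.Dropout(0.5),
        # Second BiLSTM returns only its final state.
        layers.Bidirectional(layers.LSTM(64)),
        layers.Dense(32, activation="relu"),
        layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(
        loss="binary_crossentropy",
        optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
    )
    return model
```

Calling build_model().summary() prints the layer stack, which can be compared against the list above.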


Preprocessing Pipeline

At inference time, apply the same preprocessing steps, in the same order, as were used during training:

  1. Emoji conversion using demoji
  2. Whitespace and regex normalization
  3. Tokenization using NLTK
  4. Arabic stemming using ISRIStemmer
  5. Keras tokenization and padding (max length = 100)

This pipeline allows the model to better handle:

  • Informal Arabic
  • Social media text
  • Emoji-heavy content

Files in This Repository

File                  Description
lstm_text_model.h5    Trained BiLSTM model
tokenizer.pkl         Keras tokenizer (must match training)
label_encoder.pkl     Label encoder for output mapping
requirements.txt      Python dependencies
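
A hedged sketch of how these files could be combined for inference is shown below. It assumes the input texts have already gone through the emoji, tokenization, and stemming steps, that label_encoder.pkl holds a scikit-learn style encoder exposing inverse_transform, that padding uses the Keras defaults, and that 0.5 is the decision threshold. None of these is confirmed by the repository description, and the helper names are hypothetical.

```python
import pickle

MAX_LEN = 100  # padding length from the preprocessing pipeline


def load_artifacts(model_path="lstm_text_model.h5",
                   tokenizer_path="tokenizer.pkl",
                   encoder_path="label_encoder.pkl"):
    """Load the model, tokenizer, and label encoder from disk (hypothetical helper)."""
    # Imported lazily so that TensorFlow is only loaded when a model is needed.
    from tensorflow.keras.models import load_model

    model = load_model(model_path)
    with open(tokenizer_path, "rb") as f:
        tokenizer = pickle.load(f)
    with open(encoder_path, "rb") as f:
        label_encoder = pickle.load(f)
    return model, tokenizer, label_encoder


def to_class_indices(probabilities, threshold=0.5):
    """Map sigmoid outputs to 0/1 class indices (threshold is an assumption)."""
    return [int(p >= threshold) for p in probabilities]


def predict(texts, model, tokenizer, label_encoder):
    """Return decoded labels for a list of already-preprocessed texts."""
    from tensorflow.keras.utils import pad_sequences

    sequences = tokenizer.texts_to_sequences(texts)
    # Keras default padding and truncation ("pre") are assumed here.
    padded = pad_sequences(sequences, maxlen=MAX_LEN)
    probabilities = model.predict(padded).ravel()
    return label_encoder.inverse_transform(to_class_indices(probabilities))
```

Typical use would be model, tokenizer, label_encoder = load_artifacts() followed by predict(list_of_preprocessed_texts, model, tokenizer, label_encoder).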