Arabic Semantic / Sentiment Classification using BiLSTM

This repository contains a TensorFlow/Keras-based Bidirectional LSTM (BiLSTM) model for Arabic text classification.
The model is designed for binary classification tasks such as sentiment or semantic polarity detection.

Overview

Language: Arabic
Task: Binary text classification
Model: BiLSTM neural network
Framework: TensorFlow / Keras
Focus: Emoji-aware preprocessing and Arabic stemming

This project combines classical NLP preprocessing with deep learning to handle informal Arabic text, including emojis.

Model Architecture

The neural network architecture consists of:

Embedding layer (vocabulary size = 10,000, embedding dim = 128)
Bidirectional LSTM (128 units, return_sequences=True)
Dropout (0.5)
Bidirectional LSTM (64 units)
Dense layer (32 units, ReLU)
Output layer (1 unit, Sigmoid)

Loss function: Binary Crossentropy
Optimizer: Adam (lr = 0.001)
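
The layer list above can be sketched in Keras as follows. This is a minimal reconstruction from the description, not the original training script: the input length of 100 is taken from the padding step in the preprocessing pipeline, and anything not listed above (for example masking or extra metrics) is left at Keras defaults, which is an assumption.

```python
import tensorflow as tf
from tensorflow.keras import layers

VOCAB_SIZE = 10_000  # vocabulary size from the layer list
EMBED_DIM = 128      # embedding dimension from the layer list
MAX_LEN = 100        # padded sequence length from the preprocessing pipeline

model = tf.keras.Sequential([
    tf.keras.Input(shape=(MAX_LEN,)),
    layers.Embedding(VOCAB_SIZE, EMBED_DIM),
    # First BiLSTM returns the full sequence so the second BiLSTM can consume it.
    layers.Bidirectional(layers.LSTM(128, return_sequences=True)),
    layers.Dropout(0.5),
    # Second BiLSTM returns only its final state (forward and backward concatenated).
    layers.Bidirectional(layers.LSTM(64)),
    layers.Dense(32, activation="relu"),
    layers.Dense(1, activation="sigmoid"),
])

model.compile(
    loss="binary_crossentropy",
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
)

model.summary()
```

With these settings the network has about 1.7 million trainable parameters, of which 1.28 million belong to the embedding layer.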

Preprocessing Pipeline

Inference inputs must go through the same preprocessing steps, in the same order, as the training data:

Emoji conversion using demoji
Whitespace and regex normalization
Tokenization using NLTK
Arabic stemming using ISRIStemmer
Keras tokenization and padding (max length = 100)

This pipeline allows the model to better handle:

Informal Arabic
Social media text
Emoji-heavy content

Files in This Repository

| File | Description |
| --- | --- |
| `lstm_text_model.h5` | Trained BiLSTM model |
| `tokenizer.pkl` | Keras tokenizer (must match training) |
| `label_encoder.pkl` | Label encoder for output mapping |
| `requirements.txt` | Python dependencies |
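
A possible way to use these files for inference is sketched below. The helper names `load_artifacts` and `predict_labels` are hypothetical, the 0.5 decision threshold and the default (pre) padding side are assumptions, and the label encoder is assumed to expose a scikit-learn style `inverse_transform`. Input texts are expected to have been preprocessed already, and unpickling `tokenizer.pkl` requires an environment in which the tokenizer's original class can still be imported.

```python
import pickle

import numpy as np
import tensorflow as tf

MAX_LEN = 100  # padding length stated in the preprocessing pipeline


def load_artifacts(model_path="lstm_text_model.h5",
                   tokenizer_path="tokenizer.pkl",
                   encoder_path="label_encoder.pkl"):
    """Load the model, tokenizer, and label encoder from the repository files."""
    model = tf.keras.models.load_model(model_path)
    with open(tokenizer_path, "rb") as f:
        tokenizer = pickle.load(f)
    with open(encoder_path, "rb") as f:
        label_encoder = pickle.load(f)
    return model, tokenizer, label_encoder


def predict_labels(texts, model, tokenizer, label_encoder, threshold=0.5):
    """Map already-preprocessed texts to (labels, probabilities)."""
    sequences = tokenizer.texts_to_sequences(texts)
    # Pad or truncate every sequence to MAX_LEN (Keras default: at the front).
    padded = tf.keras.utils.pad_sequences(sequences, maxlen=MAX_LEN)
    probs = np.asarray(model.predict(padded, verbose=0)).reshape(-1)
    # Sigmoid output at or above the threshold is treated as class index 1.
    class_ids = (probs >= threshold).astype(int)
    return label_encoder.inverse_transform(class_ids), probs
```

Typical usage would be `model, tokenizer, label_encoder = load_artifacts()` followed by `labels, probs = predict_labels(cleaned_texts, model, tokenizer, label_encoder)`, where `cleaned_texts` is a list of preprocessed strings.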
