What Is This?
HybridEmotionNet is a dual-branch neural network for real-time facial emotion recognition that fuses EfficientNet-B0 appearance features with MediaPipe 3D landmark geometry via bidirectional cross-attention.
It processes webcam frames at 30+ FPS: for each frame it extracts 478 3D landmarks, crops the face, and classifies the expression into 7 emotions, with temporal smoothing applied across frames.
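The smoothing method is not described here. As one possible illustration, an exponential moving average over the per-frame class probabilities could be sketched as below. The function name `smooth_probabilities` and the factor `alpha=0.3` are assumptions made for this example, not values taken from the repository.

```python
def smooth_probabilities(previous, current, alpha=0.3):
    """Blend the current frame's class probabilities into a running estimate.

    previous: smoothed probabilities from earlier frames, or None on the first frame
    current:  probabilities predicted for the current frame
    alpha:    weight of the current frame (hypothetical default)
    """
    if previous is None:
        return list(current)
    return [alpha * c + (1.0 - alpha) * p for p, c in zip(previous, current)]


# Example: a single noisy frame barely moves the running estimate.
state = None
for frame_probs in ([0.9, 0.1], [0.9, 0.1], [0.2, 0.8], [0.9, 0.1]):
    state = smooth_probabilities(state, frame_probs)
```

A smaller `alpha` gives a steadier label at the cost of slower reaction to a genuine change of expression.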
Architecture
Face crop (224×224) ──► EfficientNet-B0 ──► [B, 256] appearance
478 landmarks (xyz) ──► MLP encoder     ──► [B, 256] geometry
                    │
   Bidirectional Cross-Attention (4 heads each)
┌──────────────────────────────────────────┐
│ coord → CNN  (geometry queries appear.)  │
│ CNN → coord  (appear. queries geometry)  │
└──────────────────────────────────────────┘
                    │
     Fusion MLP: 512 → 384 → 256 → 128
                    │
       Classifier: 128 → 7 emotions
| Component | Detail |
|---|---|
| CNN branch | EfficientNet-B0, ImageNet init, blocks 0–2 frozen |
| Coord branch | MLP 1434 → 512 → 384 → 256, BN + Dropout |
| Fusion | Bidirectional cross-attention + MLP |
| Parameters | 6.2M total / 5.75M trainable |
| Model size | 72 MB |
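The layer widths in the table can be checked with a little arithmetic. The sketch below shows where the coordinate branch's input width of 1434 comes from (478 landmarks × 3 coordinates) and counts the fully connected weights and biases implied by the stated widths. It deliberately ignores BatchNorm, the attention layers, and the CNN, and it assumes the 512-wide fusion input is the two 256-wide branch outputs side by side; the helper names are illustrative only.

```python
NUM_LANDMARKS = 478
COORDS_PER_LANDMARK = 3  # x, y, z


def linear_params(n_in, n_out):
    """Weights plus biases of one fully connected layer."""
    return n_in * n_out + n_out


def mlp_params(widths):
    """Total Linear parameters for a chain of layer widths."""
    return sum(linear_params(a, b) for a, b in zip(widths, widths[1:]))


coord_input = NUM_LANDMARKS * COORDS_PER_LANDMARK
coord_branch = mlp_params([coord_input, 512, 384, 256])
fusion_mlp = mlp_params([512, 384, 256, 128])
classifier = linear_params(128, 7)

print(coord_input)   # 1434
print(coord_branch)  # 1030272
print(fusion_mlp)    # 328448
print(classifier)    # 903
```

So roughly 1.0M of the 6.2M parameters sit in the coordinate encoder's linear layers alone.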
Files in This Repo
| File | Size | Required |
|---|---|---|
| models/weights/hybrid_best_model.pth | 72 MB | Yes (model weights) |
| models/scalers/hybrid_coordinate_scaler.pkl | 18 KB | Yes (landmark scaler) |
| Architecture digram.png | n/a | No (docs only) |
Quick Start
1. Clone the code
git clone https://github.com/Huuffy/VisageCNN.git
cd VisageCNN
python -m venv venv && venv\Scripts\activate
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu126
pip install -r requirements.txt
2. Download weights
from huggingface_hub import hf_hub_download
import shutil, pathlib
for remote, local in [
    ("models/weights/hybrid_best_model.pth", "models/weights/hybrid_best_model.pth"),
    ("models/scalers/hybrid_coordinate_scaler.pkl", "models/scalers/hybrid_coordinate_scaler.pkl"),
]:
    src = hf_hub_download(repo_id="Huuffy/VisageCNN", filename=remote)
    pathlib.Path(local).parent.mkdir(parents=True, exist_ok=True)
    shutil.copy(src, local)
Or with the HF CLI:
hf download Huuffy/VisageCNN models/weights/hybrid_best_model.pth --local-dir .
hf download Huuffy/VisageCNN models/scalers/hybrid_coordinate_scaler.pkl --local-dir .
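Before running inference it can help to confirm that both required files landed where the code expects them. The following is a minimal sketch that uses only the paths listed in the file table; the helper name `missing_files` is hypothetical and not part of the repository.

```python
from pathlib import Path

REQUIRED_FILES = [
    "models/weights/hybrid_best_model.pth",
    "models/scalers/hybrid_coordinate_scaler.pkl",
]


def missing_files(root="."):
    """Return the required files that are absent or empty under root."""
    root = Path(root)
    missing = []
    for rel in REQUIRED_FILES:
        path = root / rel
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(rel)
    return missing


if __name__ == "__main__":
    absent = missing_files()
    if absent:
        print("Missing:", ", ".join(absent))
    else:
        print("All required files are in place.")
```

Run it from the repository root, since the paths are relative.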
3. Run inference
python inference/run_hybrid.py
Press Q to quit.
Emotion Classes
| Label | Emotion | Key Signals |
|---|---|---|
| 0 | Angry | Furrowed brows, tightened jaw |
| 1 | Disgust | Raised upper lip, wrinkled nose |
| 2 | Fear | Wide eyes, raised brows, open mouth |
| 3 | Happy | Raised cheeks, open smile |
| 4 | Neutral | Relaxed, no strong deformation |
| 5 | Sad | Lowered brow corners, downturned lips |
| 6 | Surprised | Raised brows, wide eyes, dropped jaw |
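For anyone consuming the classifier output directly, the label order in the table can be turned into a lookup. This is a small sketch, assuming the model emits seven scores in label order 0 to 6; `decode_prediction` is an illustrative name, not a function from the repository.

```python
EMOTIONS = ["Angry", "Disgust", "Fear", "Happy", "Neutral", "Sad", "Surprised"]


def decode_prediction(scores):
    """Return (label, emotion name) for the highest of the 7 class scores."""
    if len(scores) != len(EMOTIONS):
        raise ValueError(f"expected {len(EMOTIONS)} scores, got {len(scores)}")
    label = max(range(len(scores)), key=lambda i: scores[i])
    return label, EMOTIONS[label]


print(decode_prediction([0.05, 0.02, 0.03, 0.70, 0.10, 0.05, 0.05]))  # (3, 'Happy')
```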
Training Dataset
~30k clean images, with FER2013 noise removed across all classes:
| Class | Images | Sources |
|---|---|---|
| Angry | 6,130 | RAF-DB + AffectNet + AffectNet-Short + CK+ |
| Surprised | 5,212 | RAF-DB + AffectNet |
| Sad | 4,941 | RAF-DB + AffectNet + AffectNet-Short + CK+ |
| Disgust | 3,782 | AffectNet-Short + RAF-DB + CK+ |
| Neutral | 3,475 | RAF-DB + AffectNet |
| Fear | 3,418 | AffectNet-Short + RAF-DB + CK+ |
| Happy | 3,124 | RAF-DB + AffectNet |
Max class imbalance: about 1.96× (Angry 6,130 vs. Happy 3,124)
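The totals behind these figures can be recomputed from the per-class counts in the table. The sketch below sums the counts and takes the ratio of the largest class to the smallest, which comes out just under 2×; the variable names are illustrative.

```python
CLASS_COUNTS = {
    "Angry": 6130,
    "Surprised": 5212,
    "Sad": 4941,
    "Disgust": 3782,
    "Neutral": 3475,
    "Fear": 3418,
    "Happy": 3124,
}

total = sum(CLASS_COUNTS.values())
largest = max(CLASS_COUNTS, key=CLASS_COUNTS.get)
smallest = min(CLASS_COUNTS, key=CLASS_COUNTS.get)
ratio = CLASS_COUNTS[largest] / CLASS_COUNTS[smallest]

print(total)              # 30082
print(largest, smallest)  # Angry Happy
print(f"imbalance: {ratio:.2f}x")
```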
Training Config
| Setting | Value |
|---|---|
| Loss | Focal Loss γ=2.0 + label smoothing 0.12 |
| Optimizer | AdamW, weight decay 0.05 |
| LR | OneCycleLR: CNN 5e-5, fusion 5e-4 |
| Batch | 128 + grad accumulation ×2 (eff. 256) |
| Augmentation | CutMix + noise + rotation + zoom |
| Mixed precision | torch.amp (AMP) |
| Early stopping | patience=40 on val accuracy |
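To make the loss row concrete, here is a plain-Python sketch of one common way to combine focal loss with label smoothing for a single sample: the smoothed cross-entropy is scaled by the focal factor (1 − p_t)^γ, where p_t is the predicted probability of the true class. The exact formulation used in the training script is not shown here, so treat this as an assumption; real training code would operate on batched tensors rather than Python lists.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def focal_loss_smoothed(logits, target, gamma=2.0, smoothing=0.12):
    """Focal-weighted, label-smoothed cross-entropy for one sample."""
    probs = softmax(logits)
    k = len(probs)
    ce = 0.0
    for i, p in enumerate(probs):
        # Smoothed target: spread `smoothing` uniformly, keep the rest on the true class.
        q = smoothing / k + ((1.0 - smoothing) if i == target else 0.0)
        ce -= q * math.log(max(p, 1e-12))
    # Down-weight samples the model already classifies confidently.
    return (1.0 - probs[target]) ** gamma * ce


easy = focal_loss_smoothed([6.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], target=0)
hard = focal_loss_smoothed([0.0, 6.0, 0.0, 0.0, 0.0, 0.0, 0.0], target=0)
```

With `gamma=0` and `smoothing=0` the function reduces to ordinary cross-entropy, which is a convenient sanity check.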
Retrain From Scratch
# Build dataset (downloads ~30k clean images from HuggingFace)
pip install datasets
python scripts/prepare_dataset.py
# Delete old cache and train
rmdir /s /q models\cache
python scripts/train_hybrid.py
Full training guide: GitHub README
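The `rmdir /s /q` command above is specific to the Windows command prompt. As a cross-platform alternative, the cache directory can be removed from Python. This is a minimal sketch assuming the cache lives at `models/cache` relative to the repository root, as in the command above; `clear_cache` is an illustrative helper, not part of the repository.

```python
import shutil
from pathlib import Path


def clear_cache(cache_dir="models/cache"):
    """Delete the cache directory if it exists; return True if it was removed."""
    path = Path(cache_dir)
    if path.is_dir():
        shutil.rmtree(path)
        return True
    return False


if __name__ == "__main__":
    print("removed" if clear_cache() else "nothing to remove")
```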
