LIPE V2: Landmark-guided Image Patch Embedder
LIPE V2 is an ultra-lightweight gaze estimation framework designed for commodity hardware such as edge CPUs. It uses an Asymmetric Dual-State Pipeline to achieve state-of-the-art accuracy (4.73° MAE on MPIIGaze) at minimal computational cost (0.61M parameters, 21.44 MFLOPs).
Key Features
- Ultra-Lightweight: Only 0.61M parameters and 21.44 MFLOPs.
- High Efficiency: Achieves 30 FPS on single-threaded mobile CPUs.
- Asymmetric Architecture: Toggles between Active Appearance ($\mathcal{S}_A$) and Structural Resilience ($\mathcal{S}_B$) states.
- Robust Generalization: 4.73° MAE on MPIIGaze and 7.33° on Gaze360 (Zero-shot).
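Because the pipeline toggles between two states with different costs, the average per-frame latency depends on how often each state runs. As a rough illustration, the sketch below blends the State A and State B latencies reported under Technical Specifications (~12 ms and ~3 ms). The helper names `blended_latency_ms` and `max_fps` and the example state shares are assumptions for illustration, not measured values, and real throughput depends on the target hardware.

```python
def blended_latency_ms(frac_state_a, latency_a_ms=12.0, latency_b_ms=3.0):
    """Expected per-frame latency when a fraction of frames runs in State A.

    The default latencies are the approximate CPU figures from the
    specification table; the State A share is a hypothetical input.
    """
    if not 0.0 <= frac_state_a <= 1.0:
        raise ValueError("frac_state_a must be in [0, 1]")
    return frac_state_a * latency_a_ms + (1.0 - frac_state_a) * latency_b_ms


def max_fps(latency_ms):
    """Upper bound on frames per second for a given per-frame latency."""
    if latency_ms <= 0:
        raise ValueError("latency_ms must be positive")
    return 1000.0 / latency_ms


if __name__ == "__main__":
    # Hypothetical shares of frames handled by State A.
    for frac in (1.0, 0.5, 0.0):
        ms = blended_latency_ms(frac)
        print(f"State A share {frac:.0%}: {ms:.1f} ms/frame, up to {max_fps(ms):.0f} FPS")
```

The bound ignores capture, face detection, and landmark extraction time, so it should be read as a ceiling for the model itself rather than an end-to-end frame rate.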
Technical Specifications
| Metric | Value |
|---|---|
| Parameters | 0.61 M |
| FLOPs | 21.44 M |
| Memory Footprint | < 45 MB RAM |
| Latency (CPU) | ~12 ms (State A), ~3 ms (State B) |
| Accuracy (MPIIGaze) | 4.73° MAE |
| Accuracy (Gaze360) | 7.33° (3D Angular Error) |
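
The accuracy rows report angular error in degrees. As a point of reference, the sketch below shows how the angle between a predicted and a ground-truth 3D gaze vector is commonly computed and averaged. It is a generic illustration, not the evaluation code behind the numbers above; the function names are hypothetical.

```python
import math


def angular_error_deg(pred, target):
    """Angle in degrees between two 3D gaze vectors (need not be unit length)."""
    dot = sum(p * t for p, t in zip(pred, target))
    norm = math.sqrt(sum(p * p for p in pred)) * math.sqrt(sum(t * t for t in target))
    if norm == 0.0:
        raise ValueError("gaze vectors must be non-zero")
    # Clamp to guard against rounding pushing the cosine outside [-1, 1].
    cosine = max(-1.0, min(1.0, dot / norm))
    return math.degrees(math.acos(cosine))


def mean_angular_error_deg(preds, targets):
    """Mean angular error over paired lists of predicted and target vectors."""
    if len(preds) != len(targets) or not preds:
        raise ValueError("preds and targets must be non-empty and the same length")
    return sum(angular_error_deg(p, t) for p, t in zip(preds, targets)) / len(preds)
```

For example, `mean_angular_error_deg([(0, 0, -1), (1, 0, 0)], [(0, 0, -1), (0, 1, 0)])` averages a perfect prediction with one that is perpendicular to the target.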
Model Architecture
The framework consists of a Dual-Pooling Mini-Conv embedder for image patches and a Geometric MLP for topological facial landmarks. Feature fusion is stabilized via Bimodal Distribution Resilience and Terminal Anchoring (SWA).
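The description above does not spell out what "Dual-Pooling" computes. One common reading is that each channel of a convolutional feature map is summarized by both its average and its maximum, and the two summaries are concatenated. The plain-Python sketch below illustrates that reading only; it is an assumption and is not taken from the LIPE V2 code, and `dual_pool` is a hypothetical name.

```python
def dual_pool(feature_map):
    """Concatenate global average and global max pooling per channel.

    feature_map: list of channels, each a 2D list (H x W) of floats.
    Returns [avg_c0, ..., avg_cN, max_c0, ..., max_cN].
    """
    avgs, maxes = [], []
    for channel in feature_map:
        values = [v for row in channel for v in row]
        if not values:
            raise ValueError("each channel must contain at least one value")
        avgs.append(sum(values) / len(values))  # smooth, context-level summary
        maxes.append(max(values))               # strongest local response
    return avgs + maxes
```

Under this reading, a patch with C channels yields a 2C-dimensional descriptor that could then be fused with the landmark features from the Geometric MLP.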
Usage
To use this model, clone the repository and ensure you have the dependencies installed:
pip install -r requirements.txt
Inference example:
import torch
from src.models.student import LIPEV2StudentGaze360Gold

model = LIPEV2StudentGaze360Gold()
# Load pre-trained weights onto the CPU
checkpoint = torch.load("checkpoints/swa_gold_p11.pt", map_location="cpu")
model.load_state_dict(checkpoint)
model.eval()  # switch to inference mode
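
The loading snippet above assumes the checkpoint file holds a bare state dict. If it was instead saved as a wrapper dict, or from PyTorch's `torch.optim.swa_utils.AveragedModel` (which prefixes parameter names with `module.` and adds an `n_averaged` buffer), `load_state_dict` will report key mismatches. Whether either applies to `swa_gold_p11.pt` is not stated here, so the helper below is only a hedged sketch; `extract_state_dict` and the wrapper key names it checks are assumptions.

```python
def extract_state_dict(checkpoint):
    """Return a plain parameter dict from a few common checkpoint layouts."""
    state = checkpoint
    # Unwrap layouts such as {"state_dict": {...}} (assumed key names).
    for key in ("state_dict", "model_state_dict"):
        if isinstance(state, dict) and isinstance(state.get(key), dict):
            state = state[key]
            break

    cleaned = {}
    for name, value in state.items():
        if name == "n_averaged":
            continue  # bookkeeping buffer added by AveragedModel, not a weight
        if name.startswith("module."):
            name = name[len("module."):]  # drop the wrapper prefix
        cleaned[name] = value
    return cleaned
```

With this helper, the final loading step would become `model.load_state_dict(extract_state_dict(checkpoint))`; a checkpoint that is already a bare state dict passes through unchanged.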
Citation
If you find LIPE V2 useful for your research, please cite our work.