---
license: mit
language:
- en
tags:
- multitask
- summarization
- emotion-detection
- topic-classification
- transformer
- flan-t5
- encoder-decoder
datasets:
- OliverPerrin/LexiMind-Discovery
- cnn_dailymail
- booksum
- google/emotions
- ag_news
pipeline_tag: summarization
model-index:
- name: LexiMind
  results:
  - task:
      type: summarization
      name: Summarization
    metrics:
    - type: rouge1
      value: 0.309
    - type: rougeL
      value: 0.185
    - type: bleu4
      value: 0.024
  - task:
      type: text-classification
      name: Topic Classification
    metrics:
    - type: accuracy
      value: 0.857
    - type: f1
      value: 0.854
  - task:
      type: text-classification
      name: Emotion Detection
    metrics:
    - type: f1
      value: 0.352
---

# LexiMind: Multi-Task Transformer Model

LexiMind is a custom-built multi-task encoder-decoder Transformer that jointly performs **abstractive summarization**, **emotion detection** (multi-label, 28 classes), and **topic classification** (7 classes). It is initialized from FLAN-T5-base and adds several architectural enhancements.

## Architecture

| Component | Detail |
| --- | --- |
| Base | FLAN-T5-base (272M parameters) |
| Encoder | 12 layers, 768 hidden dim, 12 heads |
| Decoder | 12 layers, 768 hidden dim, 12 heads |
| FFN | Gated-GELU, d_ff = 2048 |
| Position | Relative position bias (T5 style) |
| Vocab | 32,128 tokens (SentencePiece) |
| Summarization head | Decoder → linear projection → vocab |
| Emotion head | Attention-pooled encoder → 28-class sigmoid |
| Topic head | [CLS]-pooled encoder → 7-class softmax |
| Task sampling | Temperature-based (τ = 2.0) with proportional mixing |

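The temperature-based task sampling listed in the architecture table can be sketched as follows. This is a minimal illustration rather than the repository's implementation: it assumes the common formulation in which task *i* is drawn with probability proportional to `n_i ** (1 / τ)`, where `n_i` is the number of training examples for that task. The function names and the dataset sizes are hypothetical.

```python
import random


def task_sampling_probs(sizes, temperature=2.0):
    """Return per-task probabilities p_i proportional to n_i ** (1 / temperature).

    Assumed formulation, not taken from the LexiMind code:
    temperature = 1 gives size-proportional sampling, and larger
    temperatures flatten the distribution toward uniform.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    weights = {task: n ** (1.0 / temperature) for task, n in sizes.items()}
    total = sum(weights.values())
    return {task: w / total for task, w in weights.items()}


def sample_task(probs, rng=random):
    """Draw one task name according to the given probabilities."""
    tasks = list(probs)
    return rng.choices(tasks, weights=[probs[t] for t in tasks], k=1)[0]


# Hypothetical dataset sizes, chosen only to make the arithmetic easy to follow.
example_sizes = {"summarization": 90_000, "emotion": 40_000, "topic": 10_000}

# With temperature 2.0 the weights are the square roots 300 : 200 : 100,
# so the probabilities are 1/2, 1/3 and 1/6.
probs = task_sampling_probs(example_sizes, temperature=2.0)
next_task = sample_task(probs)
```

Compared with purely proportional sampling (9/14, 4/14, 1/14 for these sizes), a temperature of 2.0 gives the smaller tasks a larger share of training batches.
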
## Training

- **Data**: CNN/DailyMail + BookSum (summarization), GoEmotions (emotion), AG News (topic)
- **Epochs**: 8 (~9 hours on a single NVIDIA RTX 4070)
- **Optimizer**: AdamW, lr = 3e-4, weight decay = 0.01
- **Scheduler**: Linear warmup (500 steps) + cosine decay
- **Gradient clipping**: max norm = 1.0
- **Mixed precision**: FP16 via PyTorch AMP

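The scheduler entry above (linear warmup followed by cosine decay) corresponds to a learning-rate curve like the one sketched here. The peak rate and warmup length come from the list; the total number of steps, the final learning rate of zero, and the per-step update are assumptions, and `lr_at_step` is a hypothetical helper rather than a function from the repository.

```python
import math


def lr_at_step(step, total_steps, base_lr=3e-4, warmup_steps=500, min_lr=0.0):
    """Learning rate at a given optimizer step.

    Rises linearly from 0 to base_lr over warmup_steps, then follows a
    half cosine from base_lr down to min_lr at total_steps.
    """
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    # Fraction of the decay phase completed, clamped to [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(1.0, progress)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


# Hypothetical run length, used only for illustration.
TOTAL_STEPS = 10_500

schedule = [lr_at_step(s, TOTAL_STEPS) for s in (0, 250, 500, 5_500, 10_500)]
```

Under these assumptions the rate reaches 3e-4 at step 500, falls to half of that midway through the decay phase, and ends at zero.
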
## Evaluation Results

| Task | Metric | Value |
| --- | --- | --- |
| Summarization | ROUGE-1 | 0.309 |
| Summarization | ROUGE-L | 0.185 |
| Summarization | BLEU-4 | 0.024 |
| Topic Classification | Accuracy | 85.7% |
| Topic Classification | Macro F1 | 0.854 |
| Emotion Detection | Sample-Avg F1 | 0.352 |
| Emotion Detection | Micro F1 | 0.443 |

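The two emotion metrics in the table differ because they aggregate multi-label predictions differently. The sketch below uses the standard definitions (sample-averaged F1 computes one F1 per example and averages them; micro F1 pools true positives, false positives, and false negatives over all examples first). It assumes label sets are represented as Python sets, and the toy labels are hypothetical; the repository's evaluation code may differ in detail.

```python
def f1_from_counts(tp, fp, fn):
    """F1 = 2*TP / (2*TP + FP + FN), defined as 0.0 when the denominator is 0."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def sample_avg_f1(y_true, y_pred):
    """Mean of per-example F1 scores over sets of labels."""
    scores = [
        f1_from_counts(len(t & p), len(p - t), len(t - p))
        for t, p in zip(y_true, y_pred)
    ]
    return sum(scores) / len(scores)


def micro_f1(y_true, y_pred):
    """F1 over counts pooled across all examples and labels."""
    tp = fp = fn = 0
    for t, p in zip(y_true, y_pred):
        tp += len(t & p)
        fp += len(p - t)
        fn += len(t - p)
    return f1_from_counts(tp, fp, fn)


# Toy example with hypothetical labels: one fully correct prediction,
# one wrong label, and one example where nothing was predicted.
y_true = [{"joy", "love"}, {"anger"}, {"sadness"}]
y_pred = [{"joy", "love"}, {"fear"}, set()]

sample_score = sample_avg_f1(y_true, y_pred)  # (1.0 + 0.0 + 0.0) / 3
micro_score = micro_f1(y_true, y_pred)        # 2*2 / (2*2 + 1 + 2) = 4/7
```

Examples with no correct label score zero under sample averaging regardless of how many labels they carry, which is one reason the sample-averaged figure can sit below the micro figure.
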
## Files

| File | Description |
| --- | --- |
| `best.pt` | Full model checkpoint (state dict + optimizer + metadata) |
| `labels.json` | Emotion (28) and topic (7) label mappings |
| `tokenizer.json` | SentencePiece tokenizer (flat format) |
| `hf_tokenizer/` | HuggingFace-compatible tokenizer directory |

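## Usage

```python
# Requires the LexiMind repository on the Python path (provides the `src` package).
import torch

from src.models.factory import build_model
from src.utils.io import load_labels

# Emotion (28) and topic (7) label mappings.
labels = load_labels("labels.json")

# NOTE: `config` is not defined in this snippet. It must be the model
# configuration used for training; see the repository for how to build it.
model = build_model(config, labels)

# Load the checkpoint on CPU and restore only the model weights
# (the file also stores optimizer state and metadata).
ckpt = torch.load("best.pt", map_location="cpu")
model.load_state_dict(ckpt["model_state_dict"])
model.eval()  # switch to inference mode (disables dropout)
```
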
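Once the model produces logits, the two classification heads are decoded differently: the emotion head is multi-label (independent sigmoid per class), while the topic head is single-label (softmax over classes). The sketch below works on plain lists of logits so it does not depend on the model's forward signature, which is not shown here. The function names, the 0.5 threshold, and the example label names are assumptions for illustration.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def decode_emotions(logits, labels, threshold=0.5):
    """Multi-label decoding: keep every label whose sigmoid score meets the threshold."""
    scored = [(label, sigmoid(z)) for label, z in zip(labels, logits)]
    return [(label, p) for label, p in scored if p >= threshold]


def decode_topic(logits, labels):
    """Single-label decoding: softmax over the logits, return the top label and its probability."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]


# Hypothetical logits and label names (the real mappings live in labels.json).
emotions = decode_emotions([2.0, -1.0, 0.0], ["joy", "anger", "neutral"])
topic, topic_prob = decode_topic([0.1, 2.0, -1.0], ["Business", "Science", "Sports"])
```

For multi-label emotion detection the threshold trades precision against recall, so a value tuned on validation data may work better than the 0.5 assumed here.
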
See the full codebase at [github.com/OliverPerrin/LexiMind](https://github.com/OliverPerrin/LexiMind) for inference scripts, API server, and Gradio demo.

## License

MIT