---
license: mit
language:
- en
tags:
- multitask
- summarization
- emotion-detection
- topic-classification
- transformer
- flan-t5
- encoder-decoder
datasets:
- OliverPerrin/LexiMind-Discovery
- cnn_dailymail
- booksum
- google/emotions
- ag_news
pipeline_tag: summarization
model-index:
- name: LexiMind
  results:
  - task:
      type: summarization
      name: Summarization
    metrics:
    - type: rouge1
      value: 0.309
    - type: rougeL
      value: 0.185
    - type: bleu4
      value: 0.024
  - task:
      type: text-classification
      name: Topic Classification
    metrics:
    - type: accuracy
      value: 0.857
    - type: f1
      value: 0.854
  - task:
      type: text-classification
      name: Emotion Detection
    metrics:
    - type: f1
      value: 0.352
---
# LexiMind — Multi-Task Transformer Model
LexiMind is a custom-built multi-task encoder-decoder Transformer that jointly performs **abstractive summarization**, **emotion detection** (multi-label, 28 classes), and **topic classification** (7 classes). It is initialized from FLAN-T5-base and extended with task-specific classification heads on the encoder, listed under Architecture.
## Architecture
| Component | Detail |
| --- | --- |
| Base | FLAN-T5-base (272M parameters) |
| Encoder | 12 layers, 768 hidden dim, 12 heads |
| Decoder | 12 layers, 768 hidden dim, 12 heads |
| FFN | Gated-GELU, d_ff = 2048 |
| Position | Relative position bias (T5 style) |
| Vocab | 32,128 tokens (SentencePiece) |
| Summarization head | Decoder → linear projection → vocab |
| Emotion head | Attention-pooled encoder → 28-class sigmoid |
| Topic head | [CLS]-pooled encoder → 7-class softmax |
| Task sampling | Temperature-based (τ = 2.0) with proportional mixing |
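
The exact sampling rule is not spelled out in the table. As a minimal sketch, assuming the common formulation in which a task with `n` examples is drawn with probability proportional to `n ** (1 / τ)`, the following shows how τ = 2.0 flattens a size-proportional mix. The helper name `task_sampling_probs` and the dataset sizes are hypothetical and chosen only for illustration.

```python
import random


def task_sampling_probs(sizes, tau=2.0):
    """Return per-task probabilities proportional to n ** (1 / tau)."""
    weights = {task: n ** (1.0 / tau) for task, n in sizes.items()}
    total = sum(weights.values())
    return {task: w / total for task, w in weights.items()}


# Hypothetical example counts, NOT the real dataset sizes.
sizes = {"summarization": 90_000, "emotion": 40_000, "topic": 10_000}

proportional = task_sampling_probs(sizes, tau=1.0)  # plain size-proportional mix
tempered = task_sampling_probs(sizes, tau=2.0)      # flattened mix

# Draw the task for each of the next five batches from the tempered mix.
rng = random.Random(0)
tasks = list(tempered)
batch_tasks = rng.choices(tasks, weights=[tempered[t] for t in tasks], k=5)

print(proportional)
print(tempered)
print(batch_tasks)
```

With τ = 1 the mix is purely proportional to dataset size. Larger values of τ move it toward uniform, so the smaller classification tasks receive more updates than their size alone would give them.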
## Training
- **Data**: CNN/DailyMail + BookSum (summarization), GoEmotions (emotion), AG News (topic)
- **Epochs**: 8 (~9 hours on a single NVIDIA RTX 4070)
- **Optimizer**: AdamW, lr = 3e-4, weight decay = 0.01
- **Scheduler**: Linear warmup (500 steps) + cosine decay
- **Gradient clipping**: max norm = 1.0
- **Mixed precision**: FP16 via PyTorch AMP
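
The learning-rate schedule listed above can be sketched as a function of the step index. This is a minimal illustration, assuming the rate rises linearly from 0 to the peak over the warmup steps and then follows a half cosine down to 0. The total step count and the final value are assumptions, since neither is stated here, and `lr_at` is a hypothetical helper rather than a function from the codebase.

```python
import math


def lr_at(step, total_steps, base_lr=3e-4, warmup_steps=500, min_lr=0.0):
    """Linear warmup to base_lr, then cosine decay to min_lr (assumed shape)."""
    if step < warmup_steps:
        # Warmup: scale linearly from 0 up to base_lr.
        return base_lr * step / warmup_steps
    # Decay: progress runs from 0 at the end of warmup to 1 at total_steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(1.0, progress)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return min_lr + (base_lr - min_lr) * cosine


# Hypothetical run length, used only to show the shape of the curve.
total_steps = 10_500
for step in (0, 250, 500, 5_500, 10_500):
    print(step, lr_at(step, total_steps))
```

The rate peaks at 3e-4 exactly when warmup ends (step 500) and is halfway down the cosine at the midpoint of the remaining steps.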
## Evaluation Results
| Task | Metric | Value |
| --- | --- | --- |
| Summarization | ROUGE-1 | 0.309 |
| Summarization | ROUGE-L | 0.185 |
| Summarization | BLEU-4 | 0.024 |
| Topic Classification | Accuracy | 0.857 |
| Topic Classification | Macro F1 | 0.854 |
| Emotion Detection | Sample-Avg F1 | 0.352 |
| Emotion Detection | Micro F1 | 0.443 |
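
The two emotion scores differ because they average in different places. The sketch below is a plain-Python illustration on toy data, not the project's evaluation code: sample-averaged F1 computes an F1 per example and then averages, while micro F1 pools true positives, false positives, and false negatives over every example and label before computing one score. These correspond roughly to `average="samples"` and `average="micro"` in scikit-learn's `f1_score`. The function names and the toy labels are hypothetical.

```python
def sample_avg_f1(y_true, y_pred):
    """Mean of per-example F1 over binary multi-label rows."""
    scores = []
    for t_row, p_row in zip(y_true, y_pred):
        tp = sum(1 for t, p in zip(t_row, p_row) if t == 1 and p == 1)
        denom = sum(t_row) + sum(p_row)  # equals 2*TP + FP + FN for this row
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def micro_f1(y_true, y_pred):
    """F1 from counts pooled over all examples and labels."""
    tp = fp = fn = 0
    for t_row, p_row in zip(y_true, y_pred):
        for t, p in zip(t_row, p_row):
            if t == 1 and p == 1:
                tp += 1
            elif t == 0 and p == 1:
                fp += 1
            elif t == 1 and p == 0:
                fn += 1
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Toy data: 3 examples, 4 labels (the real task has 28 labels).
y_true = [[1, 0, 0, 0], [0, 1, 1, 0], [0, 0, 0, 1]]
y_pred = [[1, 0, 0, 0], [0, 1, 0, 0], [1, 0, 0, 0]]

print(sample_avg_f1(y_true, y_pred))  # (1 + 2/3 + 0) / 3
print(micro_f1(y_true, y_pred))       # 2*2 / (2*2 + 1 + 2) = 4/7
```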
## Files
| File | Description |
| --- | --- |
| `best.pt` | Full model checkpoint (state dict + optimizer + metadata) |
| `labels.json` | Emotion (28) and topic (7) label mappings |
| `tokenizer.json` | SentencePiece tokenizer (flat format) |
| `hf_tokenizer/` | HuggingFace-compatible tokenizer directory |
## Usage
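```python
import torch

from src.models.factory import build_model
from src.utils.io import load_labels

# NOTE: `config` (the model configuration) is not defined in this snippet and
# must be built beforehand; see the repository for how it is constructed.
labels = load_labels("labels.json")
model = build_model(config, labels)

# best.pt also stores optimizer state and metadata; only the weights are used here.
ckpt = torch.load("best.pt", map_location="cpu")
model.load_state_dict(ckpt["model_state_dict"])
model.eval()  # switch to inference mode (disables dropout)
```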
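
Once the model is loaded, the two classification heads produce one score per label. How the model returns those scores is not shown here, so the following is only a sketch of the decoding step implied by the Architecture table, assuming raw logits are available as plain lists: a sigmoid with a threshold for the multi-label emotion head, and an argmax for the single-label topic head. The function names, the 0.5 threshold, the logits, and the shortened label lists are all hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def decode_emotions(logits, label_names, threshold=0.5):
    """Multi-label: keep every label whose sigmoid probability reaches the threshold."""
    return [name for name, z in zip(label_names, logits) if sigmoid(z) >= threshold]


def decode_topic(logits, label_names):
    """Single-label: softmax is monotonic, so the argmax over logits picks the class."""
    best = max(range(len(logits)), key=lambda i: logits[i])
    return label_names[best]


# Hypothetical logits and shortened label lists (the real heads have 28 and 7 outputs).
emotion_names = ["joy", "anger", "neutral"]
topic_names = ["World", "Business", "Science"]

emotions = decode_emotions([2.0, -1.5, 0.3], emotion_names)
topic = decode_topic([0.1, 1.7, -0.4], topic_names)

print(emotions)
print(topic)
```

In practice the label names would come from `labels.json`, and the threshold is a tunable choice rather than a fixed property of the model.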
See the full codebase at [github.com/OliverPerrin/LexiMind](https://github.com/OliverPerrin/LexiMind) for the inference scripts, API server, and Gradio demo.
## License
MIT