---
license: mit
language:
- en
tags:
- multitask
- summarization
- emotion-detection
- topic-classification
- transformer
- flan-t5
- encoder-decoder
datasets:
- OliverPerrin/LexiMind-Discovery
- cnn_dailymail
- booksum
- google/emotions
- ag_news
pipeline_tag: summarization
model-index:
- name: LexiMind
  results:
  - task:
      type: summarization
      name: Summarization
    metrics:
    - type: rouge1
      value: 0.309
    - type: rougeL
      value: 0.185
    - type: bleu4
      value: 0.024
  - task:
      type: text-classification
      name: Topic Classification
    metrics:
    - type: accuracy
      value: 0.857
    - type: f1
      value: 0.854
  - task:
      type: text-classification
      name: Emotion Detection
    metrics:
    - type: f1
      value: 0.352
---

# LexiMind: Multi-Task Transformer Model

LexiMind is a custom-built multi-task encoder-decoder Transformer that jointly performs three tasks: abstractive summarization, multi-label emotion detection (28 classes), and single-label topic classification (7 classes). It is initialized from FLAN-T5-base and extends it with several architectural enhancements, including task-specific heads on the shared encoder and decoder.
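The two classification tasks decode their outputs differently. The minimal sketch below shows one common way to do it: an independent sigmoid per class with a threshold for the multi-label emotion head (a text may get zero, one, or several emotions), and a softmax plus argmax for the single-label topic head. The 0.5 threshold, the function names, and the toy logits are assumptions for illustration, not values taken from the LexiMind code.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic function for a single logit."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def softmax(logits: list[float]) -> list[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode_emotions(logits: list[float], threshold: float = 0.5) -> list[int]:
    """Multi-label decoding: every class whose sigmoid probability clears
    the (assumed) threshold is predicted, independently of the others."""
    return [i for i, z in enumerate(logits) if sigmoid(z) >= threshold]


def decode_topic(logits: list[float]) -> int:
    """Single-label decoding: exactly one class, the softmax argmax."""
    probs = softmax(logits)
    return max(range(len(probs)), key=probs.__getitem__)


# Hypothetical logits, shortened to 4 classes each (the real heads have 28 and 7).
emotion_logits = [2.1, -0.3, 0.4, -3.0]
topic_logits = [0.2, 1.7, -0.5, 0.9]
print(decode_emotions(emotion_logits))  # [0, 2]
print(decode_topic(topic_logits))       # 1
```

In practice the emotion threshold is often tuned per class on a validation set, since a single global threshold rarely suits 28 labels with very different frequencies.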

## Architecture

| Component | Detail |
| --- | --- |
| Base | FLAN-T5-base (272M parameters) |
| Encoder | 12 layers, 768 hidden dim, 12 heads |
| Decoder | 12 layers, 768 hidden dim, 12 heads |
| FFN | Gated-GELU, d_ff = 2048 |
| Position | Relative position bias (T5 style) |
| Vocab | 32,128 tokens (SentencePiece) |
| Summarization head | Decoder → linear projection → vocab |
| Emotion head | Attention-pooled encoder → 28-class sigmoid |
| Topic head | [CLS]-pooled encoder → 7-class softmax |
| Task sampling | Temperature-based (τ = 2.0) with proportional mixing |
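The task-sampling row can be read as T5-style temperature-scaled mixing: each task's proportional rate `r_i = n_i / Σ n_j` (where `n_i` is the number of training examples for task `i`) is raised to the power `1/τ` and renormalized. With τ = 1 this is purely proportional mixing; larger τ pushes the rates toward uniform, so small tasks are sampled more often. The sketch assumes that formulation (the card does not spell out the exact formula, and T5's optional per-task size cap is omitted), and the task sizes below are hypothetical placeholders, not LexiMind's actual dataset sizes.

```python
import random


def mixing_rates(sizes: dict[str, int], tau: float = 2.0) -> dict[str, float]:
    """Temperature-scaled mixing: p_i proportional to (n_i / sum_j n_j) ** (1 / tau)."""
    total = sum(sizes.values())
    scaled = {task: (n / total) ** (1.0 / tau) for task, n in sizes.items()}
    norm = sum(scaled.values())
    return {task: s / norm for task, s in scaled.items()}


def sample_tasks(rates: dict[str, float], k: int, seed: int = 0) -> list[str]:
    """Choose which task each of the next k training batches comes from."""
    rng = random.Random(seed)
    tasks = list(rates)
    return rng.choices(tasks, weights=[rates[t] for t in tasks], k=k)


# Hypothetical example sizes (training examples per task), chosen as 9:4:1.
sizes = {"summarization": 90_000, "emotion": 40_000, "topic": 10_000}

for tau in (1.0, 2.0):
    print(tau, {t: round(p, 3) for t, p in mixing_rates(sizes, tau).items()})
# 1.0 {'summarization': 0.643, 'emotion': 0.286, 'topic': 0.071}
# 2.0 {'summarization': 0.5, 'emotion': 0.333, 'topic': 0.167}

print(sample_tasks(mixing_rates(sizes), k=8))
```

With τ = 2.0 the smallest task's share more than doubles relative to proportional mixing, which keeps a small classification dataset from being drowned out by a much larger summarization corpus.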

## Training

- Data: CNN/DailyMail + BookSum (summarization), GoEmotions (emotion), AG News (topic)
- Epochs: 8 (~9 hours on a single NVIDIA RTX 4070)
- Optimizer: AdamW, lr = 3e-4, weight decay = 0.01
- Scheduler: linear warmup (500 steps) + cosine decay
- Gradient clipping: max norm = 1.0
- Mixed precision: FP16 via PyTorch AMP
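The learning-rate schedule above can be sketched as a multiplier on the base rate of 3e-4: it ramps linearly from 0 to 1 over the first 500 steps, then follows a half cosine from 1 down to 0 at the final step. Decaying all the way to zero and the `total_steps` value are assumptions; the card states neither a minimum learning rate nor the total number of optimizer steps.

```python
import math


def warmup_cosine(step: int, warmup_steps: int = 500, total_steps: int = 10_000) -> float:
    """LR multiplier: linear warmup from 0 to 1, then cosine decay from 1 to 0.

    total_steps is a hypothetical placeholder; the card does not state it.
    """
    if step < warmup_steps:
        return step / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(progress, 1.0)  # stay at 0 after the last scheduled step
    return 0.5 * (1.0 + math.cos(math.pi * progress))


base_lr = 3e-4
for step in (0, 250, 500, 5_250, 10_000):
    print(step, base_lr * warmup_cosine(step))
```

One way to wire this into PyTorch (not necessarily how LexiMind does it) is to pass the function as `lr_lambda` to `torch.optim.lr_scheduler.LambdaLR` on an `AdamW(params, lr=3e-4, weight_decay=0.01)` optimizer, using `functools.partial` to set the real `total_steps`. With AMP's `GradScaler`, call `scaler.unscale_(optimizer)` before `torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)` so the max norm applies to the true, unscaled gradients.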

## Evaluation Results

| Task | Metric | Value |
| --- | --- | --- |
| Summarization | ROUGE-1 | 0.309 |
| Summarization | ROUGE-L | 0.185 |
| Summarization | BLEU-4 | 0.024 |
| Topic Classification | Accuracy | 0.857 |
| Topic Classification | Macro F1 | 0.854 |
| Emotion Detection | Sample-averaged F1 | 0.352 |
| Emotion Detection | Micro F1 | 0.443 |
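The two emotion scores aggregate differently, which is why they diverge. Sample-averaged F1 computes one F1 score per text and averages over texts; micro F1 pools true positives, false positives, and false negatives across all texts and labels before computing a single F1. A minimal sketch over sets of label indices follows; the toy labels are hypothetical, and scoring a text whose gold and predicted sets are both empty as 1.0 is an assumed convention (libraries differ on this edge case).

```python
def f1(tp: int, fp: int, fn: int) -> float:
    """F1 from raw counts; 0.0 when there are no positives at all."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def sample_avg_f1(gold: list[set[int]], pred: list[set[int]]) -> float:
    """Average of per-text F1 scores."""
    scores = []
    for g, p in zip(gold, pred):
        if not g and not p:
            scores.append(1.0)  # assumed convention: empty vs. empty is a perfect match
            continue
        scores.append(f1(len(g & p), len(p - g), len(g - p)))
    return sum(scores) / len(scores)


def micro_f1(gold: list[set[int]], pred: list[set[int]]) -> float:
    """Single F1 over counts pooled across all texts and labels."""
    tp = sum(len(g & p) for g, p in zip(gold, pred))
    fp = sum(len(p - g) for g, p in zip(gold, pred))
    fn = sum(len(g - p) for g, p in zip(gold, pred))
    return f1(tp, fp, fn)


# Hypothetical gold and predicted emotion sets for four texts.
gold = [{0}, {1, 2}, {3}, {0, 4}]
pred = [{0}, {1}, {2}, {0, 4, 5}]
print(round(sample_avg_f1(gold, pred), 3))  # 0.617
print(round(micro_f1(gold, pred), 3))       # 0.667
```

Here the completely wrong third prediction costs a full quarter of the sample-averaged score, while micro F1 weights it only by its share of the pooled label counts.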

## Files

| File | Description |
| --- | --- |
| `best.pt` | Full model checkpoint (state dict + optimizer + metadata) |
| `labels.json` | Emotion (28) and topic (7) label mappings |
| `tokenizer.json` | SentencePiece tokenizer (flat format) |
| `hf_tokenizer/` | Hugging Face-compatible tokenizer directory |

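## Usage

```python
import torch

# These modules live in the LexiMind repository; run this from the repo root.
from src.models.factory import build_model
from src.utils.io import load_labels

labels = load_labels("labels.json")

# `config` is the model configuration used at training time. It is not
# defined in this snippet; build it as the LexiMind repository does.
model = build_model(config, labels)

# best.pt also holds optimizer state and metadata, so load the full dict and
# keep only the model weights. weights_only=False unpickles arbitrary objects:
# use it only for checkpoints you trust.
ckpt = torch.load("best.pt", map_location="cpu", weights_only=False)
model.load_state_dict(ckpt["model_state_dict"])
model.eval()  # disable dropout for inference
```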

See the full codebase at [github.com/OliverPerrin/LexiMind](https://github.com/OliverPerrin/LexiMind) for the inference scripts, API server, and Gradio demo.

## License

MIT