---
library_name: transformers
tags:
- speech
- automatic-speech-recognition
- speech-language-model
- target-speaker-asr
- multi-talker
- speaker-diarization
- meeting-transcription
- Dixtral
- Voxtral
- DiCoW
- BUT-FIT
pipeline_tag: automatic-speech-recognition
license: apache-2.0
base_model: mistralai/Voxtral-Mini-3B-2507
datasets:
- microsoft/NOTSOFAR
- edinburghcstr/ami
---

# 🧠 Dixtral: BUT-FIT Diarization-Conditioned Voxtral for Target-Speaker ASR

This repository hosts **Dixtral**, developed by [BUT Speech@FIT](https://github.com/BUTSpeechFIT). 
**Dixtral** couples the **Voxtral-Mini-3B** spoken-language model with the **DiCoW** diarization-conditioned encoder, giving the LLM target-speaker awareness in multi-talker audio.

This checkpoint is tuned for **target-speaker / multi-talker transcription (TS-ASR)** of conversational and meeting recordings. For spoken question answering, use [**Dixtral_QA**](https://huggingface.co/BUT-FIT/Dixtral_QA) instead.

## ๐Ÿ› ๏ธ Model Usage

```python
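from transformers import AutoModel, AutoProcessor

MODEL_NAME = "BUT-FIT/Dixtral"
# trust_remote_code=True allows transformers to run the custom model code shipped in the repository.
model = AutoModel.from_pretrained(MODEL_NAME, trust_remote_code=True)
# The processor prepares the model inputs.
processor = AutoProcessor.from_pretrained(MODEL_NAME)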
```

โžก๏ธ For full inference pipelines (diarization โ†’ FDDT masks โ†’ generation), see the
[**Dixtral GitHub repository**](https://github.com/BUTSpeechFIT/Dixtral).
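
To give a rough feel for the middle step of that pipeline, here is a minimal sketch of turning diarization segments into per-frame masks for one target speaker. It assumes the four-class silence / target / non-target / overlap layout described for DiCoW and a frame rate of 50 frames per second. The function name `stno_masks`, the segment tuple format, and the dictionary keys are hypothetical and are not taken from the Dixtral code; the repository defines the format the model actually expects.

```python
from typing import Dict, List, Tuple

# Hypothetical segment format: (speaker label, start in seconds, end in seconds).
Segment = Tuple[str, float, float]


def stno_masks(
    segments: List[Segment],
    target: str,
    duration_s: float,
    frame_rate: int = 50,  # assumed frame rate, not confirmed by the model card
) -> Dict[str, List[float]]:
    """Build per-frame silence / target / non-target / overlap masks."""
    n_frames = int(round(duration_s * frame_rate))
    target_active = [False] * n_frames
    other_active = [False] * n_frames

    # Mark which frames are covered by the target and by any other speaker.
    for speaker, start_s, end_s in segments:
        first = max(0, int(round(start_s * frame_rate)))
        last = min(n_frames, int(round(end_s * frame_rate)))
        active = target_active if speaker == target else other_active
        for t in range(first, last):
            active[t] = True

    # Each frame falls into exactly one of the four classes.
    masks: Dict[str, List[float]] = {
        "silence": [], "target": [], "non_target": [], "overlap": []
    }
    for t_on, o_on in zip(target_active, other_active):
        masks["silence"].append(float(not t_on and not o_on))
        masks["target"].append(float(t_on and not o_on))
        masks["non_target"].append(float(o_on and not t_on))
        masks["overlap"].append(float(t_on and o_on))
    return masks


# Toy example: speaker A talks for 0.0-1.0 s, speaker B for 0.5-1.5 s.
toy_segments = [("A", 0.0, 1.0), ("B", 0.5, 1.5)]
toy_masks = stno_masks(toy_segments, target="A", duration_s=2.0, frame_rate=10)
print({name: sum(values) for name, values in toy_masks.items()})
```

Running the sketch once per speaker in the diarization output would give one mask set per target, which is the shape of input a target-speaker model needs to transcribe each participant separately.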

---

## 📦 Model Details

* **Base Model:** [Voxtral-Mini-3B-2507](https://huggingface.co/mistralai/Voxtral-Mini-3B-2507)
* **Encoder:** DiCoW v3 large
* **Training Datasets:**
  * [NOTSOFAR-1](https://github.com/microsoft/NOTSOFAR1-Challenge)
  * [AMI Meeting Corpus](http://groups.inf.ed.ac.uk/ami/corpus/)
  * [LibriMix / LibriSpeechMix](https://github.com/JorisCos/LibriMix)


---

## 📬 Contact

* 📧 **Email:** [ipoloka@fit.vut.cz](mailto:ipoloka@fit.vut.cz)
* 🏢 **Affiliation:** [BUT Speech@FIT](https://github.com/BUTSpeechFIT), Brno University of Technology
* 🔗 **GitHub:** [BUTSpeechFIT](https://github.com/BUTSpeechFIT)