---
library_name: transformers
tags:
- speech
- automatic-speech-recognition
- speech-language-model
- target-speaker-asr
- multi-talker
- speaker-diarization
- meeting-transcription
- Dixtral
- Voxtral
- DiCoW
- BUT-FIT
pipeline_tag: automatic-speech-recognition
license: apache-2.0
base_model: mistralai/Voxtral-Mini-3B-2507
datasets:
- microsoft/NOTSOFAR
- edinburghcstr/ami
---

# 🧠 Dixtral — BUT-FIT Diarization-Conditioned Voxtral for Target-Speaker ASR

This repository hosts **Dixtral**, developed by [BUT Speech@FIT](https://github.com/BUTSpeechFIT). 
**Dixtral** couples the **Voxtral-Mini-3B** spoken-language model with the **DiCoW** diarization-conditioned encoder, giving the LLM target-speaker awareness in multi-talker audio.

This checkpoint is tuned for **target-speaker / multi-talker transcription (TS-ASR)** of conversational and meeting recordings. For spoken question answering, use [**Dixtral_QA**](https://huggingface.co/BUT-FIT/Dixtral_QA) instead.

## 🛠️ Model Usage

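```python
from transformers import AutoModel, AutoProcessor

MODEL_NAME = "BUT-FIT/Dixtral"
# trust_remote_code=True lets transformers run the custom model code shipped in this repository.
model = AutoModel.from_pretrained(MODEL_NAME, trust_remote_code=True)
# The processor prepares the audio and text inputs for the model.
processor = AutoProcessor.from_pretrained(MODEL_NAME)
```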

➡️ For full inference pipelines (diarization → FDDT masks → generation), see the
[**Dixtral GitHub repository**](https://github.com/BUTSpeechFIT/Dixtral).
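
To give a rough idea of the middle step of that pipeline, the sketch below turns frame-level diarization output into per-frame masks for one target speaker. It assumes the masks follow the silence / target / non-target / overlap (STNO) convention used in the DiCoW line of work. The function name `stno_masks` and the list-based layout are hypothetical and are not part of this repository's API. The exact tensor format Dixtral expects is defined in the GitHub repository.

```python
from typing import List, Sequence


def stno_masks(activity: Sequence[Sequence[int]], target: int) -> List[List[float]]:
    """Hypothetical helper: build one-hot STNO masks for a target speaker.

    activity[s][t] is 1 if speaker s is active in frame t, else 0.
    Returns 4 rows (silence, target, non-target, overlap), one value per frame.
    """
    n_frames = len(activity[0])
    masks = [[0.0] * n_frames for _ in range(4)]
    for t in range(n_frames):
        tgt = bool(activity[target][t])
        others = any(bool(row[t]) for s, row in enumerate(activity) if s != target)
        if tgt and others:
            cls = 3  # overlap: target and at least one other speaker
        elif tgt:
            cls = 1  # target speaks alone
        elif others:
            cls = 2  # only non-target speakers
        else:
            cls = 0  # silence
        masks[cls][t] = 1.0
    return masks


# Toy diarization output: 2 speakers, 5 frames.
activity = [
    [0, 1, 1, 0, 0],  # speaker 0
    [0, 0, 1, 1, 0],  # speaker 1
]
masks = stno_masks(activity, target=0)
```

Each frame is assigned to exactly one of the four classes, so the mask columns sum to 1. Running the helper once per diarized speaker yields one set of masks per target speaker.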

---

## 📦 Model Details

* **Base Model:** [Voxtral-Mini-3B-2507](https://huggingface.co/mistralai/Voxtral-Mini-3B-2507)
* **Encoder:** DiCoW v3 large
* **Training Datasets:**
  * [NOTSOFAR-1](https://github.com/microsoft/NOTSOFAR1-Challenge)
  * [AMI Meeting Corpus](http://groups.inf.ed.ac.uk/ami/corpus/)
  * [LibriMix / LibriSpeechMix](https://github.com/JorisCos/LibriMix)


---

## 📬 Contact

* 📧 **Email:** [ipoloka@fit.vut.cz](mailto:ipoloka@fit.vut.cz)
* 🏢 **Affiliation:** [BUT Speech@FIT](https://github.com/BUTSpeechFIT), Brno University of Technology
* 🔗 **GitHub:** [BUTSpeechFIT](https://github.com/BUTSpeechFIT)