--- |
|
|
license: cc-by-nc-4.0 |
|
|
language: |
|
|
- ca |
|
|
base_model: |
|
|
- projecte-aina/stt_ca-es_conformer_transducer_large |
|
|
tags: |
|
|
- automatic-speech-recognition |
|
|
- NeMo |
|
|
model-index: |
|
|
- name: stt_ca-es_conformer_transducer_large-rapnic-down |
|
|
results: |
|
|
- task: |
|
|
name: Automatic Speech Recognition |
|
|
type: automatic-speech-recognition |
|
|
dataset: |
|
|
name: Rapnic (Test) |
|
|
type: CLiC-UB/rapnic-example |
|
|
split: test |
|
|
args: |
|
|
language: ca |
|
|
metrics: |
|
|
- name: WER |
|
|
type: wer |
|
|
value: 30.78 |
|
|
--- |
|
|
# FFT for Down Syndrome: NVIDIA Conformer-Transducer Large (ca-es) |
|
|
|
|
|
## Table of Contents |
|
|
<details> |
|
|
<summary>Click to expand</summary>
|
|
|
|
|
- [FFT for Down Syndrome: NVIDIA Conformer-Transducer Large (ca-es)](#fft-for-down-syndrome-nvidia-conformer-transducer-large-ca-es) |
|
|
- [Table of Contents](#table-of-contents) |
|
|
- [Summary](#summary) |
|
|
- [Model Description](#model-description) |
|
|
- [Finetuning](#finetuning)
|
|
- [Evaluation](#evaluation) |
|
|
- [Installation](#installation) |
|
|
- [For Inference](#for-inference) |
|
|
- [Additional Information](#additional-information) |
|
|
- [Contact](#contact) |
|
|
- [License](#license) |
|
|
|
|
|
</details> |
|
|
|
|
|
## Summary |
|
|
|
|
|
The "stt_ca-es_conformer_transducer_large-rapnic-down" is an acoustic model based on ["projecte-aina/stt_ca-es_conformer_transducer_large"](https://huggingface.co/projecte-aina/stt_ca-es_conformer_transducer_large), fine-tuned for Catalan Automatic Speech Recognition of Down syndrome speech.
|
|
The base model is itself derived from ["NVIDIA/stt_es_conformer_transducer_large"](https://huggingface.co/nvidia/stt_es_conformer_transducer_large/).
|
|
|
|
|
## Model Description |
|
|
|
|
|
This model was created for speakers with Down syndrome and transcribes text in the lowercase Catalan alphabet, including spaces.
|
|
It was mainly fine-tuned on audio from the Rapnic dataset; see [Rapnic Example](https://huggingface.co/datasets/CLiC-UB/rapnic-example) for more details on the dataset.
|
|
See the [NeMo documentation](https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/main/asr/models.html#conformer-transducer) for complete architecture details.
|
|
|
|
|
## Finetuning
|
|
For this model, full fine-tuning was performed on 70% of the available data.
|
|
To avoid training on poor-quality data, we trained a preliminary model and used its WER results to filter out speakers whose mean WER exceeded 80%.
|
|
This filtering improved metrics both for speakers under and for speakers over the WER threshold.
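The speaker-filtering step described above can be sketched as follows. This is a hypothetical illustration, not the actual training code: the function name, data layout, and scores are assumptions, and the per-utterance WER values are presumed to come from the preliminary model.

```python
# Hypothetical sketch of the speaker-filtering step: given per-utterance WER
# scores from a preliminary model, drop speakers whose mean WER exceeds the
# 0.8 (80%) threshold used for this model.

def filter_speakers(wer_by_speaker, threshold=0.8):
    """Return {speaker: mean WER} for speakers whose mean WER is <= threshold."""
    kept = {}
    for speaker, wers in wer_by_speaker.items():
        mean_wer = sum(wers) / len(wers)
        if mean_wer <= threshold:
            kept[speaker] = mean_wer
    return kept

# Made-up scores for three speakers:
scores = {
    "spk01": [0.25, 0.40, 0.31],  # mean 0.32  -> kept
    "spk02": [0.90, 0.85, 0.95],  # mean 0.90  -> filtered out
    "spk03": [0.70, 0.75],        # mean 0.725 -> kept
}
print(sorted(filter_speakers(scores)))  # ['spk01', 'spk03']
```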
|
|
### Evaluation |
|
|
To evaluate the model, we set apart 20% of the available data, ensuring that no transcription appeared in both the training and test sets.
|
|
The WER results of running inference on our test set (Down syndrome speakers only), filtered according to speaker mean WER thresholds, were the following:
|
|
|
|
| Training WER filter ↓ / Evaluation WER filter → | 0.5 | 0.6 | 0.7 | 0.8 | None |
|---|---|---|---|---|---|
| 0.8 | 21.80 | 23.06 | 23.83 | 23.83 | 30.78 |
|
|
|
|
|
|
|
|
## Installation |
|
|
|
|
|
To use this model, install [NVIDIA NeMo](https://github.com/NVIDIA/NeMo). We recommend you install it after you've installed the latest PyTorch version. |
|
|
```bash
pip install "nemo_toolkit[all]"
```
|
|
|
|
|
|
|
|
## For Inference |
|
|
To transcribe impaired speech in Catalan using this model, you can follow this example: |
|
|
|
|
|
|
|
|
```python
import nemo.collections.asr as nemo_asr

# Paths to the downloaded .nemo checkpoint and to the audio file to transcribe
model_path = "stt_ca-es_conformer_transducer_large-rapnic-down.nemo"
audio_path = "audio.wav"

# Restore the fine-tuned Conformer-Transducer model and transcribe the audio
nemo_asr_model = nemo_asr.models.EncDecRNNTBPEModel.restore_from(model_path)
transcription = nemo_asr_model.transcribe([audio_path])[0].text
print(transcription)
```
|
|
|
|
|
## Additional Information |
|
|
|
|
|
### Contact |
|
|
For further information, please send an email to <gr.clic@ub.edu>. |
|
|
|
|
|
### License |
|
|
|
|
|
[CC BY-NC 4.0](https://creativecommons.org/licenses/by-nc/4.0/deed.en) |