---
license: cc-by-nc-4.0
language:
- ca
base_model:
- projecte-aina/stt_ca-es_conformer_transducer_large
tags:
- automatic-speech-recognition
- NeMo
model-index:
- name: stt_ca-es_conformer_transducer_large-rapnic-paralysis
  results:
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: Rapnic (Test)
      type: CLiC-UB/rapnic-example
      split: test
      args:
        language: ca
    metrics:
    - name: WER
      type: wer
      value: 46.65
---
# FFT for Cerebral Palsy: NVIDIA Conformer-Transducer Large (ca-es)
## Summary
The "stt_ca-es_conformer_transducer_large-rapnic-paralysis" is an acoustic model based on "projecte-aina/stt_ca-es_conformer_transducer_large" suitable Catalan Automatic Speech Recognition for Cerebral palsy speech. At the same time, the latter was a model based on "NVIDIA/stt_es_conformer_transducer_large".
## Model Description
This model was created for cerebral palsy speech and transcribes into the lowercase Catalan alphabet, including spaces. It was mainly finetuned on audio from the Rapnic dataset; see Rapnic Example for more details on the dataset. See the model architecture section and the NeMo documentation for complete architecture details.
## Finetuning
For this model, a full finetuning was performed on 70% of the available data. To avoid training on poor-quality data, we trained a preliminary model and used its WER results to filter out speakers whose mean WER exceeded 80%. This filtering led to better metrics both for speakers below and above the WER threshold.
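The filtering script itself is not included here; the snippet below is only a minimal sketch of the idea, assuming a NeMo-style manifest with hypothetical `speaker_id`, `text` (reference) and `pred_text` (preliminary-model hypothesis) fields, and using `jiwer` to compute WER:

```python
# Sketch (not the exact training script): drop speakers whose mean WER,
# measured with a preliminary model, exceeds 80%.
import json
from collections import defaultdict

import jiwer  # pip install jiwer

WER_THRESHOLD = 0.80

# Hypothetical manifest: one JSON object per line with "speaker_id",
# "text" (reference) and "pred_text" (preliminary-model output).
records = [json.loads(line) for line in open("train_manifest_with_preds.json")]

# Mean WER per speaker
per_speaker = defaultdict(list)
for r in records:
    per_speaker[r["speaker_id"]].append(jiwer.wer(r["text"], r["pred_text"]))

kept_speakers = {
    spk for spk, wers in per_speaker.items()
    if sum(wers) / len(wers) <= WER_THRESHOLD
}

# Keep only utterances from speakers under the threshold
with open("train_manifest_filtered.json", "w") as f:
    for r in records:
        if r["speaker_id"] in kept_speakers:
            f.write(json.dumps(r, ensure_ascii=False) + "\n")
```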
## Evaluation
To evaluate the model, we set apart 20% of the available data, making sure that no transcription was present in both the training and test sets. The WER results (%) of running inference on our test set, filtered according to speaker mean WER thresholds, were the following:
| Training WER threshold ↓ / Evaluation WER threshold → | 0.5 | 0.6 | 0.7 | 0.8 | None |
|---|---|---|---|---|---|
| 0.8 | 36.22 | 38.60 | 41.40 | 42.78 | 46.65 |
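As a rough illustration of how such figures can be reproduced, the sketch below transcribes a test manifest with the model and computes a corpus-level WER with `jiwer`; the file name and field names are assumptions, not part of this repository:

```python
# Sketch: corpus-level WER on a test manifest (hypothetical file and field names).
import json

import jiwer
import nemo.collections.asr as nemo_asr

model = nemo_asr.models.EncDecRNNTBPEModel.restore_from(
    "stt_ca-es_conformer_transducer_large-rapnic-paralysis.nemo"
)

records = [json.loads(line) for line in open("test_manifest.json")]
references = [r["text"] for r in records]
hypotheses = [h.text for h in model.transcribe([r["audio_filepath"] for r in records])]

print(f"WER: {100 * jiwer.wer(references, hypotheses):.2f}%")
```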
## Installation
To use this model, install NVIDIA NeMo. We recommend you install it after you've installed the latest PyTorch version.
```bash
pip install nemo_toolkit['all']
```
## For Inference
To transcribe impaired speech in Catalan using this model, you can follow this example:
```python
import nemo.collections.asr as nemo_asr

# Paths to the downloaded .nemo checkpoint and to the audio file to transcribe
model = "stt_ca-es_conformer_transducer_large-rapnic-paralysis.nemo"
audio_path = "audio.wav"

nemo_asr_model = nemo_asr.models.EncDecRNNTBPEModel.restore_from(model)
transcription = nemo_asr_model.transcribe([audio_path])[0].text
print(transcription)
```
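Alternatively, recent NeMo releases can load checkpoints published on the Hugging Face Hub directly through `from_pretrained`; the repository ID below is an assumption, and if your NeMo version does not support Hub loading, download the `.nemo` file and use `restore_from` as above:

```python
# Assumption: the checkpoint is hosted on the Hugging Face Hub under this ID and
# your NeMo version can resolve Hub model IDs via from_pretrained.
import nemo.collections.asr as nemo_asr

nemo_asr_model = nemo_asr.models.EncDecRNNTBPEModel.from_pretrained(
    "CLiC-UB/stt_ca-es_conformer_transducer_large-rapnic-paralysis"
)
print(nemo_asr_model.transcribe(["audio.wav"])[0].text)
```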
## Additional Information
### Contact
For further information, please send an email to gr.clic@ub.edu.