---
license: cc-by-nc-4.0
language:
  - ca
base_model:
  - projecte-aina/stt_ca-es_conformer_transducer_large
tags:
  - automatic-speech-recognition
  - NeMo
model-index:
  - name: stt_ca-es_conformer_transducer_large-rapnic-paralysis
    results:
      - task:
          name: Automatic Speech Recognition
          type: automatic-speech-recognition
        dataset:
          name: Rapnic (Test)
          type: CLiC-UB/rapnic-example
          split: test
          args:
            language: ca
        metrics:
          - name: WER
            type: wer
            value: 46.65
---

# FFT for Cerebral Palsy: NVIDIA Conformer-Transducer Large (ca-es)


## Summary

The "stt_ca-es_conformer_transducer_large-rapnic-paralysis" is an acoustic model based on "projecte-aina/stt_ca-es_conformer_transducer_large", suitable for Catalan automatic speech recognition of cerebral palsy speech. The base model was, in turn, derived from "NVIDIA/stt_es_conformer_transducer_large".

## Model Description

This model was created for cerebral palsy speech and transcribes using the lowercase Catalan alphabet, including spaces. It was mainly fine-tuned on audio from the Rapnic dataset; see Rapnic Example for more details on the dataset. See the model architecture section and the NeMo documentation for complete architecture details.

## Finetuning

For this model, a full fine-tuning was performed on 70% of the available data. To avoid training on poor-quality data, we first trained a preliminary model and used its WER results to filter out speakers whose mean WER exceeded 80%. This filtering resulted in better metrics both for speakers under and over the WER threshold.
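As an illustration, the speaker-filtering step might look like the following sketch. The field names (`speaker`, `wer`) and data layout are assumptions for the example, not the actual training pipeline:

```python
from collections import defaultdict

def filter_speakers(utterances, max_mean_wer=0.80):
    """Keep utterances only from speakers whose mean WER is at most max_mean_wer."""
    per_speaker = defaultdict(list)
    for utt in utterances:
        per_speaker[utt["speaker"]].append(utt["wer"])
    kept = {spk for spk, wers in per_speaker.items()
            if sum(wers) / len(wers) <= max_mean_wer}
    return [utt for utt in utterances if utt["speaker"] in kept]

# Toy data: spk2's mean WER (0.925) exceeds the 0.80 threshold, so it is dropped.
data = [
    {"speaker": "spk1", "wer": 0.40},
    {"speaker": "spk1", "wer": 0.60},
    {"speaker": "spk2", "wer": 0.90},
    {"speaker": "spk2", "wer": 0.95},
]
print(len(filter_speakers(data)))  # → 2
```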

## Evaluation

To evaluate the model, we set apart 20% of the available data, making sure that no transcription appeared in both the training and test sets. The WER results of running inference on our test set, filtered according to speaker mean WER thresholds, were the following:


| Training ↓ / Evaluation → | 0.5   | 0.6   | 0.7   | 0.8   | None  |
|---------------------------|-------|-------|-------|-------|-------|
| 0.8                       | 36.22 | 38.60 | 41.40 | 42.78 | 46.65 |
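For reference, WER (word error rate) is the word-level edit distance between hypothesis and reference, divided by the reference length. A minimal self-contained implementation (shown only to make the metric concrete, not the scorer used for the numbers above) could be:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: Levenshtein distance over words / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + cost)  # substitution
    return dp[-1][-1] / len(ref)

print(wer("hola com estas", "hola com estan"))  # 1 substitution / 3 words → 0.3333...
```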

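The transcription-disjoint split described above can be sketched as follows: utterances are grouped by transcription text so the same sentence never lands on both sides. The `text` field and split function are illustrative assumptions, not the actual Rapnic preprocessing code:

```python
import random

def disjoint_split(utterances, test_fraction=0.20, seed=42):
    """Split utterances so that train and test share no transcription."""
    # Collect the unique transcriptions and shuffle them reproducibly.
    texts = sorted({u["text"] for u in utterances})
    random.Random(seed).shuffle(texts)
    # Reserve a fraction of the *texts* (not the utterances) for testing.
    n_test = max(1, int(len(texts) * test_fraction))
    test_texts = set(texts[:n_test])
    train = [u for u in utterances if u["text"] not in test_texts]
    test = [u for u in utterances if u["text"] in test_texts]
    return train, test
```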
## Installation

To use this model, install NVIDIA NeMo. We recommend you install it after you've installed the latest PyTorch version.

```bash
pip install "nemo_toolkit[all]"
```

(The quotes keep shells such as zsh from interpreting the square brackets.)

## For Inference

To transcribe impaired speech in Catalan using this model, you can follow this example:

```python
import nemo.collections.asr as nemo_asr

# Path to the downloaded .nemo checkpoint and to a 16 kHz mono audio file
model_path = "stt_ca-es_conformer_transducer_large-rapnic-paralysis.nemo"
audio_path = "audio.wav"

nemo_asr_model = nemo_asr.models.EncDecRNNTBPEModel.restore_from(model_path)
transcription = nemo_asr_model.transcribe([audio_path])[0].text
print(transcription)
```

## Additional Information

### Contact

For further information, please send an email to gr.clic@ub.edu.

### License

[CC BY-NC 4.0](https://creativecommons.org/licenses/by-nc/4.0/)