	---
	language:
	- en
	datasets:
	- mozilla-foundation/common_voice_13_0
	- facebook/voxpopuli
	- LIUM/tedlium
	- librispeech_asr
	- fisher_corpus
	- WSJ-0
	metrics:
	- wer
	pipeline_tag: automatic-speech-recognition
	model-index:
	- name: tbd
	  results:
	  - task:
	      type: automatic-speech-recognition
	      name: Automatic Speech Recognition
	    dataset:
	      name: LibriSpeech (clean)
	      type: librispeech_asr
	      config: clean
	      split: test
	      args:
	        language: en
	    metrics:
	    - type: wer
	      value: 2.5
	      name: Test WER
	  - task:
	      type: automatic-speech-recognition
	      name: Automatic Speech Recognition
	    dataset:
	      name: LibriSpeech (other)
	      type: librispeech_asr
	      config: other
	      split: test
	      args:
	        language: en
	    metrics:
	    - type: wer
	      value: 5.6
	      name: Test WER
	  - task:
	      type: automatic-speech-recognition
	      name: Automatic Speech Recognition
	    dataset:
	      name: tedlium-v3
	      type: LIUM/tedlium
	      config: release1
	      split: test
	      args:
	        language: en
	    metrics:
	    - type: wer
	      value: 6.3
	      name: Test WER
	  - task:
	      type: automatic-speech-recognition
	      name: Automatic Speech Recognition
	    dataset:
	      name: Vox Populi
	      type: facebook/voxpopuli
	      config: en
	      split: test
	      args:
	        language: en
	    metrics:
	    - type: wer
	      value: 7.3
	      name: Test WER
	  - task:
	      type: automatic-speech-recognition
	      name: Automatic Speech Recognition
	    dataset:
	      name: Mozilla Common Voice 13.0
	      type: mozilla-foundation/common_voice_13_0
	      config: en
	      split: test
	      args:
	        language: en
	    metrics:
	    - type: wer
	      value: 12.1
	      name: Test WER
	  - task:
	      type: automatic-speech-recognition
	      name: Automatic Speech Recognition
	    dataset:
	      name: FLEURS
	      type: google/fleurs
	      split: test
	      args:
	        language: en_us
	    metrics:
	    - type: wer
	      value: 6.8
	      name: Test WER
	  - task:
	      type: automatic-speech-recognition
	      name: Automatic Speech Recognition
	    dataset:
	      name: Switchboard
	      type: unk
	      split: eval2000
	      args:
	        language: en
	    metrics:
	    - type: wer
	      value: 6.8
	      name: Test WER
	  - task:
	      type: automatic-speech-recognition
	      name: Automatic Speech Recognition
	    dataset:
	      name: Wall Street Journal
	      type: unk
	      split: eval92
	      args:
	        language: en
	    metrics:
	    - type: wer
	      value: 1.3
	      name: Test WER
	---
	# DeCRED-base
	This is a 174M-parameter encoder-decoder E-Branchformer model trained with a decoder-centric regularization technique on 6,000 hours of open-source, normalised English data.
	It achieves Word Error Rates (WERs) comparable to `openai/whisper-medium` across multiple datasets with roughly a quarter of the parameters.
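
For readers unfamiliar with the metric, WER is the word-level edit distance between a hypothesis and its reference transcript, divided by the number of reference words. The snippet below is a minimal sketch of that definition, not the evaluation script behind the numbers reported for this model. `word_error_rate` is a hypothetical helper, and it assumes both strings are already normalised and whitespace-tokenised.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference words processed
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)
```

The function returns a fraction; multiply by 100 to compare with the percentages listed in the model metadata.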

	Architecture details, training hyperparameters, and a description of the proposed technique will be added soon.

	Disclaimer: The model currently produces insertions on utterances that contain only silence, because it was not trained on such data. A fix will be added soon.
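
Until then, one possible workaround is to skip clips whose energy is negligible before passing them to the model. This is an assumption on our part, not something the authors recommend. The sketch below uses a plain root-mean-square check. `is_probably_silent` and the `1e-3` threshold are hypothetical, and the samples are assumed to be floats scaled to the range [-1, 1].

```python
import math
from typing import Sequence


def is_probably_silent(samples: Sequence[float], rms_threshold: float = 1e-3) -> bool:
    """Return True when the root-mean-square amplitude is below the threshold."""
    if len(samples) == 0:
        # An empty clip carries no speech, so treat it as silent.
        return True
    mean_square = sum(s * s for s in samples) / len(samples)
    return math.sqrt(mean_square) < rms_threshold
```

A suitable threshold depends on the recording conditions, so it is worth tuning on a few known silent and non-silent clips.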

	The model can be used with the [`pipeline`](https://huggingface.co/docs/transformers/main_classes/pipelines#transformers.AutomaticSpeechRecognitionPipeline)
	class to transcribe audio files of arbitrary length.

	```python
	from transformers import pipeline

	model_id = "BUT-FIT/DeCRED-base"
	pipe = pipeline("automatic-speech-recognition", model=model_id, feature_extractor=model_id, trust_remote_code=True)
	# In newer versions of transformers (>4.31.0), there is a bug in the pipeline inference type.
	# The warning can be ignored.
	pipe.type = "seq2seq"

	# Run beam search decoding with joint CTC-attention scorer
	result_beam = pipe("audio.wav")

	# Run greedy decoding without joint CTC-attention scorer
	pipe.model.generation_config.ctc_weight = 0.0
	pipe.model.generation_config.num_beams = 1

	result_greedy = pipe("audio.wav")

	```
	## Citation
	If you use [DeCRED](https://arxiv.org/abs/2410.17437) in your research, please cite the following paper:

	```bibtex
	@misc{polok2024improvingautomaticspeechrecognition,
	title={Improving Automatic Speech Recognition with Decoder-Centric Regularisation in Encoder-Decoder Models},
	author={Alexander Polok and Santosh Kesiraju and Karel Beneš and Lukáš Burget and Jan Černocký},
	year={2024},
	eprint={2410.17437},
	archivePrefix={arXiv},
	primaryClass={eess.AS},
	url={https://arxiv.org/abs/2410.17437},
	}
	```