slseanwu/clotho-chatgpt-mixup-50K
How to use espnet/DCASE23.AudioCaptioning.PreTrained with ESPnet:
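import soundfile
from espnet2.bin.asr_inference import Speech2Text

# Download and load the pretrained model (from_pretrained needs espnet_model_zoo installed)
model = Speech2Text.from_pretrained(
    "espnet/DCASE23.AudioCaptioning.PreTrained"
)
# Read the input audio as a waveform array plus its sample rate
speech, rate = soundfile.read("speech.wav")
# Take the top hypothesis; its first element is the decoded text
text, *_ = model(speech)[0]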