Getting Started with ESPnet

Company

AssemblyAI

Date Published

June 6, 2022

Author

Ryan O'Connor

Word count

1714

Language

English

Hacker News points

None

URL

www.assemblyai.com/blog/getting-started-with-espnet

Summary

The article transcribes audio files into text using ESPnet's pretrained models for Automatic Speech Recognition (ASR). Audio files that are not already in .wav format are converted first, and each file is then run through the speech2text object. The resulting transcriptions are post-processed with a helper function named "text_normalizer", which removes punctuation and converts all text to uppercase. The normalized transcriptions are then compared with their corresponding true transcriptions to check accuracy.
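
The normalization and comparison steps can be sketched in a few lines of Python. This is a minimal illustration, not the article's actual code: only the name "text_normalizer" comes from the summary, while its body, the hypothetical helper matches_reference, and the sample strings are assumptions made for the example.

```python
import string


def text_normalizer(text: str) -> str:
    """Uppercase the text and strip ASCII punctuation.

    Assumed behaviour, based only on the summary's description.
    """
    text = text.upper()
    # Delete every character listed in string.punctuation.
    return text.translate(str.maketrans("", "", string.punctuation))


def matches_reference(hypothesis: str, reference: str) -> bool:
    """Hypothetical helper: exact word-level match after normalization."""
    # Splitting on whitespace ignores differences in spacing.
    return text_normalizer(hypothesis).split() == text_normalizer(reference).split()


if __name__ == "__main__":
    # Made-up strings standing in for a model output and a true transcription.
    hypothesis = "Hello, world!"
    reference = "HELLO WORLD"
    print(text_normalizer(hypothesis))
    print(matches_reference(hypothesis, reference))
```

Because string.punctuation includes the apostrophe, contractions such as "don't" become "DONT" under this sketch. An exact match is also a strict check; a word error rate would give a more graded measure of accuracy.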