Company
Date Published
Author
Ryan O'Connor
Word count
1714
Language
English
Hacker News points
None

Summary

This tutorial transcribes audio files into text using ESPnet's pretrained models for Automatic Speech Recognition (ASR). Audio files that are not already in .wav format are converted first, and each file is then run through the speech2text object. The resulting transcriptions are normalized with a helper function named "text_normalizer", which removes punctuation and converts all text to uppercase. Finally, the normalized transcriptions are compared with their corresponding true transcriptions to check accuracy.
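
The normalization and comparison steps can be sketched as follows. This is a minimal illustration that assumes "text_normalizer" only uppercases text and strips ASCII punctuation, as the summary describes. The `matches_reference` helper is hypothetical and not part of the original tutorial. The sample strings stand in for real ASR output.

```python
import string


def text_normalizer(text: str) -> str:
    """Uppercase the text and remove ASCII punctuation.

    Assumed behavior, based on the summary's description of the helper.
    """
    text = text.upper()
    return text.translate(str.maketrans("", "", string.punctuation))


def matches_reference(hypothesis: str, reference: str) -> bool:
    """Hypothetical helper: compare two transcriptions word by word.

    Both strings are normalized first, and splitting on whitespace
    ignores differences in spacing.
    """
    return text_normalizer(hypothesis).split() == text_normalizer(reference).split()


if __name__ == "__main__":
    # Placeholder strings standing in for a model transcription and its true text.
    hypothesis = "Hello, world!"
    reference = "HELLO WORLD"
    print(text_normalizer(hypothesis))
    print(matches_reference(hypothesis, reference))
```

An exact word-level match is a strict check. A metric such as word error rate would give partial credit for near misses, but that goes beyond what the summary describes.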