Getting Started with ESPnet
We transcribed audio files to text using ESPnet's pretrained Automatic Speech Recognition (ASR) models. Each audio file was first converted to .wav format if it was not already in that format, and then passed through the speech2text object. The resulting transcriptions were normalized with a helper function named "text_normalizer", which removes punctuation and converts all text to uppercase. Finally, the normalized transcriptions were compared with their corresponding ground-truth transcriptions to assess accuracy.
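The normalization and comparison steps can be sketched in plain Python. This is not the tutorial's own code. The body of text_normalizer below is a hypothetical re-creation based only on the description above (strip punctuation, uppercase). The word_error_rate function is likewise hypothetical: word error rate (WER) is an assumed choice of accuracy metric, because the summary does not say how the transcriptions were compared.

```python
import string


def text_normalizer(text: str) -> str:
    """Hypothetical re-creation of the helper described in the summary.

    Removes ASCII punctuation, uppercases the text, and collapses
    whitespace so that formatting differences do not count as errors.
    Note: apostrophes are stripped too, so "don't" becomes "DONT".
    """
    no_punct = text.translate(str.maketrans("", "", string.punctuation))
    return " ".join(no_punct.upper().split())


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Assumed metric: word-level Levenshtein distance divided by reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    # prev[j] holds the minimum edits turning the first i-1 reference
    # words into the first j hypothesis words (one DP row at a time).
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            )
        prev = curr
    return prev[len(hyp)] / max(len(ref), 1)


if __name__ == "__main__":
    truth = "The quick brown fox."
    predicted = "the quick brown box"  # stand-in for a speech2text output
    ref_norm = text_normalizer(truth)
    hyp_norm = text_normalizer(predicted)
    print(ref_norm, "|", hyp_norm, "|", word_error_rate(ref_norm, hyp_norm))
```

Normalizing both strings before scoring means that casing and punctuation differences between the model output and the reference transcript do not inflate the error count. In this toy example, only the substitution of "BOX" for "FOX" is counted, which gives a WER of 1/4.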
Company
AssemblyAI
Date published
June 6, 2022
Author(s)
Ryan O'Connor
Word count
1714
Language
English
Hacker News points
None found.