Exploring OpenAI Whisper Speech Recognition

What's this blog post about?

OpenAI has released Whisper, a new open-source ASR model, along with a repository of tools that make it easy to try out. Users can experiment with the inference options and observe how they affect results. The Whisper paper describes a complex decoding strategy built on several heuristics intended to make transcription more reliable. These heuristics are implemented in the released code; they yield slightly better test results but can slow inference by up to a factor of six, because by default the Whisper CLI tool may run inference and decoding up to six times, each time with a different decoding strategy. Users can adjust these settings to improve performance on their own data. The model also struggles with periods of non-speech, which could be addressed by running a voice activity detection algorithm in parallel with Whisper or by adjusting the compression ratio threshold.
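To make the "up to six times" figure concrete, here is a minimal sketch of a temperature-fallback loop of the kind the summary describes. It is not Whisper's actual code: `Decoded`, `decode_once`, and `decode_with_fallback` are hypothetical names introduced for illustration. The default values used (six temperatures from 0.0 to 1.0, a compression ratio threshold of 2.4, and a log-probability threshold of -1.0) are assumed to match the defaults in the open-source repository and should be checked against the version in use.

```python
import zlib
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple


@dataclass
class Decoded:
    """Hypothetical container for one decoding attempt."""
    text: str
    avg_logprob: float  # mean token log-probability of the transcript


def compression_ratio(text: str) -> float:
    """Raw byte length divided by zlib-compressed length.

    Highly repetitive text (a typical failure mode on non-speech audio)
    compresses very well, so it produces a large ratio.
    """
    data = text.encode("utf-8")
    return len(data) / len(zlib.compress(data))


def decode_with_fallback(
    decode_once: Callable[[float], Decoded],  # hypothetical: one decoding pass at a temperature
    temperatures: Sequence[float] = (0.0, 0.2, 0.4, 0.6, 0.8, 1.0),  # assumed defaults
    compression_ratio_threshold: float = 2.4,  # assumed default
    logprob_threshold: float = -1.0,  # assumed default
) -> Tuple[Decoded, int]:
    """Retry decoding at increasing temperatures until the output looks sane.

    Returns the last result and the number of decoding passes that were run.
    """
    if not temperatures:
        raise ValueError("at least one temperature is required")

    attempts = 0
    for temperature in temperatures:
        result = decode_once(temperature)
        attempts += 1
        too_repetitive = compression_ratio(result.text) > compression_ratio_threshold
        too_unlikely = result.avg_logprob < logprob_threshold
        if not (too_repetitive or too_unlikely):
            break  # accept this attempt; no further passes needed
    return result, attempts


if __name__ == "__main__":
    # A stand-in decoder that always "hallucinates" a repeated phrase.
    stuck = lambda temperature: Decoded("thank you " * 50, -0.2)
    _, passes = decode_with_fallback(stuck)
    print(f"decoding passes used: {passes}")
```

Under these assumptions, a segment that keeps failing the checks costs one pass per temperature, which is where a worst-case sixfold slowdown would come from. Supplying a single temperature removes the retries, trading some robustness for speed, and raising the compression ratio threshold makes the loop more tolerant of repetitive output.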

Company
Deepgram

Date published
Oct. 13, 2022

Author(s)
Julia Strout

Word count
1620

Language
English

Hacker News points
None found.