Wav2Vec to Whisper to Nova-2: The evolution of AI & ASR

Company

Deepgram

Date Published

Oct. 20, 2023

Author

Ben Luks, Jose Nicholas Francisco

Word count

1637

Language

English

Hacker News points

None

URL

deepgram.com/learn/evolution-of-asr

Summary

The article traces the evolution of AI and Automatic Speech Recognition (ASR) models from Wav2Vec 2.0 to Whisper and Nova-2. It explains how pre-training has become a popular approach in voice technology, with large tech companies investing heavily in training models for natural language processing tasks. The article compares Wav2Vec 2.0 and Whisper, noting that while both are pre-trained models, they differ in architecture and in their approach to training data. It presents Whisper as a more customizable alternative to Wav2Vec 2.0 that relies on a familiar architecture and fine-tuning process and aims to provide an easy-to-use Python package for users working at various levels of abstraction. Nova-2, by contrast, is described as more accurate, faster, and less expensive than Whisper, the result of a decade's worth of iterations on patented AI architectures that deviate from the classic Transformer architecture. The article concludes by emphasizing the importance of understanding the practical differences between these technologies rather than getting lost in research-level minutiae, and it encourages readers to test Whisper and Nova-2 for themselves.