Introducing Nova-2: The Fastest, Most Accurate Speech-to-Text API

What's this blog post about?

Deepgram introduces Nova-2, a next-generation speech-to-text model that the company says outperforms alternatives in accuracy, speed, and cost. Nova-2 is 18% more accurate than its predecessor, Nova-1, and offers a 36% relative word error rate (WER) improvement over OpenAI Whisper (large). Across both pre-recorded and real-time transcription it delivers an average 30% reduction in WER over competitors, and its pre-recorded inference is 5 to 40 times faster. Nova-2 is priced at $0.0043/min for pre-recorded audio, making it more affordable than other full-functionality providers. The model was trained on a diverse dataset and, compared to Nova-1, offers better entity accuracy, better punctuation accuracy, and a lower capitalization error rate. Deepgram's benchmarking methodology uses over 50 hours of human-annotated audio across various domains and compares Nova-2 with other prominent models on the market.
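
The summary leans on word error rate and on "relative" improvements, so a small worked sketch may help make the arithmetic concrete. It assumes the standard definition of WER (word-level edit distance divided by the number of reference words) and is not Deepgram's benchmarking code. The function names are hypothetical, and the WER values passed to `relative_wer_reduction` are made-up illustrations, not figures from the post. Only the $0.0043/min rate comes from the summary.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # Word-level Levenshtein distance, keeping only the previous row.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            )
        prev = curr
    return prev[-1] / len(ref)


def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Fraction by which new_wer is lower than baseline_wer."""
    return (baseline_wer - new_wer) / baseline_wer


def transcription_cost_usd(minutes: float, rate_per_min: float = 0.0043) -> float:
    """Cost at a flat per-minute rate (default: the pre-recorded rate quoted above)."""
    return minutes * rate_per_min


if __name__ == "__main__":
    # One substituted word out of five reference words.
    print(word_error_rate("the quick brown fox jumps", "the quick brown box jumps"))
    # Illustrative only: a baseline WER of 10% versus a new WER of 6.4%.
    print(round(relative_wer_reduction(0.10, 0.064), 2))
    # One hour of pre-recorded audio at the quoted rate.
    print(round(transcription_cost_usd(60), 3))
```

Note that a relative improvement is measured against the baseline's error rate, so a 36% relative WER improvement means the new error rate is 64% of the old one, not that absolute accuracy rose by 36 points.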

Company
Deepgram

Date published
Sept. 19, 2023

Author(s)
Josh Fox

Word count
2281

Language
English

Hacker News points
2