Universal-2 vs OpenAI's Whisper: Comparing Speech-to-Text models in real-world use cases
This article compares four Speech-to-Text models (Universal-2, Universal-1, Whisper large-v3, and Whisper turbo) in real-world scenarios. The evaluation focuses on proper nouns, alphanumerics, text formatting, and hallucinations. Universal-2 leads in most categories and improves significantly on its predecessor, Universal-1: it has the best overall accuracy (6.68% WER), the best proper noun handling (13.87% PNER), and the best formatting accuracy (10.04% U-WER). Whisper large-v3 has the best alphanumeric transcription accuracy (3.84% WER) but also a documented propensity for hallucinations. The article concludes that Universal-2 is the leading model in most categories, with a 30% reduction in hallucination rates compared to Whisper large-v3.
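The WER figures quoted above are word error rates: the number of word substitutions, deletions, and insertions needed to turn a model's transcript into the reference transcript, divided by the number of words in the reference. As a rough illustration only, the metric can be sketched as a word-level edit distance. The function name and the lowercase-and-split normalization here are assumptions for the example; the article's actual evaluation pipeline and text normalization may differ.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word error rate: (substitutions + deletions + insertions) / reference length.

    Assumption for this sketch: text is normalized only by lowercasing and
    splitting on whitespace. Real benchmarks usually apply a richer normalizer.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion (reference word missing)
                curr[j - 1] + 1,                  # insertion (extra hypothesis word)
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substituted word out of six reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sat on a mat"))
```

Under this definition, a WER of 6.68% means roughly 6 to 7 word-level errors per 100 reference words.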
Company
AssemblyAI
Date published
Nov. 7, 2024
Author(s)
Patrick Loeber
Word count
2446
Language
English
Hacker News points
None found.