Universal-2 vs OpenAI's Whisper: Comparing Speech-to-Text models in real-world use cases

What's this blog post about?

This article compares four speech-to-text models (Universal-2, Universal-1, Whisper large-v3, and Whisper turbo) in real-world scenarios. The evaluation covers proper nouns, alphanumerics, text formatting, and hallucinations. Universal-2 leads in most categories and improves significantly on its predecessor, Universal-1: it has the best overall accuracy (6.68% word error rate, WER), the best proper noun handling (13.87% PNER), and the best formatting accuracy (10.04% U-WER). Whisper large-v3 has the best alphanumeric transcription accuracy (3.84% WER) but also a documented propensity for hallucinations; the article reports that Universal-2 reduces hallucination rates by 30% compared to Whisper large-v3.
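
Since most of the figures above are word error rates, a minimal sketch of how WER is conventionally computed may help when reading them: the word-level edit distance (substitutions, deletions, and insertions) between a reference transcript and a model's output, divided by the number of reference words. The function name `word_error_rate` and the simple lowercase-and-split normalization are assumptions made for illustration; the summary does not say how the article normalizes text before scoring.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (substitutions + deletions + insertions) / reference word count.

    Normalization here (lowercasing, whitespace split) is an illustrative
    assumption, not the procedure used in the article.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words (classic Levenshtein DP).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words to reach an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substituted word out of four reference words.
    print(word_error_rate("the quick brown fox", "the quick brown dog"))
```

Under this definition, a 6.68% WER means roughly 6.7 word-level errors per 100 reference words. Insertions count as errors too, so WER can exceed 100% when a model emits far more words than the reference contains.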

Company
AssemblyAI

Date published
Nov. 7, 2024

Author(s)
Patrick Loeber

Word count
2446

Language
English

Hacker News points
None found.