Whisper-v3 Hallucinations on Real World Data
Whisper-v3, the latest version of OpenAI's automatic speech recognition (ASR) model, has been found to hallucinate more frequently than its predecessor, Whisper-v2, when tested on real-world data. On that data, the median Word Error Rate (WER) is 53.4 for Whisper-v3, compared with 12.7 for Whisper-v2. Users have also reported hallucinations in languages such as Japanese and Korean. The author tested the model on a range of audio files and found that it handles edge cases well but struggles with real-world data, where its error rates are high.
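For readers unfamiliar with the metric, the sketch below shows one common way a median WER could be computed: a word-level edit distance divided by the number of reference words, expressed as a percentage, with the median taken across files. It is an illustration only. The function names `word_error_rate` and `median_wer` are hypothetical, the normalization (lowercasing and whitespace splitting) is an assumption, and the summary does not say how the article's own figures were calculated.

```python
from statistics import median


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return the word-level edit distance over reference length, as a percentage."""
    # Assumed normalization: lowercase and split on whitespace.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] is the edit distance between the reference words seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1    # reference word missing from the hypothesis
            insertion = curr[j - 1] + 1  # extra hypothesis word, e.g. a hallucination
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return 100.0 * prev[-1] / len(ref)


def median_wer(pairs):
    """Median WER over an iterable of (reference, hypothesis) transcript pairs."""
    return median(word_error_rate(ref, hyp) for ref, hyp in pairs)
```

Because inserted words count as errors, a transcript padded with hallucinated text can score a WER above 100 even when every spoken word was transcribed correctly.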
Company
Deepgram
Date published
Nov. 14, 2023
Author(s)
Jose Nicholas Francisco
Word count
1762
Language
English
Hacker News points
None found.