Company:
Date Published:
Author: Ross O'Connell
Word count: 1518
Language: English
Hacker News points: None

Summary

OpenAI's Whisper model reports impressively low word error rates for many South Asian languages, including Tamil, Hindi, and Bengali. On closer examination, however, these results look better than they really are because of a bug in its text cleaning process. The issue affects not only Tamil but also other languages that use related writing systems, which together have over one billion speakers worldwide. When the two are compared on a more level playing field, Deepgram's Tamil language model outperforms OpenAI Whisper's Tamil model in accuracy, speed, and cost of use.
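
The summary does not say what the bug is, so the following is only a minimal sketch of how a text cleaning step could flatter a word error rate. It assumes a hypothetical cleaner, `strip_marks`, that drops every Unicode mark character; the function names and the example strings are illustrative and are not taken from Whisper's code. In Tamil and related scripts, vowel signs are mark characters, so a cleaner like this can make two different words compare as equal.

```python
import unicodedata


def strip_marks(text: str) -> str:
    """Hypothetical over-aggressive cleaner: drop every Unicode mark (category M*)."""
    return "".join(c for c in text if not unicodedata.category(c).startswith("M"))


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance divided by reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the ref words seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


# Two Tamil strings that share their consonants (U+0B95, U+0B9F)
# and differ only in the final vowel sign (U+0BC8 vs U+0BBF).
REFERENCE = "\u0b95\u0b9f\u0bc8"
HYPOTHESIS = "\u0b95\u0b9f\u0bbf"

raw_wer = wer(REFERENCE, HYPOTHESIS)
cleaned_wer = wer(strip_marks(REFERENCE), strip_marks(HYPOTHESIS))

print(raw_wer)      # 1.0: the transcript gets the word wrong
print(cleaned_wer)  # 0.0: after cleaning, the error is no longer visible
```

Under this assumption the hypothesis is scored as perfect even though it contains a different word, which is the kind of effect that makes a reported error rate look better than the underlying transcripts deserve.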