Generally Available: The fastest, most accurate, and cost-efficient Whisper transcription

Company

Baseten

Date Published

Dec. 12, 2024

Authors

William Gao, Derrick Yang, Tianshu Cheng, Rachel Rapp

Word count

1145

Language

English

Hacker News points

None

URL

www.baseten.co/blog/the-fastest-most-accurate-and-cost-efficient-whisper-transcription

Summary

Baseten has developed a Whisper transcription pipeline for production AI workloads that it describes as the fastest, most accurate, and most cost-efficient on the market, achieving a real-time factor of over 1000x and a word error rate of 10.0 on the Rev16 benchmark. The optimized pipeline uses a two-stage approach in which audio is chunked using voice activity detection, which makes longer files processable and avoids unnecessary GPU processing. The pipeline is built with Chains, Baseten's framework for multi-step inference pipelines, in which hardware and scaling can be customized for optimal performance while keeping costs low. The result lets users reliably transcribe hours of audio in seconds.
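
The chunking step described in the summary can be illustrated with a minimal sketch. This is not Baseten's implementation: a simple energy threshold stands in for a real voice activity detection model, and the names `frame_energies` and `speech_chunks`, the 30 ms frame size, the threshold, and the 30-second chunk cap (chosen to match Whisper's 30-second input window) are all assumptions made for illustration.

```python
from typing import List, Tuple


def frame_energies(samples: List[float], frame_len: int) -> List[float]:
    """Mean squared amplitude of each non-overlapping frame."""
    energies = []
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        energies.append(sum(s * s for s in frame) / len(frame))
    return energies


def speech_chunks(
    samples: List[float],
    sample_rate: int,
    frame_ms: int = 30,          # assumed frame size
    threshold: float = 1e-3,     # assumed energy threshold (stand-in for a VAD model)
    max_chunk_s: float = 30.0,   # assumed cap, matching Whisper's 30 s window
) -> List[Tuple[int, int]]:
    """Return (start, end) sample indices of speech regions.

    Silent frames are dropped, and long speech regions are split so that
    no chunk is longer than max_chunk_s seconds.
    """
    frame_len = max(1, sample_rate * frame_ms // 1000)
    max_len = int(max_chunk_s * sample_rate)
    chunks: List[Tuple[int, int]] = []
    start = None  # start index of the chunk currently being built
    for i, energy in enumerate(frame_energies(samples, frame_len)):
        pos = i * frame_len
        is_speech = energy >= threshold
        # Close the open chunk on silence, or if adding this frame
        # would push it past the maximum length.
        if start is not None and (not is_speech or pos + frame_len - start > max_len):
            chunks.append((start, pos))
            start = None
        if is_speech and start is None:
            start = pos
    if start is not None:
        chunks.append((start, len(samples)))
    return chunks


if __name__ == "__main__":
    rate = 1000  # toy sample rate to keep the example small
    audio = [0.0] * 1000 + [0.5] * 2000 + [0.0] * 1000 + [0.5] * 1000
    chunks = speech_chunks(audio, rate, frame_ms=100)
    kept = sum(end - start for start, end in chunks)
    print(chunks, f"{kept / len(audio):.0%} of samples kept")
```

Only the returned chunks would be passed on to the transcription model, so in this toy signal the two seconds of silence never reach the GPU. A production pipeline would replace the energy threshold with a trained VAD model and would likely pad chunk boundaries to avoid clipping words.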