Company
Date Published
Dec. 12, 2024
Author
William Gao, Derrick Yang, Tianshu Cheng, Rachel Rapp
Word count
1145
Language
English
Hacker News points
None

Summary

Baseten has built a Whisper transcription pipeline for production AI workloads that it describes as the fastest, most accurate, and most cost-efficient on the market. The pipeline achieves a real-time factor of over 1000x and a word error rate (WER) of 10.0 on the Rev16 benchmark. It uses a two-stage approach in which voice activity detection (VAD) splits audio into chunks, so that long files can be processed and unnecessary GPU work is avoided. The pipeline is built with Chains, Baseten's framework for multi-step inference pipelines, in which each step can be given its own hardware and scaling configuration to tune performance while keeping costs low. The result lets users reliably transcribe hours of audio in seconds.
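The chunking step can be illustrated with a minimal sketch. It swaps in a simple energy-based VAD (RMS energy per fixed-size frame compared against a threshold) rather than whatever VAD model Baseten actually uses. The function names `vad_segments` and `pack_chunks` and the parameters `frame_ms`, `threshold`, and `max_chunk_s` are hypothetical. The 30-second default chunk length is an assumption based on Whisper operating on 30-second audio windows.

```python
import math
from typing import List, Sequence, Tuple

Segment = Tuple[int, int]  # (start_sample, end_sample), end exclusive


def frame_is_speech(frame: Sequence[float], threshold: float) -> bool:
    # Hypothetical energy-based VAD: a frame counts as speech if its
    # root-mean-square amplitude reaches the threshold.
    rms = math.sqrt(sum(x * x for x in frame) / len(frame))
    return rms >= threshold


def vad_segments(samples: Sequence[float], sample_rate: int,
                 frame_ms: int = 30, threshold: float = 0.01) -> List[Segment]:
    """Return contiguous runs of speech frames as sample ranges."""
    frame_len = max(1, int(sample_rate * frame_ms / 1000))
    segments: List[Segment] = []
    start = None
    for i in range(0, len(samples), frame_len):
        if frame_is_speech(samples[i:i + frame_len], threshold):
            if start is None:
                start = i  # speech begins
        elif start is not None:
            segments.append((start, i))  # speech ended at this frame
            start = None
    if start is not None:
        segments.append((start, len(samples)))
    return segments


def pack_chunks(segments: List[Segment], sample_rate: int,
                max_chunk_s: float = 30.0) -> List[Segment]:
    """Merge speech segments into chunks no longer than max_chunk_s.

    Silence outside the segments is never sent for transcription; short
    gaps between segments that share a chunk are kept.
    """
    max_len = int(max_chunk_s * sample_rate)
    chunks: List[Segment] = []
    cur_start = cur_end = None
    for s, e in segments:
        # Split any single segment that exceeds the maximum chunk length.
        while e - s > max_len:
            if cur_start is not None:
                chunks.append((cur_start, cur_end))
                cur_start = None
            chunks.append((s, s + max_len))
            s += max_len
        if cur_start is None:
            cur_start, cur_end = s, e
        elif e - cur_start <= max_len:
            cur_end = e  # extend the current chunk
        else:
            chunks.append((cur_start, cur_end))
            cur_start, cur_end = s, e
    if cur_start is not None:
        chunks.append((cur_start, cur_end))
    return chunks
```

Each resulting chunk is independent, so in a pipeline like the one described the chunks could be transcribed in parallel. That is where skipping silent audio would save GPU time.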