Baseten has developed the fastest, most accurate, and most cost-efficient Whisper transcription pipeline for production AI workloads, reaching a real-time factor of over 1000x and a word error rate (WER) of 10.0 on the Rev16 benchmark. The pipeline works in two stages. First, it uses voice activity detection (VAD) to split audio into chunks, which lets it handle long files and skip silent stretches that would otherwise waste GPU time. Second, it transcribes those chunks with Whisper. Baseten also built Chains, a framework for multi-step inference pipelines in which each step can run on its own hardware and scale independently, so every stage can be tuned for performance while keeping costs low. Because Whisper is optimized for both accuracy and speed, users can reliably transcribe hours of audio in seconds: at a 1000x real-time factor, one hour of audio takes roughly 3.6 seconds to process.
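To illustrate the chunking stage, here is a minimal Python sketch. It is not Baseten's implementation. It substitutes a simple frame-energy threshold for the voice activity detector, because the source does not say which VAD Baseten uses. It also caps each chunk at 30 seconds, on the assumption that chunks should fit Whisper's 30-second input window. The names `Chunk`, `frame_energies`, and `vad_chunks`, along with the `threshold` value, are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class Chunk:
    """A span of detected speech, in seconds from the start of the file."""
    start: float
    end: float


def frame_energies(samples: Sequence[float], frame_len: int) -> List[float]:
    """Mean squared amplitude of each non-overlapping frame.

    A trailing partial frame shorter than frame_len is ignored.
    """
    return [
        sum(x * x for x in samples[i:i + frame_len]) / frame_len
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def vad_chunks(
    samples: Sequence[float],
    sample_rate: int,
    frame_ms: int = 30,
    threshold: float = 1e-3,    # hypothetical energy threshold for "speech"
    max_chunk_s: float = 30.0,  # assumed cap matching Whisper's 30 s window
) -> List[Chunk]:
    """Split audio into speech chunks and drop silence (energy-based VAD sketch)."""
    frame_len = int(sample_rate * frame_ms / 1000)
    if frame_len < 1:
        raise ValueError("frame_ms is too short for this sample_rate")
    frame_s = frame_len / sample_rate

    energies = frame_energies(samples, frame_len)
    chunks: List[Chunk] = []
    start: Optional[float] = None  # start time of the open speech run, if any

    for idx, energy in enumerate(energies):
        t = idx * frame_s
        is_speech = energy >= threshold
        if is_speech and start is None:
            start = t  # silence -> speech: open a new chunk
        elif not is_speech and start is not None:
            chunks.append(Chunk(start, t))  # speech -> silence: close it
            start = None
        # Split long speech runs so no chunk exceeds max_chunk_s.
        if start is not None and (t + frame_s) - start >= max_chunk_s:
            chunks.append(Chunk(start, t + frame_s))
            start = None

    if start is not None:  # audio ended mid-speech
        chunks.append(Chunk(start, len(energies) * frame_s))
    return chunks
```

Only the returned chunks would be passed to the transcription stage, so silent stretches never reach the GPU. Because the chunks are independent, they could also be transcribed in parallel. A production system would likely replace the energy threshold with a trained VAD model, which is more robust to background noise.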