Company
Date Published
Author
Ryan O'Connor
Word count
1008
Language
English
Hacker News points
1

Summary

AssemblyAI has introduced major improvements to its API's inference latency: most audio files now finish processing in well under 45 seconds regardless of audio duration, with a Real-Time Factor (RTF) as low as 0.008x. These gains come without any compromise on accuracy, as evidenced by the Conformer-2 model's industry-leading average Word Error Rate (WER) of approximately 6%. AssemblyAI achieved this through intelligent mini-batching, hardware parallelization, and optimized serving infrastructure. The improvements also translate into reduced pricing for both async ($0.37 per hour) and real-time ($0.47 per hour) speech-to-text models. The company plans to release more updates over the next few months.
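
To make the RTF and pricing figures concrete, here is a minimal sketch assuming the usual definition of Real-Time Factor as processing time divided by audio duration. The function names are hypothetical and are not part of AssemblyAI's API; the constants are the figures quoted in the summary above.

```python
# Illustrative arithmetic only; these helpers are hypothetical and are not AssemblyAI's API.

ASYNC_PRICE_PER_HOUR = 0.37      # USD per hour of audio, as quoted in the summary
REALTIME_PRICE_PER_HOUR = 0.47   # USD per hour of audio, as quoted in the summary
BEST_CASE_RTF = 0.008            # lowest RTF quoted in the summary


def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Return processing time divided by audio duration (lower is faster)."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def estimated_processing_seconds(audio_seconds: float, rtf: float) -> float:
    """Estimate how long transcription takes for a given audio length and RTF."""
    return audio_seconds * rtf


def transcription_cost(audio_seconds: float, price_per_hour: float) -> float:
    """Estimate the cost in USD of transcribing audio billed per hour of audio."""
    return audio_seconds / 3600 * price_per_hour


if __name__ == "__main__":
    one_hour = 3600.0
    seconds = estimated_processing_seconds(one_hour, BEST_CASE_RTF)
    print(f"1 hour of audio at RTF {BEST_CASE_RTF}: about {seconds:.1f} s of processing")
    print(f"Async cost for 1 hour: ${transcription_cost(one_hour, ASYNC_PRICE_PER_HOUR):.2f}")
    print(f"Real-time cost for 1 hour: ${transcription_cost(one_hour, REALTIME_PRICE_PER_HOUR):.2f}")
```

Under this definition, an RTF of 0.008x means one hour of audio corresponds to roughly 29 seconds of processing, which is consistent with the "well under 45 seconds" claim for a file of that length. Real turnaround also depends on factors the summary does not quantify, such as queueing and upload time.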