Lower latency, lower cost, more possibilities
AssemblyAI has introduced major improvements to its API's inference latency: most audio files now finish processing in well under 45 seconds regardless of audio duration, with a Real-Time Factor (RTF) as low as 0.008x. These gains come without any compromise on accuracy, as evidenced by the company's Conformer-2 model, which achieves an industry-leading average Word Error Rate (WER) of approximately 6%. AssemblyAI attributes the speedup to intelligent mini-batching, hardware parallelization, and optimized serving infrastructure. The resulting efficiency has allowed the company to reduce pricing for both its async ($0.37 per hour) and real-time ($0.47 per hour) speech-to-text models. The company also plans to release more updates over the next few months.
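To make the figures above concrete, here is a minimal Python sketch of the arithmetic. It assumes the usual definition of RTF (processing time divided by audio duration) and that pricing scales linearly with audio length; the function and constant names are hypothetical and are not part of AssemblyAI's API. The 0.008x figure is quoted as a best case, so real processing times may be longer.

```python
# Illustrative arithmetic only; names below are hypothetical, not AssemblyAI's API.

BEST_CASE_RTF = 0.008           # "as low as 0.008x" from the summary (best case)
ASYNC_PRICE_PER_HOUR = 0.37     # USD per hour of audio, async model
REALTIME_PRICE_PER_HOUR = 0.47  # USD per hour of audio, real-time model


def processing_seconds(audio_seconds: float, rtf: float = BEST_CASE_RTF) -> float:
    """Estimate processing time, assuming RTF = processing time / audio duration."""
    return audio_seconds * rtf


def transcription_cost(audio_seconds: float, price_per_hour: float) -> float:
    """Estimate cost in USD, assuming price scales linearly with audio length."""
    return audio_seconds / 3600 * price_per_hour


if __name__ == "__main__":
    one_hour = 3600.0
    print(f"Best-case processing time: {processing_seconds(one_hour):.1f} s")
    print(f"Async cost: ${transcription_cost(one_hour, ASYNC_PRICE_PER_HOUR):.2f}")
    print(f"Real-time cost: ${transcription_cost(one_hour, REALTIME_PRICE_PER_HOUR):.2f}")
```

Under these assumptions, a one-hour file at the best-case RTF would take about 28.8 seconds to process, which is consistent with the "well under 45 seconds" claim.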
Company
AssemblyAI
Date published
Jan. 10, 2024
Author(s)
Ryan O'Connor
Word count
1008
Hacker News points
1
Language
English