Speed is crucial for voice AI interfaces: a voice-to-voice response time of about 500ms is typical of human conversation, and anything longer than 800ms starts to feel unnatural. The key technical factors to optimize for fast voice-to-voice response are network architecture, AI model performance, and voice processing logic. Today's state-of-the-art components include WebRTC for streaming audio from the user's device to the cloud, Deepgram's fast transcription models for speech-to-text, Llama 3 70B or 8B as the LLM, and Deepgram's Aura model for text-to-speech. By self-hosting all three AI models together in the same Cerebrium container, it is possible to achieve median voice-to-voice response times as low as 500ms.
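One way to reason about hitting that 500ms target is as a per-stage latency budget across the pipeline the paragraph describes (network, speech-to-text, LLM, text-to-speech). The sketch below is illustrative only: the stage names mirror the components above, but every timing is an assumed placeholder, not a measurement from the source.

```python
# Illustrative voice-to-voice latency budget for a WebRTC -> STT -> LLM -> TTS
# pipeline. All per-stage timings below are assumptions for the sketch,
# not measured values for any specific deployment.
pipeline_ms = {
    "webrtc uplink (device -> cloud)": 40,
    "speech-to-text (transcription model)": 100,
    "llm time-to-first-token": 200,
    "text-to-speech time-to-first-byte": 80,
    "webrtc downlink (cloud -> device)": 40,
}

total = sum(pipeline_ms.values())
for stage, ms in pipeline_ms.items():
    print(f"{stage:40s} {ms:4d} ms")
print(f"{'total voice-to-voice':40s} {total:4d} ms")
```

Framing latency this way makes the design trade-off visible: colocating the three models in one container mainly attacks the inter-stage network hops, while the LLM's time-to-first-token typically remains the single largest line item in the budget.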