Comparing the world’s first voice-to-voice AI models

Company

Hume

Date Published

Sept. 11, 2024

Author

Jeremy Hadfield

Word count

1831

Language

English

Hacker News points

None

URL

www.hume.ai/blog/evi2-vs-gpt4ovoice

Summary

Voice-to-voice foundation models are the latest major breakthrough in AI, enabling users to speak with AI through voice alone. The world's first working voice-to-voice models are Hume AI's Empathic Voice Interface 2 (EVI 2) and OpenAI's GPT-4o Advanced Voice Mode (GPT-4o-voice). The two systems share many capabilities: both process audio and language, output voice and language, and understand a user's tone of voice. They differ in focus, however: EVI 2 is optimized for emotional intelligence, for maintaining compelling personalities, and for customization, and it is designed for developers, whereas GPT-4o-voice supports more languages. Voice-to-voice models are set to transform sectors such as customer service, mental health, education, and personal development by providing a more efficient interface for virtually any application.