Real-time Call Transcription Using IBM Watson and Python
Nexmo's WebSocket feature streams the audio of a phone call in real time, enabling applications such as two-way conversations with AI bots, sentiment analysis, and keyword tracking. These applications first need speech recognition (transcription) to convert the audio into text, and AI platforms such as IBM Watson can do this in real time.
Because Nexmo and Watson expose different interfaces, they are joined through a relay server: it accepts the audio stream from Nexmo and forwards it to Watson over Watson's WebSocket interface. The relay handles the incoming messages from Vonage, parses the audio parameters, and sends Watson a request to start transcription. Watson then returns transcription messages with confidence scores, which can be used to refine or correct the output. When the call ends, Nexmo closes its WebSocket connection, and the relay responds by telling Watson to stop the transcription stream.
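The relay's message handling can be sketched in Python as below. This is not the article's code; it is a minimal sketch under stated assumptions. It assumes Vonage's first WebSocket frame is a JSON text message with a "content-type" field such as "audio/l16;rate=16000", followed by binary frames of raw 16-bit PCM audio. It also assumes Watson's Speech to Text WebSocket interface accepts that content-type unchanged in a {"action": "start"} message, ends a request with {"action": "stop"}, and replies with JSON of the form results[].alternatives[].transcript / confidence plus a "final" flag. Check these shapes against the current Vonage and Watson documentation. The names Relay, send_to_watson, needs_review and MIN_CONFIDENCE are hypothetical, and the actual WebSocket connections are left out so the translation logic can be exercised on its own.

```python
import json
import re
from typing import Callable, List, Optional, Tuple, Union

# Hypothetical threshold: final transcripts below this confidence get flagged for review.
MIN_CONFIDENCE = 0.6


def parse_vonage_audio_params(first_message: str) -> dict:
    """Read the audio format from the first (text) frame Vonage sends.

    Assumes a JSON object with a "content-type" field such as
    "audio/l16;rate=16000" (16-bit linear PCM at the given sample rate).
    """
    msg = json.loads(first_message)
    content_type = msg["content-type"]
    match = re.search(r"rate=(\d+)", content_type)
    rate = int(match.group(1)) if match else None
    return {"content_type": content_type, "rate": rate}


def watson_start_message(content_type: str) -> str:
    """Build the text frame that opens a recognition request on Watson's WebSocket."""
    return json.dumps({
        "action": "start",
        "content-type": content_type,  # assumption: Vonage's l16 format string is accepted as-is
        "interim_results": True,       # ask for partial transcripts while the caller speaks
    })


def watson_stop_message() -> str:
    """Build the text frame that ends the current recognition request."""
    return json.dumps({"action": "stop"})


def parse_watson_result(message: str) -> List[Tuple[str, Optional[float], bool]]:
    """Extract (transcript, confidence, final) for each result in a Watson message.

    Watson reports confidence only on final results, so interim ones yield None.
    Messages without "results" (e.g. {"state": "listening"}) yield an empty list.
    """
    data = json.loads(message)
    out = []
    for result in data.get("results", []):
        best = result["alternatives"][0]  # take the first (top) alternative
        out.append((best["transcript"].strip(), best.get("confidence"), result.get("final", False)))
    return out


def needs_review(confidence: Optional[float]) -> bool:
    """Flag a final transcript whose confidence falls below MIN_CONFIDENCE."""
    return confidence is not None and confidence < MIN_CONFIDENCE


class Relay:
    """Translate Vonage WebSocket events into Watson WebSocket messages.

    `send_to_watson` is a hypothetical callback that writes one frame
    (str for text, bytes for binary audio) to the Watson connection.
    """

    def __init__(self, send_to_watson: Callable[[Union[str, bytes]], None]):
        self.send_to_watson = send_to_watson
        self.started = False

    def on_vonage_message(self, message: Union[str, bytes]) -> None:
        if isinstance(message, str):
            # The first text frame carries the audio parameters; start Watson once.
            if not self.started:
                params = parse_vonage_audio_params(message)
                self.send_to_watson(watson_start_message(params["content_type"]))
                self.started = True
        elif self.started:
            # Binary frames are raw PCM audio; forward them untouched.
            self.send_to_watson(message)

    def on_vonage_close(self) -> None:
        # The call ended and Vonage closed its socket, so stop Watson's stream.
        if self.started:
            self.send_to_watson(watson_stop_message())
            self.started = False
```

With a real WebSocket library, one would call on_vonage_message for every frame received from Vonage, call on_vonage_close when that socket closes, and pass each text frame Watson sends back to parse_watson_result. Keeping the network code out of the class is a design choice that lets the protocol translation be tested with a plain list standing in for the Watson connection.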
Company
Vonage
Date published
Nov. 5, 2020
Author(s)
Sam Machin
Word count
1434
Language
English
Hacker News points
None found.