Company
Date Published
Oct. 6, 2023
Author
Ryan O'Connor
Word count
1260
Language
English
Hacker News points
None

Summary

In this tutorial, we learned how to use the AssemblyAI Python SDK to transcribe an audio stream in real time over a WebSocket connection. We saw that transcribing speech as it is spoken and displaying the text on screen takes only a few lines of code.

First, we installed the dependencies needed to work with the Python SDK, including the `websockets` library and the `assemblyai` package (imported as `aai`). Then, we defined two handler functions: one that is called when data is received from the WebSocket and one that is called when an error occurs. In the data handler, we determined whether each transcript was a final transcript (indicating the end of an utterance) or a partial transcript (sent while audio is still being processed), and printed it to the console accordingly. Each partial transcript contains all previous words in the utterance, so by printing only the new text at the end of each partial transcript, we could make it appear as if only the delta (i.e., the words added since the last message) was being displayed. The error handler simply printed any error that occurred during transcription.

Finally, we wrote the main script by instantiating a `RealtimeTranscriber` object and passing in our two handlers along with the sample rate of the audio stream. We opened a WebSocket connection using this object's `connect` method, then opened a microphone stream and passed its audio data to the transcriber's `stream` method. Once transcription was done, we closed the WebSocket connection so that nothing was left open when the script exited. In addition, we saw how to define open and close handlers; once added to the transcriber definition, they printed a message whenever the WebSocket was opened or closed.

This tutorial provided an overview of real-time transcription in Python with the AssemblyAI SDK. By following along, you should now have a good understanding of how this process works and be able to implement similar functionality in your own projects.
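
The delta idea described in the summary can be illustrated without the SDK or a microphone. The sketch below is not the tutorial's code: `Message` and `DeltaPrinter` are hypothetical names standing in for the SDK's transcript objects and the data handler, and it assumes that each partial transcript normally extends the previous one. When that assumption fails (for example, a final transcript that adds punctuation and casing), it falls back to returning the full text.

```python
from dataclasses import dataclass


@dataclass
class Message:
    """Hypothetical stand-in for a real-time transcript message (not an SDK type)."""
    text: str
    is_final: bool


class DeltaPrinter:
    """Tracks what has already been shown for the current utterance."""

    def __init__(self) -> None:
        self.shown = ""

    def handle(self, message: Message) -> str:
        """Return only the text that is new since the previous message."""
        if not message.text:
            # Empty messages carry nothing to display.
            return ""
        if message.text.startswith(self.shown):
            # Usual case: the new transcript extends the previous one.
            delta = message.text[len(self.shown):]
        else:
            # Earlier words were revised, so show the whole text again.
            delta = message.text
        # A final transcript ends the utterance, so start fresh afterwards.
        self.shown = "" if message.is_final else message.text
        return delta


if __name__ == "__main__":
    printer = DeltaPrinter()
    stream = [
        Message("hello", False),
        Message("hello world", False),
        Message("Hello world.", True),
    ]
    for msg in stream:
        label = "final" if msg.is_final else "partial"
        print(f"{label}: {printer.handle(msg)!r}")
```

In a real handler, the returned delta would be printed as messages arrive; keeping the state in a small object makes the logic easy to test separately from the WebSocket connection.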