4 LLM frameworks to build AI apps with voice data
Company: AssemblyAI
Date published: March 12, 2024
Hi everyone, I'm Patrick from AssemblyAI, and in this video I show you four frameworks that let you build AI applications with audio data and large language models. This enables a lot of cool use cases: you can summarize meetings, podcasts, and videos, or ask any question you want about your data. Here's a quick example: we have one MP3 file about sports injuries, and then I can ask any question, for example, "What is a runner's knee?" This is a topic they talk about in the video, and we get our response based on the transcript: runner's knee is a condition characterized by pain behind or around the kneecap. So this works. Getting large language models to work with audio data requires a lot of different steps, and that's where these frameworks come into play. So in this video I show you what steps are necessary, and then we have a look at the four different frameworks. So let's get started.

Here is a rough overview of the steps required to build audio AI apps. First, you always have to transcribe the audio, because the language model cannot work with audio itself. So you transcribe the audio or video file and get the text. Next, you may want to store the transcript in a database so you can access it again at any time later and don't have to recalculate it. Then you often also need to split the text: a very long audio file doesn't fit into the context window of the model, so you apply text splitters here. You can then also calculate embeddings and store them in a vector database, which enables full-blown RAG applications. By the way, we also have a tutorial about RAG here. Next, you may want some help with prompt building, and then finally you can apply your large language model. As a last step, of course, you also want to deploy the application.

As you can see, there are a lot of steps required, so now let's look at the frameworks you can use for that. The first one is AssemblyAI's LeMUR framework, which does all of those steps for you in only two to three lines of code, with only one package to install. Technically it doesn't use a vector database under the hood, but it lets you input up to 100 hours of audio data, and it works really well. Then we have LangChain, LlamaIndex, and Haystack. These frameworks give you maximum flexibility: you combine multiple services and basically code all of these steps yourself. All of these frameworks are pretty cool, so let's have a look at some code examples and see them in action.

First we have LeMUR, which you can install with pip install assemblyai. We also have it available in other languages, and you can find it on GitHub; this is our whole AssemblyAI Python SDK, which also includes LeMUR. You set up a transcriber and transcribe an audio file, which can be a URL or a local file. By the way, you can also retrieve the transcript again at any time later, because we store it for you. Then you can ask any prompt you want by calling transcript.lemur.task. Here we again ask what a runner's knee is, and then we print the result. So let's run this file and see if it works. And here we get the result explaining what a runner's knee is. So this is the simplest of all the frameworks, with the least amount of code you need to write. I also recommend trying it out in our playground, where you can upload and transcribe any file for free and also play around with LeMUR for free. There you can ask your questions, so have some fun with it and let me know in the comments if you like it.
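For reference, here is a minimal sketch of the LeMUR flow described above, using the AssemblyAI Python SDK. The API key value and the audio URL are placeholders; swap in your own.

```python
import assemblyai as aai

# Authenticate with your AssemblyAI API key (placeholder value here)
aai.settings.api_key = "YOUR_ASSEMBLYAI_API_KEY"

# Step 1: transcribe the audio file (a URL or a local file path both work);
# the URL below is just a placeholder for your own file
transcriber = aai.Transcriber()
transcript = transcriber.transcribe("https://example.com/sports_injuries.mp3")

# Step 2: ask LeMUR any question about the transcript
result = transcript.lemur.task("What is a runner's knee?")
print(result.response)
```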
Next we use LangChain, which we can install with pip install langchain. Here we have to set up every service separately, so we also install langchain-openai to use an LLM from OpenAI, and again assemblyai for the audio transcript loader. As I said, transcribing the audio is always the first step, and the integration with AssemblyAI makes it super simple to load the transcript into LangChain documents: you simply set up the AssemblyAI audio transcript loader and call load, and the transcript is available as LangChain documents. Then we set up a simple prompt template and ask our question. Here we feed the whole transcript into the prompt, so we don't include splitting the text and storing it, but this would also be very simple with LangChain; I will link this article on Q&A with RAG. In this code example we keep it simple: we set up our model, invoke it, and print the result. So let's run the LangChain example and see if it works. And we get our response from the OpenAI model. It looks a bit different, but it also answers the question: a runner's knee is a condition characterized by pain behind or around the kneecap. So this is also pretty cool.

Next let's have a look at LlamaIndex, which we can also install with pip: pip install llama-index. Then again we install separate modules for the LLM and the reader: llama-index-llms-openai and llama-index-readers-assemblyai. Here we also have an official integration with LlamaIndex, the AssemblyAI audio transcript reader. We set this up and load the data, and again we have the transcript as a document. Here we even set up a vector store index and a query engine, so we also cover the text splitting and embedding steps. LlamaIndex makes this super simple, again with only two lines of code, and then we can call query_engine.query with our question. So let's run the third script and see if it works. And we get a response, again explaining what a runner's knee is, this time with LlamaIndex. Also very cool.

The last LLM framework we look at is Haystack by deepset, so let's have a look at a code example. Here we also have an official AssemblyAI transcriber integration that you can install with pip. We import a pipeline, a prompt builder, an OpenAI generator, and the AssemblyAI transcriber, and set them up. Again we create our prompt template, and again we keep it simple and feed the entire transcript into the prompt. Then we create our prompt builder and our OpenAI generator. Next you set up your pipeline: you create the pipeline and add all the components you want to use, so here the transcriber component, then the prompt builder, and then the large language model. Then we connect all the fields: the output of the transcriber goes into the prompt builder, and the output of the prompt builder goes into the LLM. Then we define our question, run the whole pipeline, and print the response. So let's also run this file, and this time the answer is short and concise, and it again explains what a runner's knee is. Of course you can tweak the model parameters if you want a longer output here. So this was the code we had to write with Haystack.

All right, these are the four frameworks you can use to build with LLMs and audio data: LeMUR, LangChain, LlamaIndex, and Haystack. Below you'll find minimal sketches of the LangChain, LlamaIndex, and Haystack flows we just walked through.
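First, a minimal sketch of the LangChain example, assuming langchain (with langchain-community), langchain-openai, and assemblyai are installed and that OPENAI_API_KEY and ASSEMBLYAI_API_KEY are set as environment variables. The audio URL is a placeholder, and the exact code in the video may differ slightly.

```python
from langchain_community.document_loaders import AssemblyAIAudioTranscriptLoader
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

# Step 1: transcribe the audio and load it as LangChain documents
loader = AssemblyAIAudioTranscriptLoader(
    file_path="https://example.com/sports_injuries.mp3"  # placeholder URL
)
docs = loader.load()

# Step 2: a simple prompt template that feeds the whole transcript to the model
prompt = ChatPromptTemplate.from_template(
    "Answer the question based only on the following transcript:\n\n"
    "{transcript}\n\nQuestion: {question}"
)

# Step 3: set up the LLM, invoke the chain, and print the answer
llm = ChatOpenAI()
chain = prompt | llm
response = chain.invoke(
    {"transcript": docs[0].page_content, "question": "What is a runner's knee?"}
)
print(response.content)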
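Next, a minimal sketch of the LlamaIndex version, assuming llama-index, llama-index-llms-openai, and llama-index-readers-assemblyai are installed and the API keys are set as environment variables. Again, the audio URL is a placeholder.

```python
from llama_index.core import VectorStoreIndex
from llama_index.readers.assemblyai import AssemblyAIAudioTranscriptReader

# Step 1: transcribe the audio and load it as LlamaIndex documents
reader = AssemblyAIAudioTranscriptReader(
    file_path="https://example.com/sports_injuries.mp3"  # placeholder URL
)
docs = reader.load_data()

# Steps 2-3: build a vector store index over the transcript and get a query engine
# (this handles chunking and embeddings for us; the LLM defaults to OpenAI)
index = VectorStoreIndex.from_documents(docs)
query_engine = index.as_query_engine()

# Ask our question against the indexed transcript
response = query_engine.query("What is a runner's knee?")
print(response)
```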
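Finally, a minimal sketch of the Haystack pipeline, assuming haystack-ai and the assemblyai-haystack integration are installed. The component and connection names are based on the integration as I know it, so double-check them against the current docs; the audio URL is again a placeholder.

```python
import os

from haystack import Pipeline
from haystack.components.builders.prompt_builder import PromptBuilder
from haystack.components.generators import OpenAIGenerator
from assemblyai_haystack.transcriber import AssemblyAITranscriber

# Prompt template: keep it simple and feed the entire transcript into the prompt
prompt_template = """
Answer the question based on the transcript below.

Transcript: {{ transcription[0].content }}

Question: {{ question }}
"""

# Set up the components: transcriber, prompt builder, and LLM
transcriber = AssemblyAITranscriber(api_key=os.environ["ASSEMBLYAI_API_KEY"])
prompt_builder = PromptBuilder(template=prompt_template)
llm = OpenAIGenerator()  # reads OPENAI_API_KEY from the environment

# Build the pipeline: transcriber output -> prompt builder -> LLM
pipeline = Pipeline()
pipeline.add_component("transcriber", transcriber)
pipeline.add_component("prompt_builder", prompt_builder)
pipeline.add_component("llm", llm)
pipeline.connect("transcriber.transcription", "prompt_builder.transcription")
pipeline.connect("prompt_builder.prompt", "llm.prompt")

# Run the whole pipeline and print the response
result = pipeline.run(
    {
        "transcriber": {"file_path": "https://example.com/sports_injuries.mp3"},  # placeholder
        "prompt_builder": {"question": "What is a runner's knee?"},
    }
)
print(result["llm"]["replies"][0])
```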
Let me know in the comments which one is your favorite, and if you want a more in-depth RAG tutorial, then check out this video. I hope to see you in the next one. Bye.