Building a RAG Batch Inference Pipeline with Anyscale and Union
This blog post showcases the versatility of Ray, an open-source unified compute framework, by demonstrating embedding generation and LLM batch inference with Ray in two Flyte pipelines. Flyte is an open-source orchestrator for building production-grade data and machine learning pipelines. The post also explains why AI/ML workloads benefit from pairing a unified distributed compute framework like Ray with a workflow orchestrator like Flyte. Anyscale, built by the creators of Ray, lets developers deploy AI/ML workloads at scale, while Union, built by the technical founding team behind Flyte, abstracts away the infrastructure so that ML engineers and data scientists can focus on their tasks. The post then walks through the two pipelines: the first generates embeddings with Ray Data and saves them to cloud storage shared by Union and Anyscale; the second monitors GitHub issues in Flyte repositories and uses the Anyscale Platform to serve an LLM with RAG, running batch inference to reply to those issues.
Company
Anyscale
Date published
Sept. 12, 2024
Author(s)
Kevin Su and Kai-Hsun Chen
Word count
1665
Language
English
Hacker News points
None found.