Building a RAG Batch Inference Pipeline with Anyscale and Union

Company

Anyscale

Date Published

Sept. 12, 2024

Author

Kevin Su and Kai-Hsun Chen

Word count

1665

Language

English

Hacker News points

None

URL

www.anyscale.com/blog/anyscale-union-batch-inference-pipeline

Summary

This blog post showcases the versatility of Ray, an open-source unified compute framework, by demonstrating embedding generation and LLM batch inference with Ray in two Flyte pipelines. Flyte is an open-source orchestrator for building production-grade data and machine learning pipelines. The post also highlights the importance of combining a unified distributed computation framework like Ray with a workflow orchestrator like Flyte for managing AI/ML workloads. Anyscale, built by the creators of Ray, provides a seamless experience for developers deploying AI/ML workloads at scale, while Union, built by the technical founding team behind Flyte, abstracts away the infrastructure, providing a turnkey system that lets ML engineers and data scientists focus on their tasks. The post then walks through the two Flyte pipelines. The first generates embeddings using Ray Data and saves them to cloud storage shared by Union and Anyscale. The second monitors GitHub issues in Flyte repositories and uses the Anyscale Platform to serve an LLM with RAG, performing batch inference and replying to the GitHub issues.
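
The two-pipeline shape described in the summary can be sketched in plain Python. This is only an illustration of the data flow, not the blog's code: it uses no Ray, Flyte, Anyscale, or Union APIs, the bag-of-words `embed` function is a toy stand-in for a real embedding model, and every function name here (`build_index`, `retrieve`, `build_prompts`) is hypothetical. In the real setup, the first stage would presumably run as a distributed Ray Data job writing to shared cloud storage, and the prompts from the second stage would be sent to a served LLM.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: bag-of-words token counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(docs):
    """Pipeline 1 (sketch): embed every document once and keep the results.

    A list in memory stands in for the shared cloud storage.
    """
    return [(doc, embed(doc)) for doc in docs]


def retrieve(index, query, k=1):
    """Return the k documents whose embeddings are closest to the query."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]


def build_prompts(index, issues, k=1):
    """Pipeline 2 (sketch): pair each issue with retrieved context.

    The resulting prompts form the batch that would go to the LLM.
    """
    prompts = []
    for issue in issues:
        context = "\n".join(retrieve(index, issue, k))
        prompts.append(f"Context:\n{context}\n\nIssue:\n{issue}\n\nReply:")
    return prompts
```

Splitting the work this way means the documents are embedded once and reused for every incoming issue, which is the reason the summary describes embedding generation and batch inference as separate pipelines.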