Company: Anyscale
Author: The Anyscale Team
Word count: 1038
Language: English
Hacker News points: None

Summary

Ray Data LLM provides APIs for offline batch inference with LLMs inside existing Ray Data pipelines, while Ray Serve LLM provides APIs for deploying LLMs for online inference in Ray Serve applications. Both modules offer first-class integration with vLLM and OpenAI-compatible endpoints. They address common developer pains around batch inference: previously, developers had to launch multiple online inference servers and build proxying/load-balancing utilities themselves just to maximize throughput. Ray Data LLM simplifies the use of LLMs within existing data pipelines by providing a Processor object that can be called on a Ray Data Dataset (a minimal sketch follows below). Ray Serve LLM lets users deploy multiple LLM models together with the familiar Ray Serve API, offering automatic scaling and load balancing, unified multi-node multi-model deployment, OpenAI compatibility, and composable multi-model LLM pipelines (see the deployment sketch after the first example).
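As a concrete illustration of the Processor pattern, here is a minimal sketch using the documented `ray.data.llm` module. The model name, engine settings, and sampling parameters are illustrative assumptions, and exact config field names may vary across Ray versions.

```python
import ray
from ray.data.llm import build_llm_processor, vLLMEngineProcessorConfig

# Configure a vLLM engine for batch inference; model and engine settings
# here are illustrative placeholders.
config = vLLMEngineProcessorConfig(
    model_source="meta-llama/Llama-3.1-8B-Instruct",  # assumed model choice
    engine_kwargs={"max_model_len": 8192},
    concurrency=1,  # number of vLLM engine replicas
    batch_size=64,  # rows sent to the engine per batch
)

# Build a Processor: preprocess maps each row to a chat request,
# postprocess extracts the generated text back into a column.
processor = build_llm_processor(
    config,
    preprocess=lambda row: dict(
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": row["item"]},
        ],
        sampling_params=dict(temperature=0.3, max_tokens=250),
    ),
    postprocess=lambda row: dict(answer=row["generated_text"]),
)

# Calling the Processor on a Ray Data Dataset runs batch inference lazily.
ds = ray.data.from_items(["What is the capital of France?"])
ds = processor(ds)
ds.show(limit=1)
```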
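On the serving side, a deployment sketch using the documented `ray.serve.llm` module is shown below. The model ID, autoscaling bounds, and accelerator type are assumptions for illustration only.

```python
from ray import serve
from ray.serve.llm import LLMConfig, build_openai_app

# Describe the model to deploy; IDs and autoscaling bounds are illustrative.
llm_config = LLMConfig(
    model_loading_config=dict(
        model_id="qwen-0.5b",                       # name exposed to clients
        model_source="Qwen/Qwen2.5-0.5B-Instruct",  # Hugging Face source
    ),
    deployment_config=dict(
        autoscaling_config=dict(min_replicas=1, max_replicas=2),
    ),
    accelerator_type="A10G",  # assumed GPU type
)

# build_openai_app wires the model(s) behind an OpenAI-compatible router;
# passing several LLMConfigs deploys multiple models together.
app = build_openai_app({"llm_configs": [llm_config]})
serve.run(app, blocking=True)
```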
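Because the resulting endpoint is OpenAI-compatible, it can be queried with the standard openai client; the base URL and placeholder API key below assume a local Serve deployment on the default port.

```python
from openai import OpenAI

# Point the client at the local Ray Serve endpoint (assumed default port).
client = OpenAI(base_url="http://localhost:8000/v1", api_key="fake-key")

response = client.chat.completions.create(
    model="qwen-0.5b",  # the model_id from the LLMConfig above
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```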