Company: Anyscale
Author: The Anyscale Team
Word count: 1038
Language: English
Hacker News points: None

Summary

Ray Data LLM provides APIs for offline batch inference with LLMs inside existing Ray Data pipelines, while Ray Serve LLM provides APIs for deploying LLMs for online inference in Ray Serve applications. Both modules offer first-class integration with vLLM and OpenAI-compatible endpoints. They address common developer pains around batch inference: previously, developers had to launch multiple online inference servers and build proxying/load-balancing utilities themselves just to maximize throughput. Ray Data LLM simplifies the use of LLMs within existing data pipelines by providing a Processor object that can be called on a Ray Data Dataset (a minimal sketch follows below). Ray Serve LLM lets users deploy multiple LLM models together with the familiar Ray Serve API, offering automatic scaling and load balancing, unified multi-node multi-model deployment, OpenAI compatibility, and composable multi-model LLM pipelines (see the deployment sketch after the first example).
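As a concrete illustration of the Processor pattern, here is a minimal sketch using the documented `ray.data.llm` module. The model name, engine settings, and sampling parameters are illustrative assumptions, and exact config field names may vary across Ray versions.

```python
import ray
from ray.data.llm import build_llm_processor, vLLMEngineProcessorConfig

# Configure a vLLM engine for batch inference; model and engine settings
# here are illustrative placeholders.
config = vLLMEngineProcessorConfig(
    model_source="meta-llama/Llama-3.1-8B-Instruct",  # assumed model choice
    engine_kwargs={"max_model_len": 8192},
    concurrency=1,  # number of vLLM engine replicas
    batch_size=64,  # rows sent to the engine per batch
)

# Build a Processor: preprocess maps each row to a chat request,
# postprocess extracts the generated text back into a column.
processor = build_llm_processor(
    config,
    preprocess=lambda row: dict(
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": row["item"]},
        ],
        sampling_params=dict(temperature=0.3, max_tokens=250),
    ),
    postprocess=lambda row: dict(answer=row["generated_text"]),
)

# Calling the Processor on a Ray Data Dataset runs batch inference lazily.
ds = ray.data.from_items(["What is the capital of France?"])
ds = processor(ds)
ds.show(limit=1)
```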
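On the serving side, a deployment sketch using the documented `ray.serve.llm` module is shown below. The model ID, autoscaling bounds, and accelerator type are assumptions for illustration only.

```python
from ray import serve
from ray.serve.llm import LLMConfig, build_openai_app

# Describe the model to deploy; IDs and autoscaling bounds are illustrative.
llm_config = LLMConfig(
    model_loading_config=dict(
        model_id="qwen-0.5b",                       # name exposed to clients
        model_source="Qwen/Qwen2.5-0.5B-Instruct",  # Hugging Face source
    ),
    deployment_config=dict(
        autoscaling_config=dict(min_replicas=1, max_replicas=2),
    ),
    accelerator_type="A10G",  # assumed GPU type
)

# build_openai_app wires the model(s) behind an OpenAI-compatible router;
# passing several LLMConfigs deploys multiple models together.
app = build_openai_app({"llm_configs": [llm_config]})
serve.run(app, blocking=True)
```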
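Because the resulting endpoint is OpenAI-compatible, it can be queried with the standard openai client; the base URL and placeholder API key below assume a local Serve deployment on the default port.

```python
from openai import OpenAI

# Point the client at the local Ray Serve endpoint (assumed default port).
client = OpenAI(base_url="http://localhost:8000/v1", api_key="fake-key")

response = client.chat.completions.create(
    model="qwen-0.5b",  # the model_id from the LLMConfig above
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```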