This blog post covers three approaches to batch inference in Ray: a low-level approach using Ray Actors, a high-level approach using Ray Data streaming, and a combination of the two. It walks through parallelizing batch inference for a model trained on the NYC taxi dataset with Ray 2.4: creating replicas of the trained model as actors, feeding data to those actors in parallel, and collecting the inference results. The post also introduces the ActorPool utility, which simplifies managing tasks across a fixed set of actors, and points out which parts of this approach remain unoptimized (a sketch of the pattern follows below). It then shows how the Ray Data library handles batch inference while automating common performance optimizations: dynamic autoscaling, automatic batching and pipelining of data, parallelized data fetching and preprocessing, and management of the actor pool used for inference. The post concludes that Ray Data offers a more expressive and intuitive API for batch inference at scale while remaining layered on top of the underlying Ray Core primitives.
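
To make the low-level approach concrete, here is a minimal sketch of actor-based batch inference with ActorPool. It is not the post's exact code: the `ModelReplica` class, the pickled `model.pkl` artifact, and the Parquet shard paths are illustrative assumptions, and the pool size is fixed by hand, which is one of the unoptimized aspects the post calls out.

```python
import pickle

import pandas as pd
import ray
from ray.util import ActorPool


@ray.remote(num_cpus=1)
class ModelReplica:
    """One replica of the trained model, held in the actor's memory."""

    def __init__(self, model_path: str):
        # Hypothetical artifact: a pickled scikit-learn-style model.
        with open(model_path, "rb") as f:
            self.model = pickle.load(f)

    def predict(self, shard_path: str) -> pd.DataFrame:
        # Each actor fetches and scores one shard of the dataset.
        batch = pd.read_parquet(shard_path)
        batch["prediction"] = self.model.predict(batch)
        return batch


if __name__ == "__main__":
    ray.init()

    # Illustrative data layout; adjust to your own shards.
    shard_paths = [f"s3://my-bucket/nyc-taxi/part-{i}.parquet" for i in range(8)]

    # Create a fixed pool of model replicas and feed shards to whichever
    # replica is free; ActorPool tracks pending tasks and collects results.
    replicas = [ModelReplica.remote("model.pkl") for _ in range(4)]
    pool = ActorPool(replicas)
    results = list(
        pool.map(lambda actor, path: actor.predict.remote(path), shard_paths)
    )
```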
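
For comparison, a minimal sketch of the Ray Data approach under the same assumptions (a pickled model and Parquet inputs; the `Predictor` class name and paths are illustrative). Ray Data loads the model once per worker via the callable class, then streams batches through an autoscaling pool of those workers, handling batching, pipelining, and actor-pool management automatically.

```python
import pickle

import pandas as pd
import ray


class Predictor:
    """Callable class: the model is loaded once per worker, not per batch."""

    def __init__(self):
        with open("model.pkl", "rb") as f:
            self.model = pickle.load(f)

    def __call__(self, batch: pd.DataFrame) -> pd.DataFrame:
        batch["prediction"] = self.model.predict(batch)
        return batch


# Read the input dataset; Ray Data parallelizes the reads across the cluster.
ds = ray.data.read_parquet("s3://my-bucket/nyc-taxi/")

# Stream batches through a pool of Predictor actors (Ray 2.4-style API,
# where an actor pool is requested via the `compute` argument).
predictions = ds.map_batches(
    Predictor,
    batch_format="pandas",
    compute=ray.data.ActorPoolStrategy(min_size=2, max_size=4),
)
predictions.write_parquet("s3://my-bucket/nyc-taxi-predictions/")
```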