This blog post covers three approaches to batch inference in Ray: a low-level approach using Ray Actors, a high-level approach using Ray Data streaming, and a combination of the two. It walks through parallelizing batch inference for a model trained on the NYC taxi dataset with Ray 2.4: creating replicas of the trained model as actors, feeding data to those actors in parallel, and collecting the inference results. The post also introduces the ActorPool utility, which simplifies managing tasks across a fixed set of actors, and points out which parts of this approach remain unoptimized (a sketch of the pattern follows below). It then shows how the Ray Data library handles batch inference while automating common performance optimizations: dynamic autoscaling, automatic batching and pipelining of data, parallelized data fetching and preprocessing, and management of the actor pool used for inference. The post concludes that Ray Data offers a more expressive and intuitive API for batch inference at scale while remaining layered on top of the underlying Ray Core primitives.
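
To make the low-level approach concrete, here is a minimal sketch of actor-based batch inference with ActorPool. It is not the post's exact code: the `ModelReplica` class, the pickled `model.pkl` artifact, and the Parquet shard paths are illustrative assumptions, and the pool size is fixed by hand, which is one of the unoptimized aspects the post calls out.

```python
import pickle

import pandas as pd
import ray
from ray.util import ActorPool


@ray.remote(num_cpus=1)
class ModelReplica:
    """One replica of the trained model, held in the actor's memory."""

    def __init__(self, model_path: str):
        # Hypothetical artifact: a pickled scikit-learn-style model.
        with open(model_path, "rb") as f:
            self.model = pickle.load(f)

    def predict(self, shard_path: str) -> pd.DataFrame:
        # Each actor fetches and scores one shard of the dataset.
        batch = pd.read_parquet(shard_path)
        batch["prediction"] = self.model.predict(batch)
        return batch


if __name__ == "__main__":
    ray.init()

    # Illustrative data layout; adjust to your own shards.
    shard_paths = [f"s3://my-bucket/nyc-taxi/part-{i}.parquet" for i in range(8)]

    # Create a fixed pool of model replicas and feed shards to whichever
    # replica is free; ActorPool tracks pending tasks and collects results.
    replicas = [ModelReplica.remote("model.pkl") for _ in range(4)]
    pool = ActorPool(replicas)
    results = list(
        pool.map(lambda actor, path: actor.predict.remote(path), shard_paths)
    )
```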
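
For comparison, a minimal sketch of the Ray Data approach under the same assumptions (a pickled model and Parquet inputs; the `Predictor` class name and paths are illustrative). Ray Data loads the model once per worker via the callable class, then streams batches through an autoscaling pool of those workers, handling batching, pipelining, and actor-pool management automatically.

```python
import pickle

import pandas as pd
import ray


class Predictor:
    """Callable class: the model is loaded once per worker, not per batch."""

    def __init__(self):
        with open("model.pkl", "rb") as f:
            self.model = pickle.load(f)

    def __call__(self, batch: pd.DataFrame) -> pd.DataFrame:
        batch["prediction"] = self.model.predict(batch)
        return batch


# Read the input dataset; Ray Data parallelizes the reads across the cluster.
ds = ray.data.read_parquet("s3://my-bucket/nyc-taxi/")

# Stream batches through a pool of Predictor actors (Ray 2.4-style API,
# where an actor pool is requested via the `compute` argument).
predictions = ds.map_batches(
    Predictor,
    batch_format="pandas",
    compute=ray.data.ActorPoolStrategy(min_size=2, max_size=4),
)
predictions.write_parquet("s3://my-bucket/nyc-taxi-predictions/")
```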