Ray Compiled Graphs: Optimized AI Workloads with Native GPU Communication
Compiled Graphs is a new Ray feature aimed at large AI model workloads such as training and inference. Unlike traditional CPU-based workloads, these are overwhelmingly GPU-intensive and often require distributed computation across multiple accelerators. Compiled Graphs has much lower task submission overhead than Ray's standard task submission path, which speeds up sub-second workloads such as auto-regressive token generation. It also supports native GPU-to-GPU transfer via NVIDIA NCCL, and it schedules operations to avoid deadlock and to overlap communication with computation. Together, these changes reduce system overhead for repetitive task graphs and make better use of compute and communication resources.
Company
Anyscale
Date published
Oct. 7, 2024
Author(s)
Sang Cho, Sam Chan and Stephanie Wang
Word count
1910
Language
English
Hacker News points
None found.