Ray Datasets is a data loading and preprocessing library built on top of Ray. It simplifies machine learning (ML) pipelines by providing a flexible, scalable API for working with data inside a Ray application. Datasets builds on Ray's task, actor, and object APIs, so large-scale ML ingest, training, and inference can all run within a single Python application. It supports popular storage backends and file formats, covers common ML preprocessing operations, and integrates with Ray libraries and with ML frameworks such as TensorFlow and PyTorch. The library aims to be a universal parallel data loader: a "narrow data waist", that is, one common interface through which Ray applications and libraries exchange data. Beyond loading, it offers convenient preprocessing functions, supports stateful computations on GPUs, and enables batch inference over large datasets. Datasets delegates most of the heavy lifting to Ray's distributed dataplane and concentrates on higher-level features such as convenient APIs, data format support, and stage pipelining.
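
The block-parallel model described above can be illustrated with a small toy sketch. It uses only the Python standard library and is not Ray's API: `split_into_blocks`, `map_batches`, `normalize`, and `ScaleModel` are hypothetical names chosen for illustration, and a thread pool stands in for Ray's distributed tasks and actors. The sketch shows data being partitioned into blocks, a stateless preprocessing function applied per block, and a stateful callable (a stand-in for a model loaded once) applied for batch inference.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List

Block = List[float]  # one partition of the dataset


def split_into_blocks(rows: List[float], num_blocks: int) -> List[Block]:
    """Partition rows into contiguous blocks of roughly equal size."""
    size = max(1, -(-len(rows) // num_blocks))  # ceiling division, at least 1
    return [rows[i:i + size] for i in range(0, len(rows), size)]


def map_batches(
    blocks: List[Block],
    fn: Callable[[Block], Block],
    max_workers: int = 4,
) -> List[Block]:
    """Apply fn to every block in parallel; output order matches input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fn, blocks))


def normalize(block: Block) -> Block:
    """Stateless preprocessing step: scale each value by 1/10."""
    return [x / 10.0 for x in block]


class ScaleModel:
    """Stateful stand-in for a model that is set up once and reused per batch."""

    def __init__(self, weight: float) -> None:
        self.weight = weight  # imagine expensive model loading here

    def __call__(self, block: Block) -> Block:
        return [self.weight * x for x in block]


rows = [float(i) for i in range(10)]
blocks = split_into_blocks(rows, num_blocks=3)

# Stage 1: preprocessing. Stage 2: batch inference with a stateful callable.
preprocessed = map_batches(blocks, normalize)
predictions = map_batches(preprocessed, ScaleModel(weight=2.0))

flat = [y for block in predictions for y in block]
print(flat)
```

In Ray itself, per-batch transformations of this kind are expressed with `Dataset.map_batches`, and the work runs as Ray tasks or actors across a cluster rather than as threads in one process. The sketch only mirrors the shape of the computation.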