The evolution of production machine learning architectures parallels that of GPU programming architectures. The first generation consisted of fixed-function pipelines; the second generation introduced programmability within the pipeline; and the third generation aims to be fully programmable, with a focus on libraries so that developers can build on it without worrying about the underlying architecture.

Ray, discussed here as an example of a third-generation production ML architecture, provides:

1. A simple and flexible framework for distributed computation.
2. A cloud-provider-independent compute launcher/autoscaler.
3. An ecosystem of distributed computation libraries built on top of #1.
4. An easy way to "glue" together distributed libraries in code that looks like normal Python while still being performant.

Customers use Ray to simplify existing ML architectures, parallelize high-performance ML systems, make machine learning accessible to non-specialists, and build highly scalable algorithms for deep reinforcement learning.
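To make the "glue" point concrete, here is a minimal sketch of the kind of code Ray enables: an ordinary Python function becomes a distributed task via a decorator and runs in parallel across a cluster. The function name and toy data are hypothetical; only `ray.init`, `@ray.remote`, `.remote()`, and `ray.get` are part of Ray's public API.

```python
import ray

# Connect to an existing Ray cluster, or start a local one if none is running.
ray.init()

@ray.remote
def preprocess(shard):
    # Hypothetical placeholder; a real pipeline would do feature engineering here.
    return [x * 2 for x in shard]

# Each .remote() call schedules a task and immediately returns a future (ObjectRef).
futures = [preprocess.remote(list(range(i, i + 4))) for i in range(0, 16, 4)]

# Block until all tasks finish and gather their results.
results = ray.get(futures)
print(results)
```

The code reads like sequential Python, but the four `preprocess` calls execute in parallel wherever the cluster has resources, which is the property that lets Ray act as glue between distributed libraries.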