ThunderKittens: A Simple Embedded DSL for AI kernels

Company

Together AI

Date Published

May 12, 2024

Author

Benjamin Spector, Aaryan Singhal, Simran Arora, Chris Ré

Word count

659

Language

English

Hacker News points

None

URL

www.together.ai/blog/thunderkittens

Summary

We have developed ThunderKittens, a simple embedded domain-specific language (DSL) for AI kernels that aims to make key technical ideas easy to express in clean, understandable code. The fundamental object of the DSL is a matrix tile sized to fit the tensor cores, which account for 94% of the compute on an H100, so the design centers on keeping them busy. We've made the API PyTorch-like so that it is familiar to AI practitioners, while still exposing the full power of the host language and keeping what happens on the accelerator transparent. Our team has used ThunderKittens to write kernels that match FA2 (FlashAttention-2) performance on 4090s and A100s and that run faster than FA2 on H100s in both the forward and backward passes. We think of the project as something of an art project, but we hope it makes the key ideas clear, and we are open to feedback. We're releasing ThunderKittens now, integrated with Andrej Karpathy's nanoGPT project, to make these concepts accessible to a wider audience.
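To illustrate what it means for a small, tensor-core-sized matrix to be the fundamental object, here is a minimal sketch of a blocked matrix multiply in plain Python. It is not the ThunderKittens API: the names `TILE`, `get_tile`, `tile_mma`, and `tiled_matmul` are hypothetical, and the 16x16 tile size is an assumption, since the summary only says the matrix "fits into tensor cores". The point is the shape of the computation, in which the whole kernel is built from one tile-level multiply-accumulate.

```python
# Assumed tile edge; the summary does not state the actual tile size.
TILE = 16


def zeros(rows, cols):
    """Return a rows x cols matrix of zeros as a list of lists."""
    return [[0.0] * cols for _ in range(rows)]


def get_tile(m, ti, tj):
    """Copy the TILE x TILE block at tile coordinates (ti, tj) out of m."""
    r0, c0 = ti * TILE, tj * TILE
    return [row[c0:c0 + TILE] for row in m[r0:r0 + TILE]]


def tile_mma(acc, a, b):
    """acc += a @ b on single tiles (a stand-in for one tensor-core op)."""
    for i in range(TILE):
        for k in range(TILE):
            aik = a[i][k]
            for j in range(TILE):
                acc[i][j] += aik * b[k][j]


def tiled_matmul(a, b):
    """Multiply a (n x k) by b (k x m) using only tile-level operations."""
    n, k, m = len(a), len(b), len(b[0])
    assert len(a[0]) == k, "inner dimensions must match"
    assert n % TILE == 0 and k % TILE == 0 and m % TILE == 0

    out = zeros(n, m)
    for ti in range(n // TILE):          # tile row of the output
        for tj in range(m // TILE):      # tile column of the output
            acc = zeros(TILE, TILE)      # accumulator tile
            for tk in range(k // TILE):  # walk the shared dimension
                tile_mma(acc, get_tile(a, ti, tk), get_tile(b, tk, tj))
            # Write the finished accumulator tile back into the output.
            for i in range(TILE):
                out[ti * TILE + i][tj * TILE:(tj + 1) * TILE] = acc[i]
    return out
```

On a GPU, the inner `tile_mma` call is the part that would map onto the tensor cores, so a kernel written this way spends most of its time in the unit that provides most of the compute.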