The ThunderKittens framework, developed in collaboration with Stanford researchers, has been optimized for NVIDIA Blackwell GPUs. The new kernels speed up matrix multiplication and attention on the B200 by using Blackwell features such as fifth-generation tensor cores, tensor memory, and CTA pairs. These features improve dataflow management, reduce pipeline bubbles, and increase throughput. With them, developers can write high-performance GPU kernels for a range of applications more quickly.
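
To make the notion of a pipeline bubble concrete, here is a toy timing model in Python. It is a sketch for illustration only: it does not use the ThunderKittens API or any Blackwell-specific feature, the function name `pipeline_time` and its parameters are hypothetical, and the load and compute durations are made-up unit costs. It assumes a single load unit and a single compute unit that process tiles in order, with `stages` buffers available to hold loaded tiles.

```python
def pipeline_time(num_tiles: int, load_t: int, compute_t: int, stages: int) -> int:
    """Return the finish time of the last tile in a toy load/compute pipeline.

    Assumptions (illustrative, not taken from any real kernel):
      - one load unit and one compute unit, each handling tiles in order
      - a tile's buffer is reused only after the tile that previously
        occupied it has been fully computed
      - `stages` is the number of buffers (1 = no overlap, 2 = double buffering)
    """
    load_done = []     # time at which each tile's load finishes
    compute_done = []  # time at which each tile's compute finishes

    for i in range(num_tiles):
        # The load unit is busy until the previous load completes.
        prev_load = load_done[i - 1] if i > 0 else 0
        # The buffer slot is free once tile i - stages has been computed.
        slot_free = compute_done[i - stages] if i >= stages else 0
        finished_load = max(prev_load, slot_free) + load_t
        load_done.append(finished_load)

        # Compute needs both the loaded tile and a free compute unit.
        prev_compute = compute_done[i - 1] if i > 0 else 0
        compute_done.append(max(finished_load, prev_compute) + compute_t)

    return compute_done[-1]


def bubble_time(num_tiles: int, load_t: int, compute_t: int, stages: int) -> int:
    """Time during which the compute unit sits idle (the 'bubbles')."""
    total = pipeline_time(num_tiles, load_t, compute_t, stages)
    return total - num_tiles * compute_t


if __name__ == "__main__":
    for stages in (1, 2):
        total = pipeline_time(4, 2, 3, stages)
        idle = bubble_time(4, 2, 3, stages)
        print(f"stages={stages}: total={total}, compute idle={idle}")
```

In this model, with four tiles, a load cost of 2, and a compute cost of 3, a single buffer forces loads and computes to alternate, while two buffers let each load overlap with the previous tile's compute, so the compute unit idles only during the first load. Real kernels involve many more constraints, so the numbers here indicate the direction of the effect, not its size.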