Together AI has achieved 90% faster BF16 training on the NVIDIA Blackwell platform using the Together Kernel Collection. The team used Blackwell features such as 5th-generation Tensor Cores, on-chip Tensor Memory, and peer CTA groups to develop custom FP8 kernels that run 1.8x faster than FlashAttention-3. The collaboration combines Together AI's kernel optimization expertise with NVIDIA's latest accelerated computing platform, setting new benchmarks for AI training and inference efficiency. Together AI is deploying tens of thousands of NVIDIA HGX B200 servers and GB200 NVL72 rack-scale solutions to build and run the next generation of AI reasoning models and agents. To mark the arrival of NVIDIA Blackwell, the company is offering an exclusive launch program that invites AI teams to apply for a free accelerated test drive of Together GPU Clusters powered by NVIDIA HGX B200 and NVIDIA GB200 NVL72.