The NVIDIA GH200 Grace Hopper Superchip is a unique architecture that pairs an NVIDIA Hopper GPU with an Arm-based Grace CPU over NVLink-C2C, an arrangement with clear advantages for AI inference workloads that need large KV cache allocations. The high-speed interconnect makes it practical to offload parts of the KV cache into the abundant CPU memory, unlocking optimizations such as prefix caching and KV cache re-use. In experiments serving Llama 3.3 70B on a single GH200 with 96 GB of GPU memory, the superchip outperformed an H100 GPU by 32%; since the two chips have near-identical compute profiles, the gains come primarily from access to a larger effective KV cache rather than from higher VRAM bandwidth alone. These results suggest that the GH200 Superchip is well suited to high-throughput deployments of models that would not fit on standalone GPUs with similar VRAM profiles. The same architecture underpins the GB200 Grace Blackwell Superchip, which promises to be extremely powerful for model inference and supports multi-node NVLink for serving even larger models.
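
To make the memory pressure concrete, here is a back-of-the-envelope sketch of the KV cache footprint for a Llama-3.3-70B-style model. The dimensions (80 layers, 8 grouped-query KV heads, head dimension 128) come from the published model architecture; fp16 cache precision is an assumption for illustration:

```python
def kv_cache_bytes_per_token(num_layers=80, num_kv_heads=8,
                             head_dim=128, dtype_bytes=2):
    """Bytes of KV cache per token: one K and one V vector per layer,
    across the grouped-query KV heads, at the given element size."""
    return 2 * num_layers * num_kv_heads * head_dim * dtype_bytes

per_token = kv_cache_bytes_per_token()        # 327,680 bytes ~= 320 KiB/token
context_gib = per_token * 131_072 / 2**30     # a single 128K-token context
print(f"{per_token} bytes/token, {context_gib:.1f} GiB for a 128K context")
```

A single full-length context already consumes tens of gigabytes, and concurrent requests multiply that. With the model weights occupying most of the 96 GB of GPU memory, spilling KV blocks into the Grace CPU's memory over NVLink-C2C is what leaves room for aggressive prefix caching and re-use.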