Company:
Date Published:
Author: Eole Cervenka
Word count: 1248
Language: English
Hacker News points: None

Summary

We present an inference benchmark of Stable Diffusion on a range of GPUs and CPUs to shed light on what hardware is needed to run this state-of-the-art text-to-image model. The findings show that many consumer-grade GPUs do a fine job: in single-image latency, the most powerful Ampere GPU (A100) is only 33% faster than the 3080, although the A100 outperforms the 3080 by 2.5x in throughput. We also observe that half precision reduces the time to generate a single output image by about 40% on Ampere GPUs and by 52% on the previous-generation RTX 8000. Throughput does not increase linearly with batch size: the GPU's tensor cores saturate once the batch size reaches a certain value. Removing autocast and running the model natively at half precision speeds up PyTorch inference by ~25%, and we verify the gains on both speed and memory usage. Finally, we observe visible differences between single-precision and half-precision outputs, especially in the early steps.
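
To make the half-precision setup concrete, below is a minimal sketch of fp16 inference, assuming the Hugging Face diffusers library; the model ID, prompt, and batch size are illustrative, not the benchmark's exact configuration. Loading the weights in fp16 runs the model natively at half precision, so no autocast context is needed.

    import torch
    from diffusers import StableDiffusionPipeline

    # Load the pipeline with fp16 weights so the model runs natively at half
    # precision; no torch.autocast context is required, which avoids the ~25%
    # overhead noted above.
    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5",  # illustrative model ID
        torch_dtype=torch.float16,
    ).to("cuda")

    prompt = "a photograph of an astronaut riding a horse"

    # Single-image latency: one image per call.
    image = pipe(prompt).images[0]
    image.save("out_fp16.png")

    # Throughput: batch several prompts into one call. Gains are sub-linear
    # once the tensor cores saturate at larger batch sizes.
    images = pipe([prompt] * 8).images

The same pipeline call covers both measurements discussed above: timing a single-prompt call gives per-image latency, while dividing batch size by the time of a batched call gives throughput.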