Company
Baseten
Date Published
Author
Marius Killinger, Philip Kiely
Word count
816
Language
English
Hacker News points
None

Summary

GPU utilization is crucial for model inference because it directly affects the cost of serving high-traffic workloads: the higher the utilization, the fewer GPUs are needed, and the lower the cost. Measuring GPU utilization means looking at three metrics: compute usage, memory usage, and memory bandwidth usage. Increasing batch sizes during inference improves utilization by raising throughput, at the cost of a trade-off with latency that must be managed. Switching to a more powerful GPU type can also reduce costs. Tracking GPU utilization in the Baseten workspace shows how real-world traffic affects utilization.
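
The three metrics named above can be sampled directly on the GPU. The following is a minimal sketch, assuming the nvidia-ml-py (pynvml) bindings and an NVIDIA driver are available; it is one way to read compute, memory, and memory bandwidth usage, not the tooling described in the article.

```python
# Sample the three GPU utilization metrics via NVML (sketch, assumes pynvml is installed).
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)  # first GPU on the machine

# Compute and memory bandwidth usage: percent of the sample period during which
# kernels were executing / device memory was being read or written.
rates = pynvml.nvmlDeviceGetUtilizationRates(handle)
print(f"compute utilization:        {rates.gpu}%")
print(f"memory bandwidth utilization: {rates.memory}%")

# Memory usage: how much VRAM is currently occupied (model weights, activations, caches).
mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
print(f"memory used: {mem.used / 1e9:.1f} / {mem.total / 1e9:.1f} GB")

pynvml.nvmlShutdown()
```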
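
The batching point can be made concrete with a toy calculation. The latency model and numbers below are hypothetical illustrations, not benchmarks from the article; they only show that throughput grows with batch size faster than per-batch latency does, which is why larger batches improve utilization until the latency budget is hit.

```python
# Toy batch-size trade-off. The latency model is a hypothetical assumption:
# a fixed per-batch overhead plus a small per-request cost.
def batch_latency_ms(batch_size: int) -> float:
    fixed_overhead_ms = 40.0   # assumed per-batch overhead
    per_request_ms = 5.0       # assumed incremental cost per request
    return fixed_overhead_ms + per_request_ms * batch_size

for batch_size in (1, 4, 16, 64):
    latency = batch_latency_ms(batch_size)
    throughput = batch_size / (latency / 1000.0)  # requests per second
    print(f"batch={batch_size:>3}  latency={latency:6.1f} ms  throughput={throughput:7.1f} req/s")
```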