The Micro Machines example was transcribed with Whisper on both CPU (an i5-11300H) and GPU (a high-RAM GPU Colab environment) at each model size. The inference times are shown side by side below:
```
Model     CPU (sec)   GPU (sec)
Tiny      0.02        0.01
Base      0.06        0.02
Small     0.14        0.05
Medium    0.39        0.13
Large     1.47        0.46
```
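For reference, here is a minimal sketch of how such timings can be collected with the open-source `whisper` package. The audio filename is hypothetical, and this times a single run per model (the original benchmark's exact methodology, e.g. warm-up runs or averaging, is not shown here):

```python
import time

import torch
import whisper  # pip install openai-whisper

AUDIO = "micro_machines.mp3"  # hypothetical path to the test clip
SIZES = ["tiny", "base", "small", "medium", "large"]

# Benchmark on CPU, and on GPU if one is available
for device in ["cpu"] + (["cuda"] if torch.cuda.is_available() else []):
    for size in SIZES:
        # Load the model before timing so only transcription is measured
        model = whisper.load_model(size, device=device)
        start = time.perf_counter()
        # fp16 is only supported on GPU; disable it on CPU to avoid a warning
        model.transcribe(AUDIO, fp16=(device == "cuda"))
        elapsed = time.perf_counter() - start
        print(f"{device:4s} {size}: {elapsed:.2f} sec")
```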
The estimated cost to run Whisper at each model size is as follows:
```
Tiny: 0.03 USD/h
Base: 0.09 USD/h
Small: 0.21 USD/h
Medium: 0.57 USD/h
Large: 2.28 USD/h
```
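As a rough sketch of how per-hour cost figures like these can be derived: if an instance costs P USD per hour and a model transcribes at real-time factor r (inference time divided by audio duration), the cost per hour of audio is P × r. The instance price and clip length below are assumed values for illustration only and will not reproduce the table above:

```python
# Assumed values, not taken from the benchmark above
INSTANCE_USD_PER_HOUR = 1.50   # hypothetical GPU instance price
AUDIO_SECONDS = 30.0           # hypothetical length of the test clip

# Measured GPU inference times from the table above
gpu_times = {"tiny": 0.01, "base": 0.02, "small": 0.05,
             "medium": 0.13, "large": 0.46}

for size, t in gpu_times.items():
    rtf = t / AUDIO_SECONDS  # real-time factor: inference time / audio length
    usd_per_audio_hour = INSTANCE_USD_PER_HOUR * rtf
    print(f"{size}: {usd_per_audio_hour:.4f} USD per hour of audio")
```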