Can't you already run Whisper for free?

travisjungroth · on March 1, 2023

You can. With the API you're just paying for the compute and for having it managed. Here are price estimates for running 1,000 hours of audio yourself on GCP: https://www.assemblyai.com/blog/how-to-run-openais-whisper-s...

For reference, the same 1,000 hours through OpenAI's API would cost $360, and that's with the large-v2 model.
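The $360 figure is consistent with a per-minute rate of $0.006, which is the price the Whisper API launched at. Treat the rate as an assumption and check current pricing before relying on it. A quick sanity check of the arithmetic:

```python
# Assumption: OpenAI's Whisper API launch price of $0.006 per minute of audio.
RATE_PER_MINUTE_USD = 0.006


def api_cost_usd(hours: float, rate_per_minute: float = RATE_PER_MINUTE_USD) -> float:
    """Return the API cost in USD for `hours` of audio at a per-minute rate."""
    minutes = hours * 60
    return minutes * rate_per_minute


cost = api_cost_usd(1_000)
print(f"1,000 hours -> ${cost:,.2f}")
```

Because the API bills per minute, the cost scales linearly with audio length. Self-hosting trades that for a fixed hardware or instance cost, which is what the GCP estimates above compare against.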

minimaxir · on March 1, 2023

Whisper large is a bit trickier to self-host, and the faster inference may be useful for certain applications.

graderjs · on March 2, 2023

Agreed! Whisper Large needs a lot of memory and compute. But I didn't find its English quality to be any better than Medium's; it just took longer. For most good-quality audio, Small is all you need, and it's not much different from Medium. Medium and Large only really pulled ahead on heavily distorted audio (wind, loud background noise). But every model fails beyond a certain level of extreme distortion.
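One way to check a comparison like this on your own recordings is to transcribe the same clip with each model size and score every transcript against a hand-made reference using word error rate (WER): the word-level edit distance (substitutions + deletions + insertions) divided by the number of reference words. This is a minimal sketch, not the commenter's evaluation method; it assumes naive lowercasing and whitespace tokenization, whereas real evaluations usually also normalize punctuation and numbers.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: (substitutions + deletions + insertions) / reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # prev is the previous DP row: prev[j] is the edit distance between the
    # reference words processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion (reference word missing from hypothesis)
                curr[j - 1] + 1,     # insertion (extra word in hypothesis)
                prev[j - 1] + cost,  # substitution, or match when cost == 0
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


# Hypothetical transcripts, for illustration only.
print(wer("the cat sat on the mat", "the cat sat on a mat"))
```

Note that WER can exceed 1.0 when the hypothesis contains many inserted words, which is exactly what hallucinated text on silent stretches tends to produce.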

If you don't believe me, or want to know more, check out my free app, which uses Whisper Small (and Whisper Tiny for Turbo mode): https://apps.apple.com/app/wisprnote/id1671480366

It uses VAD (voice activity detection) to cut down the higher WER (word error rate) that occurs during silent or non-speech sections, and it's really fast! It runs locally on an M1 just fine.
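The comment doesn't say which VAD the app uses. As a rough illustration of the idea only, the simplest kind of VAD is an energy threshold: split the audio into short frames, mark frames whose RMS energy exceeds a threshold as speech, and send only those spans to the model so it never sees long silent stretches. This sketch assumes float samples in [-1, 1]; the function name, frame length, and threshold are hypothetical choices. Production VADs (for example WebRTC VAD or Silero VAD) use much more robust classifiers than raw energy.

```python
import math
from typing import List, Optional, Tuple


def speech_segments(
    samples: List[float],
    sample_rate: int,
    frame_ms: int = 30,       # hypothetical frame length in milliseconds
    threshold: float = 0.02,  # hypothetical RMS threshold for samples in [-1, 1]
) -> List[Tuple[float, float]]:
    """Return (start_sec, end_sec) spans of consecutive frames above the RMS threshold."""
    frame_len = max(1, sample_rate * frame_ms // 1000)
    segments: List[Tuple[float, float]] = []
    start: Optional[int] = None  # sample index where the current speech run began
    for offset in range(0, len(samples), frame_len):
        frame = samples[offset:offset + frame_len]
        rms = math.sqrt(sum(x * x for x in frame) / len(frame))
        if rms > threshold and start is None:
            start = offset  # silence -> speech transition
        elif rms <= threshold and start is not None:
            segments.append((start / sample_rate, offset / sample_rate))
            start = None  # speech -> silence transition
    if start is not None:  # audio ended while still in a speech run
        segments.append((start / sample_rate, len(samples) / sample_rate))
    return segments


# Synthetic example: 1 s silence, 1 s of a 440 Hz tone, 1 s silence at 16 kHz.
sr = 16_000
audio = [0.0] * sr + [0.5 * math.sin(2 * math.pi * 440 * n / sr) for n in range(sr)] + [0.0] * sr
print(speech_segments(audio, sr))
```

Segment boundaries snap to frame edges, so they can be off by up to one frame (30 ms here); a real pipeline would usually also pad each segment slightly so word onsets aren't clipped.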