I've run Whisper locally via [1] with one of the medium-sized models, and it was damn good at transcribing audio from a video of two people having a conversation.
I'm not sure what use case would require running this through an API: the compute isn't huge (I ran it CPU-only on an M1), and the memory requirements are modest.
> I've run Whisper locally via [1] with one of the medium-sized models, and it was damn good at transcribing audio from a video of two people having a conversation.
Agreed!
I made a Mac app that uses Whisper to transcribe audio or video files. It also adds VAD (voice activity detection) to reduce Whisper hallucinations during silent sections, and it's super fast. https://apps.apple.com/app/wisprnote/id1671480366
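To illustrate the VAD idea in general terms: the point is to find the spans that contain speech and pass only those to Whisper, so it never sees long stretches of silence. Below is a minimal energy-threshold sketch, not the app's actual implementation (real VADs are usually model-based). The function names, the 20 ms frame size, and the 0.02 threshold are all assumptions picked for illustration, and the input is assumed to be mono float samples in [-1, 1].

```python
import math


def frame_rms(frame):
    """Root-mean-square energy of one frame of float samples in [-1, 1]."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def speech_segments(samples, sample_rate, frame_ms=20, threshold=0.02):
    """Return (start_sec, end_sec) spans whose frame energy exceeds threshold.

    Hypothetical helper: a crude energy gate, assumed here only to show
    where silence would be dropped before transcription.
    """
    frame_len = int(sample_rate * frame_ms / 1000)
    segments = []
    start = None  # sample index where the current voiced span began

    # Walk over whole frames only; a trailing partial frame is ignored.
    for i in range(0, len(samples) - frame_len + 1, frame_len):
        voiced = frame_rms(samples[i:i + frame_len]) > threshold
        if voiced and start is None:
            start = i
        elif not voiced and start is not None:
            segments.append((start / sample_rate, i / sample_rate))
            start = None

    # Close a span that runs to the end of the audio.
    if start is not None:
        end = (len(samples) // frame_len) * frame_len
        segments.append((start / sample_rate, end / sample_rate))
    return segments
```

With 16 kHz audio made of one second of silence, one second of a 440 Hz tone, and another second of silence, this returns a single span from 1.0 s to 2.0 s. Only that span would then be handed to the transcriber.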
[1] https://github.com/ggerganov/whisper.cpp