Agreed! Whisper Large has big resource needs. But for English I didn't find its quality any better than Medium; it just took longer. In most cases where the audio is good quality, Small is all you need, and it's not much different from Medium. Medium and Large only really pulled ahead on heavily distorted audio (wind, loud background noise). But past a certain level of distortion, every model fails.
It uses VAD (voice activity detection) to skip silent or non-speech sections, which would otherwise push up the WER, and it's really fast! It runs locally on an M1 just fine.
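To illustrate the VAD idea, here's a minimal sketch that assumes a plain energy (RMS) threshold over fixed-size frames. Real VAD implementations usually rely on trained models, so the function names, the 30 ms frame size and the 0.02 threshold are all hypothetical choices, not what the tool above actually does. Only the returned speech spans would then be passed to Whisper.

```python
import math


def frame_rms(samples):
    """Root-mean-square energy of one frame (0.0 for an empty frame)."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def speech_segments(samples, sample_rate, frame_ms=30, threshold=0.02):
    """Return (start_sec, end_sec) spans where frame energy exceeds a threshold.

    Hypothetical energy-based VAD: frames at or below `threshold` RMS are
    treated as silence/non-speech and excluded from the returned spans.
    """
    frame_len = max(1, int(sample_rate * frame_ms / 1000))
    segments = []
    start = None
    for i in range(0, len(samples), frame_len):
        frame = samples[i:i + frame_len]
        voiced = frame_rms(frame) > threshold
        t = i / sample_rate
        if voiced and start is None:
            start = t  # speech begins at this frame
        elif not voiced and start is not None:
            segments.append((start, t))  # speech ended at this frame
            start = None
    if start is not None:
        # Audio ended while still in a speech span.
        segments.append((start, len(samples) / sample_rate))
    return segments
```

For example, with `sample_rate=1000` and `frame_ms=10`, a signal of 100 silent samples, 100 loud samples, then 100 silent samples yields a single span from 0.1 s to 0.2 s.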