
Whisper is an STT (speech-to-text) model. You can transcribe audio locally with the whisperx CLI, or use whisper-turbo.com, which runs in the browser.
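
A rough sketch of what local transcription can look like from Python. It uses the openai-whisper package rather than whisperx, which is a swap made for illustration. The function names `transcribe_locally` and `format_segments` are made up here, and the "base" model size is just a placeholder choice.

```python
from typing import Iterable, Mapping


def format_segments(segments: Iterable[Mapping]) -> str:
    """Render Whisper-style segments as '[start -> end] text' lines."""
    lines = []
    for seg in segments:
        start = f"{seg['start']:.2f}"
        end = f"{seg['end']:.2f}"
        lines.append(f"[{start} -> {end}] {seg['text'].strip()}")
    return "\n".join(lines)


def transcribe_locally(audio_path: str, model_name: str = "base") -> str:
    """Transcribe an audio file on the local machine (hypothetical helper)."""
    # Imported lazily so format_segments works without the package installed.
    # Assumes `pip install openai-whisper` and ffmpeg available on PATH.
    import whisper

    model = whisper.load_model(model_name)
    result = model.transcribe(audio_path)
    return format_segments(result["segments"])
```

Calling `transcribe_locally("talk.mp3")` (the file name is a placeholder) would download the model on first use and return one timestamped line per segment.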

For TTS, Coqui has the best UX and models for many languages, although its quality is not on par with commercial TTS providers.




I've just been looking for SOTA TTS. I found coqui.ai and elevenlabs.io (and a bunch of others). They're good (and better than older TTS), but I am not fooled by any of them. Do you have recommendations?


Gemelo was the other one listed. I doubt you'll get anything sounding more natural than ElevenLabs with the following settings:

* Model: Multilingual v2

* All options and sliders to boost similarity: set to max/yes

* Stability slider: tune by ear to a value where the voice sounds natural enough without the audio output becoming unstable
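
If you'd rather drive those settings through the API than the web UI, something like the sketch below should be close. The endpoint, the `xi-api-key` header and the `voice_settings` field names are as I remember them from the ElevenLabs API docs, so treat them as assumptions and check the current reference. `build_tts_request` and `synthesize` are hypothetical helper names, and the 0.5 stability default is only a starting point for experimenting.

```python
import json
import urllib.request

# Assumed endpoint; verify against the current ElevenLabs API reference.
API_BASE = "https://api.elevenlabs.io/v1/text-to-speech"


def build_tts_request(text, voice_id, api_key, stability=0.5):
    """Build (but do not send) a TTS request with similarity maxed out."""
    if not 0.0 <= stability <= 1.0:
        raise ValueError("stability must be between 0 and 1")
    payload = {
        "text": text,
        "model_id": "eleven_multilingual_v2",  # "Multilingual v2" (assumed id)
        "voice_settings": {
            "stability": stability,        # the value to tune by ear
            "similarity_boost": 1.0,       # similarity slider at max
            "use_speaker_boost": True,     # similarity option set to yes
        },
    }
    return urllib.request.Request(
        f"{API_BASE}/{voice_id}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"xi-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def synthesize(text, voice_id, api_key, stability=0.5, out_path="out.mp3"):
    """Send the request and save the returned audio bytes to out_path."""
    req = build_tts_request(text, voice_id, api_key, stability)
    with urllib.request.urlopen(req) as resp, open(out_path, "wb") as f:
        f.write(resp.read())
    return out_path
```

To find a good stability value, call `synthesize` in a loop over a few values (say 0.3, 0.5, 0.7), write each result to a different file, and listen to them side by side.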



