Hacker News
Coqui TTS: a deep learning toolkit for Text-to-Speech (github.com/coqui-ai)
94 points by tim-- on Aug 8, 2022 | 13 comments



The TTS performance graph is interesting but it would be even better to add another dimension for comparing their resource consumption (code size, RAM, CPU usage/speed). For example, if "Windows Male" is https://en.wikipedia.org/wiki/Microsoft_text-to-speech_voice... then it's an offline-only synthesiser that is relatively small and fast, while the Google ones are probably massive neural models that are only available as a service. Yet their speech performance seems to be quite similar according to that chart.


This is pretty cool. I tried it; it takes around 5 seconds to generate the audio for a couple of sentences on my old 1080 Ti.

I've been using Google TTS for generating audio for my reading list, so this would be a good time to build a simple API + worker wrapper around this and integrate it into my app.
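The API + worker split could be sketched with a plain queue and a background thread. This is a minimal illustration, not Coqui's API: `synthesize()` is a stub standing in for the real model call (Coqui ships a Python interface in `TTS.api` that can write a wav file, but it isn't wired in here since it needs a model download).

```python
import queue
import threading

jobs: queue.Queue = queue.Queue()
results: dict = {}

def synthesize(text: str) -> str:
    # Placeholder: a real worker would render a .wav here via Coqui TTS.
    return f"audio for: {text}"

def worker() -> None:
    # Drain jobs in FIFO order until the shutdown sentinel arrives.
    while True:
        job_id, text = jobs.get()
        if job_id < 0:  # sentinel: stop the worker
            jobs.task_done()
            break
        results[job_id] = synthesize(text)
        jobs.task_done()

t = threading.Thread(target=worker)
t.start()

# "API" side: enqueue sentences from a reading list.
for i, sentence in enumerate(["First article.", "Second article."]):
    jobs.put((i, sentence))

jobs.put((-1, ""))  # shutdown sentinel
t.join()
print(results[0])  # audio for: First article.
```

The point of the queue is that slow GPU synthesis never blocks the request path; the API just enqueues and polls `results` (or a database) for finished audio.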


There are actually several quite high-quality options around at this point.

Mimic is another one

https://mimic.mycroft.ai/


Very cool! If anyone is interested in what a coquí sounds like: https://www.youtube.com/watch?v=LZUOiZG84c0

Anyone who has ever fallen asleep anywhere in Puerto Rico will probably be quite familiar.

I used Coqui TTS a few months ago to roll my own speech-controlled desktop in an hour or so; very cool stuff.


Those damn frogs have killed the housing market in large parts of Hawaii. You just can't get to sleep with all that damn noise!

And that makes you a danger on the road when you're driving and you haven't been able to sleep for an entire freaking week, or maybe even a month or more.

Who would have thought so much damn noise could come from such a tiny frog!


Coincidentally, I've just started playing around with Coqui TTS for training on my own experimental datasets. I was naive enough to think I could get it to run on Windows instead of Linux. I'd suggest you save yourselves the time and start from Linux if you're giving it a go!


All 'deep learning' things eventually fail on Windows for me. I almost got an AI image model repo set up on Windows, but one package for indexing is only available on Linux... oh well, time to redo it all.


What is currently the best open source toolkit to do TTS with your own voice?


Looks to be a continuation of Mozilla TTS[1]. I'm kinda surprised there's no mention unless you go back in the git history[2].

[1] https://github.com/mozilla/TTS [2] https://github.com/coqui-ai/TTS/tree/e9e07844b77a43fb0864354...


Has anyone done any vocoding/deep fakes using this? Appreciate any articles/tips you can share if so.


The repo itself has good tutorials under the notebooks/ folder to get started with training and generating synthesized voices. Check out "Tutorial_2_train_your_first_TTS_model"; that's a good place to start.

FYI, the format they expect in metadata.csv has changed over time: it used to be "filename|transcribed text", and now it expects "filename|speaker name|transcribed text", but that's not reflected in the notebook.
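If you have a dataset in the old layout, a one-off converter is trivial. A sketch, assuming a single-speaker dataset where the speaker label (here the made-up name "speaker_0") is constant:

```python
def convert_row(row: str, speaker: str = "speaker_0") -> str:
    # Old layout: "filename|transcribed text"
    # New layout: "filename|speaker name|transcribed text"
    # Split only on the first "|" so pipes in the transcript survive.
    filename, text = row.split("|", 1)
    return f"{filename}|{speaker}|{text}"

old_rows = ["clip_0001|Hello there.", "clip_0002|General Kenobi."]
new_rows = [convert_row(r) for r in old_rows]
print(new_rows[0])  # clip_0001|speaker_0|Hello there.
```

Run it over metadata.csv line by line and write the result to a new file; the audio files themselves don't need to change.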


How is this different from TorToiSe TTS?


It is a lot faster, supports more languages and models, and many of its models can run inference even on CPU.



