Hacker News
Coqui TTS: a deep learning toolkit for Text-to-Speech (github.com/coqui-ai)
94 points by tim-- on Aug 8, 2022 | 13 comments



The TTS performance graph is interesting but it would be even better to add another dimension for comparing their resource consumption (code size, RAM, CPU usage/speed). For example, if "Windows Male" is https://en.wikipedia.org/wiki/Microsoft_text-to-speech_voice... then it's an offline-only synthesiser that is relatively small and fast, while the Google ones are probably massive neural models that are only available as a service. Yet their speech performance seems to be quite similar according to that chart.


This is pretty cool. I tried it; it takes around 5 seconds to generate the audio for a couple of sentences on my old 1080 Ti.

I've been using Google TTS for generating audio for my reading list, so this would be a good time to build a simple API + worker wrapper around this and integrate it into my app.
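The API + worker split could be sketched with a plain queue and a background thread. This is a minimal illustration, not Coqui's API: `synthesize()` is a stub standing in for the real model call (Coqui ships a Python interface in `TTS.api` that can write a wav file, but it isn't wired in here since it needs a model download).

```python
import queue
import threading

jobs: queue.Queue = queue.Queue()
results: dict = {}

def synthesize(text: str) -> str:
    # Placeholder: a real worker would render a .wav here via Coqui TTS.
    return f"audio for: {text}"

def worker() -> None:
    # Drain jobs in FIFO order until the shutdown sentinel arrives.
    while True:
        job_id, text = jobs.get()
        if job_id < 0:  # sentinel: stop the worker
            jobs.task_done()
            break
        results[job_id] = synthesize(text)
        jobs.task_done()

t = threading.Thread(target=worker)
t.start()

# "API" side: enqueue sentences from a reading list.
for i, sentence in enumerate(["First article.", "Second article."]):
    jobs.put((i, sentence))

jobs.put((-1, ""))  # shutdown sentinel
t.join()
print(results[0])  # audio for: First article.
```

The point of the queue is that slow GPU synthesis never blocks the request path; the API just enqueues and polls `results` (or a database) for finished audio.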


There are actually several quite high-quality options around at this point.

Mimic is another one

https://mimic.mycroft.ai/


Very cool! If anyone is interested in what a coquí sounds like: https://www.youtube.com/watch?v=LZUOiZG84c0

Anyone who has ever fallen asleep anywhere in Puerto Rico will probably be quite familiar.

I used Coqui TTS a few months ago to roll my own speech-controlled desktop in an hour or so; very cool stuff.


Those damn frogs have killed the housing market in large parts of Hawaii. You just can't get to sleep with all that damn noise!

And that makes you a danger on the road when you're driving and you haven't been able to sleep for an entire freaking week, or maybe even a month or more.

Who would have thought so much damn noise could come from such a tiny frog!


Coincidentally, I've just started playing around with Coqui TTS for training on my own experimental datasets. I was naive enough to think I could get it to run on Windows instead of Linux. I'd suggest you save yourselves the time and start from Linux if you're giving it a go!


All 'deep learning' things eventually fail on Windows for me. I almost got an AI image model repo set up on Windows, but one package for indexing is only available on Linux... oh well, time to redo it all.


What is currently the best open source toolkit to do TTS with your own voice?


Looks to be a continuation of Mozilla TTS[1]. I'm kinda surprised there's no mention unless you go back in the git history[2].

[1] https://github.com/mozilla/TTS [2] https://github.com/coqui-ai/TTS/tree/e9e07844b77a43fb0864354...


Has anyone done any vocoding/deep fakes using this? Appreciate any articles/tips you can share if so.


The repo itself has good tutorials under the notebooks/ folder to get started with training and generating synthesized voices. Check out "Tutorial_2_train_your_first_TTS_model"; that's a good place to start.

FYI, the format they expect in metadata.csv has changed over time: it used to be "filename|transcribed text", and now it expects "filename|speaker name|transcribed text", but that's not reflected in the notebook.
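If you have a dataset in the old layout, a one-off converter is trivial. A sketch, assuming a single-speaker dataset where the speaker label (here the made-up name "speaker_0") is constant:

```python
def convert_row(row: str, speaker: str = "speaker_0") -> str:
    # Old layout: "filename|transcribed text"
    # New layout: "filename|speaker name|transcribed text"
    # Split only on the first "|" so pipes in the transcript survive.
    filename, text = row.split("|", 1)
    return f"{filename}|{speaker}|{text}"

old_rows = ["clip_0001|Hello there.", "clip_0002|General Kenobi."]
new_rows = [convert_row(r) for r in old_rows]
print(new_rows[0])  # clip_0001|speaker_0|Hello there.
```

Run it over metadata.csv line by line and write the result to a new file; the audio files themselves don't need to change.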


How is this different from TorToiSe TTS?


It is a lot faster, supports more languages and models, and many of its models can run inference even on CPU.



