
The current leader in open-source voice cloning is RVC; I'd like to see how this compares to it.



RVC is voice conversion (audio to audio), and it's typically finetuned.

This is zero-shot TTS. Reference samples are turned into vector encodings that serve as input at inference time. There's no need to retrain the model unless you want it to generalize or perform better.


It isn't, though. People need to read the paper and the author's comments: they aren't actually doing the voice generation themselves. They pass the text off to VITS, and their secret sauce is the tone mapping they do on that VITS output. So if anything they're a competitor to RVC; it's just that the version they published also includes VITS.
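
To make the two-stage split concrete, here's a toy sketch of the shape of the pipeline as I read it. Every name here (`embed_speaker`, `base_tts`, `convert_tone`, `clone_voice`) is made up for illustration, the "models" are trivial arithmetic stand-ins rather than VITS or anything from the project, and the real system surely differs in detail.

```python
from dataclasses import dataclass
from typing import List, Tuple

Audio = List[float]  # stand-in for a waveform; real systems use sample arrays


@dataclass(frozen=True)
class SpeakerEmbedding:
    """Hypothetical fixed-size vector summarizing a reference voice."""
    values: Tuple[float, ...]


def embed_speaker(reference: Audio, dim: int = 4) -> SpeakerEmbedding:
    """Toy encoder: average the reference clip in `dim` chunks.

    Nothing is trained or fine-tuned here; the clip is only encoded.
    """
    if not reference:
        raise ValueError("reference audio must not be empty")
    chunk = max(1, len(reference) // dim)
    values = []
    for i in range(dim):
        part = reference[i * chunk:(i + 1) * chunk] or [0.0]
        values.append(sum(part) / len(part))
    return SpeakerEmbedding(tuple(values))


def base_tts(text: str) -> Audio:
    """Stage 1 stand-in for a VITS-like model: text -> audio in a default voice."""
    return [ord(ch) / 255.0 for ch in text]


def convert_tone(audio: Audio, target: SpeakerEmbedding) -> Audio:
    """Stage 2 stand-in: audio -> audio, conditioned on the target embedding.

    This is the part that resembles voice conversion (the RVC-like step).
    """
    bias = sum(target.values) / len(target.values)
    return [sample + bias for sample in audio]


def clone_voice(text: str, reference: Audio) -> Audio:
    """Full pipeline: synthesize in a default voice, then map the tone."""
    return convert_tone(base_tts(text), embed_speaker(reference))


if __name__ == "__main__":
    cloned = clone_voice("hi", [0.1, 0.2, 0.3, 0.4])
    print(len(cloned))
```

The point is only the structure: `convert_tone` never sees text, so it can be swapped onto the output of any TTS front end, which is why the comparison to RVC is fair.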


Interesting.

Funny enough, a lot of RVC packages are using VITS to do RVC for TTS.



