
Compared to the first NaturalSpeech[1] I'm hearing a lot of white noise in the background. Singing is pretty cool but it feels like we need a few iterations before it can match the ground truth in the way speech does.

[1] https://speechresearch.github.io/naturalspeech/




Thanks for your interest in NaturalSpeech and NaturalSpeech 2!

NaturalSpeech focuses on synthesizing human-level, high-quality speech by training on a single-speaker recording-studio dataset.

NaturalSpeech 2 trains on 44K hours of multi-speaker in-the-wild datasets with more than 5K speakers, and focuses on synthesizing any speaker's voice in a zero-shot way given only a short speech prompt. When the speech prompt has background noise, NaturalSpeech 2 will mimic that noise as well. If you want a clean voice, just give it a clean speech prompt.

There is more discussion on Reddit as well: https://www.reddit.com/r/singularity/comments/12rubq4/latent...



