Hacker News

As someone working on singing synthesis, I know how hard it is to get that last 10% of quality; without it, a human listener can instantly recognise whether a voice is real or generated.

These are really impressive results! For anyone interested, here is my team’s singing work: https://youtu.be/LPy20zSWhZA

If you are going to use such an intensive particle effect in your videos, at least bother to upload a 4K version, so there is a tiny chance that not every single frame is nothing but compression artifacts.

Also, don't put Gumi and English in the same YouTube search query. I don't know how they did it, but the voices from six years ago sound better than today's SOTA deep-learning TTS...


Clearly the point of the video is its AUDIO content, not the visuals. The lack of a "4k version" makes no difference other than saving you bandwidth :-)


Very well done! Any suggestions on where/how one might learn to do something similar? I love the idea of being able to swap singers on a given track.
