
Well, there are different ways to make things up. We decided against using a pure generative model to avoid making up phonemes or words. Instead, we predict the expected acoustic features (using a regression loss), which means the model is able to continue a vowel. If unsure, it'll just pick the "middle point", which won't be something recognizable as a new word. That's in line with how traditional PLCs work. It just sounds better. The only generative part is the vocoder that reconstructs the waveform, but it's constrained to match the predicted spectrum so it can't hallucinate either.
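To make the idea concrete, here's a minimal sketch (not our actual code) of a regression-based PLC feature predictor, assuming mel-spectrogram frames as the acoustic features; the module name, layer sizes, and use of a GRU are illustrative assumptions, not our architecture. The point is the loss: with MSE, an ambiguous future drives the prediction toward the conditional mean (the "middle point"), so the model smoothly continues the current sound instead of committing to a new phoneme.

    # Minimal sketch of regression-based PLC feature prediction (hypothetical names/sizes).
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class PLCFeaturePredictor(nn.Module):
        def __init__(self, n_mels=80, hidden=256):
            super().__init__()
            self.rnn = nn.GRU(n_mels, hidden, batch_first=True)
            self.proj = nn.Linear(hidden, n_mels)

        def forward(self, past_frames):
            # past_frames: (batch, time, n_mels) -- frames received before the packet loss
            _, h = self.rnn(past_frames)
            return self.proj(h[-1])  # predicted next frame: (batch, n_mels)

    def plc_loss(pred_frame, true_frame):
        # Regression loss: under uncertainty the minimizer is the conditional mean,
        # i.e. a "middle point" rather than a sampled (possibly hallucinated) phoneme.
        return F.mse_loss(pred_frame, true_frame)

The vocoder then only has to turn that predicted spectrum into a waveform, which is why it can't invent content that isn't in the predicted features.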



Any demos of this to listen to? It sounds potentially really good.


There is a demo in the link shared by OP.


That's really cool. Congratulations on the release!



