
I think it is more than a simple TTS engine. At least in the demo, they showed that it can control its speed and that it can sing when requested. Maybe it's still a separate speech engine, but one more closely connected to the LLM.



Azure Speech TTS is capable of doing this with SSML. I wouldn't be surprised if it's what OpenAI is using on the backend.
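
For anyone who hasn't seen SSML, here is a rough sketch of what "control the speed" and style look like as markup. It only builds the SSML string and doesn't call any speech service. The helper name `build_ssml`, the voice `en-US-JennyNeural`, the `cheerful` style, and the `+20%` rate are illustrative assumptions on my part, not anything shown in the demo, and it says nothing about what OpenAI actually runs. It also doesn't cover singing.

```python
from xml.sax.saxutils import escape, quoteattr


def build_ssml(text, voice="en-US-JennyNeural", rate="+20%", style="cheerful"):
    """Wrap text in SSML that requests a speaking rate and a speaking style.

    The voice, rate, and style defaults are illustrative assumptions.
    """
    return (
        # Root element with the standard SSML namespace plus the
        # Microsoft "mstts" extension namespace used for speaking styles.
        '<speak version="1.0" '
        'xmlns="http://www.w3.org/2001/10/synthesis" '
        'xmlns:mstts="https://www.w3.org/2001/mstts" '
        'xml:lang="en-US">'
        f"<voice name={quoteattr(voice)}>"
        # Style is carried separately from the text itself.
        f"<mstts:express-as style={quoteattr(style)}>"
        # Prosody controls delivery, here the speaking rate.
        f"<prosody rate={quoteattr(rate)}>{escape(text)}</prosody>"
        "</mstts:express-as>"
        "</voice>"
        "</speak>"
    )


ssml = build_ssml("One, two, three & counting")
print(ssml)
```

The point is that speed and style travel as markup around the text, so the text generator and the speech engine can stay separate pieces.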


Most impressive was the incredulity in the 'okay' during the counting demo, after the nth interruption.

It quickly became apparent that text alone is a poor medium for the variety and scope of signals that these multimodal networks can communicate.


TTS with separate channels for style would do it, no?



