I think it is more than a simple TTS engine. At least from the demo, they showed...

nabakin · 2024-05-21T03:19:24 1716261564

Azure Speech TTS is capable of doing this with SSML. I wouldn't be surprised if it's what OpenAI is using on the backend.
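For anyone who hasn't seen style control in SSML, here is a rough sketch of what such a request body might look like. It only builds the markup and does not call any speech service. The `build_ssml` helper is hypothetical, and the voice name and style value are illustrative assumptions; which styles a given voice supports depends on the service. Nothing here is a claim about what OpenAI actually runs.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr


def build_ssml(text, voice="en-US-JennyNeural", style="cheerful"):
    """Hypothetical helper: wrap text in SSML that requests a speaking style.

    The voice and style defaults are illustrative assumptions, not values
    taken from the thread.
    """
    return (
        '<speak version="1.0" '
        'xmlns="http://www.w3.org/2001/10/synthesis" '
        'xmlns:mstts="https://www.w3.org/2001/mstts" '
        'xml:lang="en-US">'
        f"<voice name={quoteattr(voice)}>"
        # The style attribute is the separate "how to say it" channel.
        f"<mstts:express-as style={quoteattr(style)}>"
        f"{escape(text)}"  # escape &, <, > so the document stays well-formed
        "</mstts:express-as>"
        "</voice>"
        "</speak>"
    )


if __name__ == "__main__":
    ssml = build_ssml("Okay... one, two, three.", style="sad")
    ET.fromstring(ssml)  # raises ParseError if the markup is malformed
    print(ssml)
```

The point is that the words and the delivery travel as separate fields, so whatever produces the text would also have to pick the style.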

kromem · 2024-05-21T02:56:42 1716260202

Most impressive was the incredulity in the 'okay' during the counting demo after the nth interruption.

It quickly became apparent that text alone is a poor medium for the variety and scope of signals that could be communicated by these multimodal networks.

sooheon · 2024-05-21T00:54:35 1716252875

TTS with separate channels for style would do it, no?