It’s insanely fast. Here’s an AI voice assistant I built that uses it: https://c...

joshhug · 2024-09-10T18:19:53.000000Z

That was interesting. I asked it to try to say something in another language, and she read it in a thick American accent. No surprise. Then I asked her to sing, and she said something like "asterisk in a robotic singing voice asterisk...", and then later explained that she's just text to speech. Ah, ok, that's about what I expected.

But then I asked her to integrate sin(x) * e^x and got this bizarre answer that started out as speech sounds but then degenerated into chaos. Out of curiosity, why and how did she end up generating samples that sounded rather unlike speech?

Here's a recording: https://youtu.be/wWhxF7ybiAc

FWIW, I can get this behavior pretty consistently if I chat with her a while about her voice capabilities and then go into a math question.

wenc · 2024-08-28T04:00:24.000000Z

This is pretty amazing. It's fast enough to converse with, and I can interrupt the model.

The underlying model is not voice trained -- she says things like "asterisk one" (reading out point form) -- but this is a great preview for when ChatGPT GAs their Voice Mode.

unraveller · 2024-08-28T04:27:55.000000Z

Fantastic demo. Do you know what's the difference between your stack and the livekit demo? [1] it shows your voice as text so you can see when you have to correct it.

Llama3 with ears just dropped (direct voice token input) which should be awesome with cerebras [2]

[1]: https://kitt.livekit.io [2]: https://homebrew.ltd/blog/llama3-just-got-ears

bilater · 2024-08-27T19:11:53.000000Z

Nice! What are the other pieces of the stack you'r using?

bcherry · 2024-08-27T21:58:49.000000Z

LiveKit, Cartesia, Deepgram, and Vercel

bilater · 2024-08-28T00:37:03.000000Z

awesome - might try it out

innagadadavida · 2024-08-28T06:06:38.000000Z

Oh wow, this is insanely good. Are there any model details?