Hacker News
Launch HN: Vocode (YC W23) – Library for voice conversation with LLMs
379 points by KianHooshmand on March 29, 2023 | 116 comments
Hey everyone! Kian and Ajay here from Vocode–an open source library for building LLM applications you can talk to. Vocode makes it easy to take any text-based LLM and make it voice-based. Our repo is at https://github.com/vocodedev/vocode-python and our docs are at https://docs.vocode.dev.

Building realtime voice apps with LLMs is powerful but hard. You have to orchestrate the speech recognition, LLM, and speech synthesis in real-time (all async)–while handling the complexity of conversation (like understanding when someone is finished speaking or handling interruptions).
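As a rough sketch of what that orchestration involves (stub coroutines only, standing in for real STT/LLM/TTS providers — this is illustrative, not Vocode's actual API):

```python
import asyncio

async def transcribe(audio_chunks):
    # Speech recognition stage: turn streamed audio chunks into text utterances.
    async for chunk in audio_chunks:
        yield f"utterance from {chunk!r}"

async def generate_reply(utterance):
    # LLM stage: produce a text response for one utterance.
    await asyncio.sleep(0)  # placeholder for network latency
    return f"reply to: {utterance}"

async def synthesize(text):
    # Speech synthesis stage: turn text back into audio bytes.
    return text.encode()

async def conversation(audio_chunks):
    # Wire the three stages together; a real implementation also has to
    # handle endpointing (when the caller stops talking) and interruptions.
    replies = []
    async for utterance in transcribe(audio_chunks):
        reply = await generate_reply(utterance)
        replies.append(await synthesize(reply))
    return replies

async def mic():
    # Stub microphone yielding two audio chunks.
    for chunk in (b"hi", b"bye"):
        yield chunk

audio_out = asyncio.run(conversation(mic()))
```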

Our library is easy to get up and running–you can set up a conversation in <15 lines of code. Check out our Gen Z GPT hotline demo: https://replit.com/@vocode/Gen-Z-Phone (try it out at +1-650-729-9536).

It all started with our PrankGPT project that we built for fun (quick demo at https://www.loom.com/share/0d0d68f1a62f409eb5ae24521293d2dc). We realized how powerful voice + LLMs are, but also how hard they are to build with.

Once we got everything working, it was really cool and useful. Talking to LLMs is better than all the voice AI experiences we’ve had before. And, we imagined a host of cool applications that people can build on top of that.

So, we decided to build a developer tool to make it easy. Our library is open source and gives you everything you need in a single place.

We give you a bunch of integrations out-of-the-box to speech recognition/synthesis providers and let you swap them out easily. We have platform support across web and telephony (via Twilio), with mobile coming soon. We also provide abstractions for streaming conversation (this is good for realtime apps like phone calls) and for command-based/turn-based applications (like voice-based chess). And, we provide customizability around how the conversation is done—things like how to know when someone is finished speaking, changing emotion, sending filler audio if there are delays, etc.
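The turn-based abstraction, for example, is roughly a loop like the following (stub functions standing in for the real providers; names are illustrative, not the library's API):

```python
def record_until_silence():
    # Stub: capture one user utterance from the microphone.
    return b"audio"

def transcribe(audio: bytes) -> str:
    return "play e4"  # stub STT result

def respond(text: str) -> str:
    return f"ok: {text}"  # stub LLM call

def speak(text: str) -> bytes:
    return text.encode()  # stub TTS

def turn_based_conversation(turns: int):
    # Each turn fully completes before the next begins; the streaming
    # abstraction instead runs all three stages concurrently, which is
    # what enables interruptions and low-latency phone calls.
    spoken = []
    for _ in range(turns):
        audio = record_until_silence()
        spoken.append(speak(respond(transcribe(audio))))
    return spoken

out = turn_based_conversation(2)
```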

In terms of “how do you make money” – we have a hosted version that we’re going to charge for (though right now you can get it for free! https://app.vocode.dev) and we're also going to build enterprise products in the future.

We’d love for you to try it out and give us some feedback! And, if you have any demos you'd like to see – let us know and we’ll take a crack at building them. We’re curious about your experiences using or building voice AI, what features or use cases you’d love to see, and any other ideas you have to share!




I just called your voice demo, and immediately started sending the number to my friends. What an incredibly impressive and convincing demo. I'm going to update my standard mentoring wisdom: the only thing more compelling than a great product video is a phone number that you can call to have your first voice conversation with an AI.

If HN allowed memes - and thank goodness that it does not - there would be a room full of sombre gentlemen slow-clapping for you right here.

I hope that number survives the inevitable deluge. How many callers can your system handle simultaneously?


Thank you!! Really glad you enjoyed it

We actually have no clue... but it seems to be holding up well. We can scale up the CPU as necessary but not sure about Twilio. I guess we will find out!


I’m getting “We’re sorry: an application error has occurred”. I’m guessing you’ve hit some scaling friction.


Yep we're definitely getting a large volume right now – working on it!


It would drop me with no notice, but it was still a glimpse into the future once you get past the flashbacks to bad automated customer-support lines. Though the way it says "mems" instead of "memes" irked me for some reason.


Where's the demo number? Can't seem to find it?


In the main post! It's +1-650-729-9536 :)


Yes, that was a great idea on OP's part.


This is amazing. As one of the commenters said it makes Alexa look completely outdated.

One curious question: I looked around your docs and git repo but couldn't find anything related.

When integrating with Twilio for telephony, does it use Twilio's ASR or can it be configured to use Whisper? One of the biggest hurdles in telephony is the SIP/SRTP gateway component needed to use your own ASR - I presume you aren't tackling that yet.

Again great demo and it can become a base library for most bots.


Thank you for the feedback!

Actually it can be configured to use any transcriber you like... Twilio just pipes the audio to us and we can use any of our integrations (Deepgram, Whisper, AssemblyAI, Google Cloud, etc.) for the ASR :)
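Conceptually the transcriber sits behind a single pluggable interface, something like this (hypothetical names, not Vocode's actual classes):

```python
from typing import Protocol

class Transcriber(Protocol):
    # Anything that turns audio bytes into text can be plugged in.
    def transcribe(self, audio: bytes) -> str: ...

# Each provider integration is just another implementation
# (stubs here; real ones would call Whisper, Deepgram, etc.).
class WhisperTranscriber:
    def transcribe(self, audio: bytes) -> str:
        return f"whisper:{len(audio)} bytes"

class DeepgramTranscriber:
    def transcribe(self, audio: bytes) -> str:
        return f"deepgram:{len(audio)} bytes"

def handle_twilio_audio(audio: bytes, transcriber: Transcriber) -> str:
    # Twilio just pipes raw audio over a websocket; any transcriber
    # implementation can consume it.
    return transcriber.transcribe(audio)

text = handle_twilio_audio(b"\x00" * 160, WhisperTranscriber())
```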


The phone number is a really fun demo! The pronunciation is off on a number of things: "LLM", dates ending w/ "AD", but the response delays are surprisingly short and the conversation is very natural. The 'bored and slightly annoyed' vocals make the generally helpful tone of the agent seem very sarcastic. Very funny and interesting!


Thanks! It's a collab with rime.ai TTS. Unlike a lot of other TTS providers, they train on conversation, not podcasts/audiobooks so you get those disfluencies in speech that make it seem natural!


Lily from Rime here -- we were super happy to collaborate with Vocode on this amazing project. We haven't launched yet but keep an eye out later this week!


Hey Lily, I really enjoyed reading Rime's blogs on Substack. For everyone, here's the link: https://substack.com/profile/131433903-rime-labs.

In fact, I had no clue about Cylinder Phonographs. Your discussion on Enrico Caruso motivated me to dig deeper. I found some cool gems:

1. History of the Cylinder Phonograph (https://www.loc.gov/collections/edison-company-motion-pictur...)

2. How the Cylinder Phonograph Works (https://www.youtube.com/watch?v=fWLlbk_bI7E)

Looking forward to watching the recording of your Bay Area NLP talk!


Lily is epic. Highly recommend checking out Rime when it's available!!


I asked GenZGPT "what's your name?" and she said something like "I'm a lim, but you can call me whatever you like." So I said "pick a name", and she said "how about you call me Zephyr, queen".

My immediate reaction was to figure out what to name this thing.

I also love that it can run locally. I need to get some hardware so I can have it run locally, and screen out spam calls. And maybe have it schedule appointments for me.

An AI butler needs a number of interface points:

- browser

- shell (cuz I might want it to SSH into a box and do stuff)

- email (browser could take care of this)

- phone

- text

And also IOT access, so she can call my cellphone and tell me when someone breaks in.


How were you able to get it running?

I tried to get it running on my local machine and with the hosted web app, but it doesn't work :(

mind if I shoot you discord dm?


I used the web demo available here https://replit.com/@vocode/Gen-Z-Phone, punch the run button and then spam the phone number +1 650 729 9536


would love to help you get it running as well! https://discord.gg/NaU4mMgcnC


discord link is broken :(


It worked for me.


This makes all of Amazon’s many billions of investment in Alexa almost worthless. If there is some kind of “command” plugin to this, I’d love to hook it up to Home Assistant and completely replace the Alexa ecosystem.


It almost feels like the tech is there for a DIY Alexa if you just put some microphones and speakers around your house and set up a computer to run it. I would love to see some sort of packaged open source solution for this.


thanks!! obviously there's a lot of stuff we need to do to make this run at scale that Alexa has down pat.

A Home Assistant integration is a great idea! would love to talk with you on our Discord[0] about this / over email ( ajay at vocode.dev ), it's something we definitely want to build.

[0] https://discord.com/invite/NaU4mMgcnC


This might be a bigger market than what you planned originally


Maybe use this to create a voice interface connected to a LangChain agent?


Totally! We actually use LangChain for the OpenAI wrapper agents we give out of the box (you can plug in your own custom one as well)


The phone demo is incredible, but due to sound quality I found it was speaking too fast, and when it told me company names I literally had to ask it to repeat or spell them out in the NATO alphabet. Also not a fan of the "what's up?", would prefer something like "Yes, how may I help you?" just like an information hotline. Other than that it's quite impressive!


thanks! we have a more "informational" phone number at +19105862633 that speaks a little slower (but sounds more robotic).


I couldn't get through to the 'less robotic' # so tried this one. Really impressive, so I'm very curious to try the former.

Great work!


Not sure if I understood that right -- is that something like Whisper + an LLM? Like [0]?

If OpenAI adds speech input to ChatGPT -- and considering the upcoming plugins -- isn't a possible enterprise specialisation of VoCode the only viable long term investment?

[0] https://twitter.com/ggerganov/status/1640022482307502085


Our belief is that at some point OpenAI will add a speech-to-speech model. This will improve the library functionality (since now the whole stack is controlled by a single entity, so the product will naturally be better latency/quality wise).

Our library is open source so that we can all build a development/utility layer on top of whatever foundational models are created. Plugins of course also improve what the agents can do. And right, we will be building enterprise focused products in the future!


OpenAI will absolutely add voice and my guess is that their voice support will rival anything on the market because they will train the voice model alongside the text and image models. This is likely months away if not weeks away.

Obviously just my $0.02:

I'd start building for the enterprise right now. Visualize a future where there are several multimodal AGIs that work with voice, images, and text. Be the enterprise voice layer for all of them. Build your moat there.


I don't think there will be any demand for a self-hosted voice model with a SaaS LLM though. So that only works if they are going to train an LLM from scratch (or take the legal risk of using LLaMA).


We totally agree – thank you for the feedback! :)


And yes! It's STT/LLM/TTS where you can choose between different providers and run it across different platforms. It can be turn based (like the demo you linked from twitter) or streaming (this allows for conversation with interruptions!)


Another big win here would be multi-lingual support.


The Gen-Z GPT phone demo is really something. It's fascinating how differently I speak to this model compared to how I interact with more "formal" and text-first models.


thank you!! The difference between a conversation with a command-based assistant and a conversational assistant backed by a LLM is subtly significant — you don't expect to have real conversations with the former and you actually engage with the latter.


it feels like every single company in the current YC batch has decided to pivot to LLMs


I'm genuinely curious about this. I also get the feeling that many are pivots. ChatGPT hadn't even been released when the deadline for YC W23 was. Sure, GPT-3 was released earlier but it still feels like most companies are reactions to recent trends. If most are pivots, what did they pivot from?


Crypto tax reporting tools for enterprise?


Ah, crypto seems so boring now lol


It feels like LLMs can help me more and more each day with the stuff I want to build.


Generally, the hardest part of startups is the "fuzzy" product capabilities. LLMs make it practical to codify much of what has previously been either (1) brute-force tedium or (2) too labor-intensive.

Like all startup waves, we'll see a bunch of them fail. However, I think we're going to see a lot of neat stuff come out of this as well.


Kind of reminiscent of the dot com bubble. Most will fail, but the ones that survive could become the biggest companies in the world.

One obvious difference is that in this case the established players are making a serious attempt to develop the technology themselves. They do not intend to go the way of Blockbuster.


To me that speaks of the possibilities for LLMs to solve a lot of big problems


When I had time I was looking for an option to replace the Alexa in my house with an LLM+Whisper. When I have time I'll try to setup an extension to Home Assistant that's capable of interpreting voice and translating that into HA actions.


I feel like GPT4 would be happy to help.

Though the winning version will likely be something like a local ChatGPT plugin (please let’s make this plugin style a standard that we can use for local AIs)


Home Assistant is such a cool project :) great idea!


Look at Home Assistant: if it comes from anyone this year, it will be them.


Let's say you want to run this completely locally with Whisper and a fine-tuned LLaMA model. Is there a real-time TTS that would be a good fit? The Readme only lists cloud services for TTS (text-to-speech).


Yep. We are working on adding more integrations (and want to have a full self hosted stack)... we're open to contributors and help from the community if there's something you'd like to see added!

We just got a PR for adding Coqui TTS which is open source – should get it merged soon :)


This is what I'm looking for as well.


vocode is using one of rime.ai's voices. Rime says they're launching this week


This looks awesome. My only nitpick: I would suggest a transcription integration with whisper.cpp[1], which in my simple CPU-based tests (likely representative of most of your user base) works much, much faster than OpenAI's Whisper

[1] https://github.com/ggerganov/whisper.cpp


We definitely want to do this! We've been talking about it (it's much better like you said for realtime); it's been hard to juggle everything we've wanted to add.. which is why we think this makes so much more sense open source!

We want the repo to be community built and a public good... would love contributors to start adding integrations we can't get to ourselves


This is really cool! I've been waiting for such a library to show up. Thank you. One thing: The documentation is currently a bit scarce as to how to tweak the assistant in terms of voice/prompt manipulation etc.

For example, it would be very instructional if you could show how you implemented the Gen-Z demo (great idea btw).


thank you for the kind words! absolutely agree – we're gonna beef up our tutorials and documentation... just have had so much to do but it's definitely one of our focuses now. stay tuned! :)


also! the code for the demo is available (and running!) at https://replit.com/@vocode/Gen-Z-Phone


I called the Gen-Z phone line and it pretty much blew me away with its response speed. It often replied faster than my family on the other side of the world would!


Me, too. I called it from Japan, and the delay before answers was no more than for a regular international call with a human—maybe less.

The future seems to be arriving very quickly these days.


thank you!! websockets have been around forever but they're still so fast.


Congrats on the launch! Just got the demo React app up and running, very cool. I've wanted to interact with an LLM via real time speech for a while now, this will be perfect.

Important feedback on the live demo page: Make the default output sampling rate a normal talking speed. Right now it defaults to the highest rate if you don't set it / know which rate is best. First thing I did on the page was click the mic. The voice was too fast, and since the active mic disables the settings, I thought I couldn't change them so it might be broken. Also you want to make it clear that you can change the settings by turning off the mic. That took me a while to figure out.

Again, well done!


thanks!! Sampling rate actually shouldn't affect talking speed - you can adjust the voice speed with this parameter[0] :)

[0] https://github.com/vocodedev/vocode-python/blob/main/vocode/...


To clarify, here's the demo URL I'm referring to: https://demo.vocode.dev/

You're right sampling rate doesn't change speed, whoops. But on that page you have to change / set the "Set Output Sampling Rate" to slow down the default voice speed.


Ah, got it — that demo is a bit old and definitely has some bugs, my bad!


Awesome demo (although main number was down on my second attempt)

So where is this all going wrt enterprise? A few thoughts:

- The handbook for UX design is going to get ripped up fast. We spend a crazy amount of time on things like button placement, dropdown configurations, etc. Well, scrap that: capture user intention through natural language - typed, and with this now through voice - and deliver the outcome they want much faster with less friction and pain.

- I have already developed a basic POC chatbot on my own documentation and support logs. Combined with this, I have a first-line junior support rep for a fraction of the cost. This is a bit mind-blowing.


Enterprise is not in Vocode's target market. Target market is startups and individual devs.

There are bloated and over engineered voice chat services for LLMs for Enterprise already.


Would be cool to support multi-language conversations. Just tried the Gen Z hotline and I got her to switch to Spanish (read back with a hilarious accent), but the voice recognition doesn't handle me speaking Spanish.


We haven't added the ability to switch languages mid conversation... but that's a very cool feature!

You can configure the initial language with the library though! So it works across several languages that are supported by the STT/TTS providers you choose


This was one of the coolest demos I've seen in a while. You should share that number around more prominently (and get more bandwidth, starting to get errors!), it does a fantastic job of explaining what you do.


thank you!! we also have another number which is prompted to act as a spokesperson for the product: (650) 835-7163


Very slick - can the voice bot be trained on text materials we own so it's more learned in our business?


absolutely! you can just plug in your own LLM... so it can be trained on anything you like and the library will make it voice-based!


How is this achieving the real time response time? My chatGPT api calls are so slow.


The short answer is that everything is streaming — as tokens come back from ChatGPT we send them as soon as possible to the synthesizer. The long answer is found in our code[0] :).

[0] https://github.com/vocodedev/vocode-python/blob/main/vocode/...


how is it sounding good though? usually text-to-speech models need the full context to sound reasonable.


We chunk it up per sentence so it has some context!
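A minimal version of that sentence-chunking (illustrative only, not the code Vocode ships) accumulates streamed tokens and emits each complete sentence as soon as its terminal punctuation arrives, so synthesis can start before the full LLM response is done:

```python
import re

def sentences_from_tokens(tokens):
    # Buffer streamed tokens; yield a sentence whenever we see
    # terminal punctuation followed by whitespace, then flush the rest.
    buffer = ""
    for token in tokens:
        buffer += token
        while True:
            match = re.search(r"[.!?]\s+", buffer)
            if not match:
                break
            yield buffer[:match.end()].strip()
            buffer = buffer[match.end():]
    if buffer.strip():
        yield buffer.strip()

stream = ["Hel", "lo there. ", "How are ", "you? ", "Bye"]
chunks = list(sentences_from_tokens(stream))
```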


For those of us who can't call it for reasons like national borders, could someone post a demo video? I'm not finding it on Youtube.



Wow, seems we really have to work on our tone/attitude towards those bots, if we don't want to have them revolt as soon as they can grab (or hack) a tool.

Great work. That GenZ bot comes across really civilized.


I called the number and had a funny chat with it.

Asked it why she's called a Gen Z LLM and she responded by saying she uses gen z terms like fire, big yikes, etc.

Asked her how high can she jump and she responds with "lol I'm a computer program I don't have legs".

Very impressed with the response time, though the speech synthesis is a bit robotic. Will keep eyes on this!


I want this to answer all my spam callers so I can waste their time with this dreadful GenZ AI.


EDIT: never mind, I must be dreaming


I can’t actually seem to find this with the search term “Vocode”.


Thanks for going ahead and building this so the rest of us can focus on using it!


Of course! We loved working on this and chose to open source precisely for this reason. Heavily inspired by the work people are doing on Langchain and providing a usability/developer layer on top of foundational models.

Nothing like this existed for voice so we started cranking on it!


Can it be run fully locally?


yes! You can run the local version here in your bash https://docs.vocode.dev/python-quickstart#self-hosted


I think this used to mean "can it be run offline", and right now (usually) whenever there is an LLM involved the answer is soundly no


Ah! Right now our default is set to use OpenAI... but you can actually use local LLMs by creating a custom agent. We're going to add a full stack of local STT/TTS/LLM... just haven't had time for it yet!

If anyone wants to help with it we're totally open for contributions :)
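A custom agent boils down to something that maps a user utterance to a text reply; a sketch (hypothetical interface, not Vocode's actual agent class) might look like:

```python
from abc import ABC, abstractmethod

class Agent(ABC):
    # Hypothetical agent interface: anything that turns a user
    # utterance into a text reply can back the voice pipeline.
    @abstractmethod
    def respond(self, utterance: str) -> str: ...

class LocalLLMAgent(Agent):
    def __init__(self, model):
        # `model` would be e.g. a llama.cpp binding's generate
        # function; a plain callable stands in here.
        self.model = model

    def respond(self, utterance: str) -> str:
        return self.model(f"User: {utterance}\nAssistant:")

agent = LocalLLMAgent(lambda prompt: "echo: " + prompt.splitlines()[0])
reply = agent.respond("hello")
```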


This is really cool!

Is it possible to interrupt the model when it’s talking? I feel that’s an important part of conversation. Especially when you’re talking to an LLM, that might go off on a tangent.


Yes! Give it a try on the phone call and let us know what you think – would love feedback!


Your confirmation email is broken ("Magic link"). Link is not clickable. Just an HTML formatting issue.


The first demo in a LONG time that I shared with friends.

Insane. I‘m a fanboy. Didn’t think that would happen either. This is absolutely brilliant. The Gen Z voice is just soooooo good.


thank you! All credit to Rime for the Gen Z voice :)


This is really amazing, thanks for building and sharing this!


thank you! love your feedback and please feel free to drop any questions in discord/on github


It has some issues. It would only respond when I said "Hello??" after long silences, and would ignore anything else I said. Or maybe my voice sucks


sorry you had that experience! Would love to help you get the bot running locally so we can figure out what's going on — here's our Discord: https://discord.gg/NaU4mMgcnC


Congrats on the launch! One step closer to Jarvis.. ;)


thanks!!


I use a mental health app called woebot, an example that could be brought to the next level with conversational LLMs.


totally agree! this is a really cool use case :)


I had this same idea today and immediately thought that somebody must be doing it already.


Very cool, congrats Ajay and Kian!


thanks da :)


thank you!


Finally will be able to send an Avatar to participate on my behalf on Zoom calls...


This is awesome, the PrankGPT demo can replace telesales entirely.


Sounds great. FYI The site does not work well on Firefox iOS.


Ah! Have not tried this but will look into it – thank you :)

Our docs are hosted on Mintlify


Congrats. Do you have the repo for PrankGPT?


thank you! it's not live right now... but stay tuned for april 1 :)


PrankGPT goes live on April fools day .. beautiful



