Hacker News
Project Alice – an open source virtual assistant that can run offline (github.com/project-alice-assistant)
169 points by jka on May 1, 2022 | 78 comments



Offline is the key here. I wrote a short story not so long ago [1] about the educational potential of a near-future world where storage gets so dense and cheap that the tide turns back toward local computing, away from the trend of the past 10 years toward "online cloud". What happens when the corpus of all human knowledge really can sit on your table top as the early sci-fi writers predicted?

[1] https://www.timeshighereducation.com/opinion/2048-informatio...


> What happens when the corpus of all human knowledge really can sit on your table top as the early sci-fi writers predicted?

If reality is any indication, then if it's copyrightable it won't be sitting on your table top:

* Set of important facts about the world: not copyrightable. Yet so freely accessible I don't even bother to copy Wikipedia to my table top.

* Set of important modern books: copyrightable and mostly still under copyright. Also, probably digitized by Google. Not available for table top copying.

* Set of important music scores: copyrightable, but most are by dead composers, and most are out of copyright. Hooray! Except the IMSLP had to go offline for almost a year, seek legal help and create a staff-review process for uploads to make absolutely sure no publishers could sue it out of existence.

* That full video of Will Forte playing white supremacist "Hamilton Whiteman" at Seth Meyers's actual rehearsal dinner: copyrightable. I actually watched the whole thing at one point, but it apparently got a copyright claim so I can't share it with anyone. I just added this one on the off chance someone on here has a copy and feels daring, because dammit, it was funny, and I want to watch it again.

I feel like the tech world needs a decent political mover-and-shaker who could make a Glengarry Glen Ross style speech at this point: something about how with the tools you got available he could go out and make legal table top human knowledge boxes a reality tonight...


> If reality is any indication, then if it's copyrightable it won't be sitting on your table top:

Absolutely spot on. Indeed, the story was an oblique advocacy for SciHub and an argument as to how the digital entertainment industry and publishers may eventually be the greatest obstacle to human progress.


Consider Cyc [1].

[1] https://en.wikipedia.org/wiki/Cyc

It deduces relations from the set of facts in its database.

If you have human knowledge from Wikipedia on your desktop, you can deduce your own facts using Cyc's engine or something else.

It will be fun to tinker with. You can even start your own conspiracy theory, because you can.


> * That full video of Will Forte playing white supremacist "Hamilton Whiteman" at Seth Meyers's actual rehearsal dinner: copyrightable. I actually watched the whole thing at one point, but it apparently got a copyright claim so I can't share it with anyone. I just added this one on the off chance someone on here has a copy and feels daring, because dammit, it was funny, and I want to watch it again.

Was the thing you're looking for longer than 2 minutes? The DDG results page is filled with 2-minute videos, and the top result is one that Seth himself introduces: https://duckduckgo.com/?q=will%20forte%20Hamilton%20seth


The set of all human knowledge doesn't seem to be accessible anywhere right now; it's not on the open internet at least.

Try "how do you make an entire digital camera starting with enough of the right kinds of sand?". Some of this is a trade secret, some of it is in other languages, some of it can't be explained by any person in the production process because you can't describe things you only know through muscle memory.


Really interesting. It makes me think about an AI that understands all the rules of physics and that you can ask questions the way you ask GPT-3: “design me an f/1.4 lens that is low profile and projects light onto a curved 1-inch sensor from X feet” or “how do I design a magnetic field to contain plasma at a maximum density with power input below X watts”. We really do stand on the edge of a very strange set of possibilities happening…


>the educational potential of near-future world where storage gets so dense and cheap that the tide turns back toward local computing, away from the trend of the past 10 years toward "online cloud"

I'd argue that storage is already dense and cheap enough to "turn the tide back toward local computing."

The reason the cloud (more precisely: someone else's servers) and all manner of non-local computing and storage became so entrenched is (at least in the US) that ISPs mostly offered (and still offer) asymmetric internet links for "consumer" plans, refuse to provide static IP addresses (IPv4 and IPv6) and actively discourage (via port blocking/throttling) running servers on such links.

And when download/upload bandwidth ratios (for my links they're 20 and 17) are so skewed towards downloads, hosting content becomes less viable.

Add poorly managed physical plant and resistance to spending money on upgrades, and you have unreliable links with inadequate bandwidth.

After decades of this, is it any wonder that centralized storage/compute/streaming services found a ready niche, and have exploited the anti-consumer/anti-competitive actions and policies of the big ISPs to the point that many folks don't realize it could be any other way?

Want to move back toward local computing? Make decently fast, symmetric internet links the norm. Getting that done (at least in the US) will be a tall order.

And even if/when we get there, such a shift won't happen all at once, especially since centralized services will fight tooth and nail to keep their customers (for paid services) and product (for "free" services) at our expense.

However, once decently fast symmetric bandwidth is the norm for "consumer" internet links, decentralization will follow.

It would likely take a while, especially since many (most?) of the developers of decentralized solutions (I'm looking at you, diaspora, mastodon and matrix, among others) make self-hosting their software inaccessible to non-technical folks[0].

There are many great reasons to move away from centralized compute/storage/streaming services, but there are significant barriers to doing so, at multiple levels of the network infrastructure. And more's the pity.

[0] https://news.ycombinator.com/item?id=30783477


I don't believe storage is the main limiting factor here - availability is. And whether you want to make your data available outside.


So imagine an IPv6 based world where every household has rock solid high speed networking capability, and a relatively smart firewall deciding what connections to allow and disallow without heavy handed management work.


The corpus of knowledge is forever growing and it grows faster than many people's internet connection.


That sounds like a truism, but actually I am not sure it follows. You may be thinking of data, but knowledge is different.

Consider the ordering of:

Signals (semantic/non-entropy (changes that have meaning))

Data (interpretable signals)

Information (data that informs)

Useful Information

Knowledge (justifiable, true and useful information in context)

Wisdom (knowledge applicable to human survival and evolution)

Then, given Sturgeon's Law that 99% of everything is rubbish at each level, "knowledge" is not growing as fast as you might think (and much human knowledge is lost at a terrible rate). Humans presently generate about 2-5 exabytes of raw data per year. Only a small fraction of that is information, considerably less is useful information, and some fraction of that, in the form of good quality books, journals, papers etc., encoded without redundancy, is probably on the order of 100s of gigabytes per year. For over 70 years a shelf of encyclopedias was considered an approximate summary of essential knowledge. It's not implausible that storage growth could overtake knowledge production at some point in the not too distant future.


>Then, given Sturgeons Law that 99% of everything is rubbish at each level then "knowledge" is not growing as fast as you might think

Sturgeon's law was 90%. Also, that's a mighty big given to entrust to a snarky science fiction writer's quip explaining why so much science fiction in his time sucked.


It already can if it’s mostly text. Have you checked hard drive prices lately?


Not ALL of human knowledge, and many (most?) might cringe to consider it so, but the English Wikipedia (text only) was 8GB or so in 2013 when I put it onto an e-reader. It was handy to be able to read about places as I travelled to them without internet connectivity.

Might be more now but storage has likely outpaced it in growth.


I realize it's used as a tongue in cheek reference to Resident Evil, but the "Umbrella Logo" is, among other things, trademarked[1][2][3].

I'm not sure about corporate ownership status here, but it would appear that someone's idea of movie merchandise was to sell firearms by registering a company called "Umbrella Corporation Weapons Research Group" in the real world. I could find several "licensed" arms in online stores. Possibly they're doing that to protect their IP, not out of any profit motive.

The name of the project won't be a problem, but you might want to steer clear of that particular logo. Entertainment companies can be very touchy about their IP and are known to be litigious.

[1] https://trademarks.justia.com/owners/umbrella-corporation-we...

[2] https://trademarks.justia.com/owners/umbrella-corp-4226588/

[3] Possibly more, but there's thousands of logos labeled "Umbrella" and I don't want to spend all day looking.


True words, thank you. While the project's logo/name has existed longer than I've been in the project, I had my thoughts about it as well.

I came to the conclusion that, with so many car stickers, unlicensed merch and even companies using a similar logo, there shouldn't be an issue for a small open source project. Maybe a dangerous conclusion, and one that should be revisited when Alice grows :)


I'm not sure I'd express it so harshly as TingPing. I like the project and hope it succeeds. Still, it seems somehow unwholesome and invites unnecessary controversy. A good logo represents the project or company, and so what does it say that you all are associated with an, erm... borrowed logo?


That attitude doesn't reflect well on the project IMO. Very short-sighted and unprofessional.


Why would they need to be professional? It's an open source hobby project. People are allowed to code for fun.


Hey if you want it, I threw together an umbrella inspired logo that might be a good alternative. It's close, but should be different enough to be distinguishable.

https://i.imgur.com/aPMBndB.png

CC0 and all that of course.


Thank you, my first thought was a mixture of umbrella corp and the scanner drones from half life 2 :)

I put it up for discussion. We don't want to cling to our current design, but we aren't sure yet about a good way to approach a redesign. It needs a bit of time to decide which aspects of Alice as a whole should be redesigned.


Now that you mention it, it does remind me of the city scanner too haha

Well if you do go for it or want to make any tweaks to it I've now uploaded the svg source too: https://drive.google.com/file/d/1FIICC0ySnwzBB5NdWcDX3uXDdwO...

I hope you find something that fits :)


are you… also using the battle.net logo on the skill store? why

you’re like actively looking for trouble lol


Appears to be included in font-awesome, which warns that it is trademarked: https://fontawesome.com/icons/battle-net?s=brands

Someone probably just scrolled through available icons and ended up picking that one.


This - plus the skill icons can be chosen freely from font-awesome during skill creation. Maybe we should exclude the fa-brands set from the allowed icons. Another good hint, thank you! And yep, the team dearly misses some creative heads for design, as everybody involved is from the coding area :)


Because you can use whatever FontAwesome (for which I'm an early backer) icon when you are creating a skill for Alice using our backend skill creator on her browser interface. We could exclude the use of fa-brands.

Edit https://www.blizzard.com/fr-fr/legal/8bcb0794-6641-4ce3-a573...



How good is the speech-to-text in practice? I have found that open source speech-to-text models are unfortunately far behind the Google/Apple ones, and that makes a complete assistant really frustrating to use.


> This means a Raspberry Pi and some other platforms such as the Respeaker core or the Matrix Creator is the best choices for a hardware platform. As for which Raspberry Pi, a raspberry 3, 3b, 3b+, 3A+ or 4 for the main unit are good choices.

Extrapolating on the expected hardware, it's likely going to be either inaccurate, have very high latency, or both.

Even Mycroft which open sourced everything else still runs the voice recognition part in the cloud because it's not possible to host it locally without rather beefy hardware that the average user won't have around (and even that has high latency). I did try some offline speech to text alternatives for it but they all more or less sucked to be frank.


Google Assistant and Siri have offline functionality so it must at least be possible, but perhaps needs a company's worth of machine learning thrown at it


Ok that is genuinely surprising, perhaps the open source STTs are genuinely all like a decade behind the cutting edge.


Alice offers a few different STT options. While you can stay completely offline, depending on the language and your background noise, the results are indeed not as good as the cloud ones. But Coqui STT in particular (formerly DeepSpeech) does a really good job!

Another option, if you are a bit more open to "sharing" data and you don't want to miss out on the best STT, is enabling Google or Azure cloud services. The big difference is that Alice will only send the sound recorded right after detecting your hotword (only while flashing her LED and asking you "yes?" before the recording starts). Nothing else would be shared.


Does it have an option to cache the whole "Hey Alice, what's gonna be the weather at 11 tomorrow?" utterance then if the wake word is detected send what's been cached after it?

That's the main gripe I had with Mycroft, as there was no going around this:

"Hey Mycroft."

long pause

"What's gonna be the weather at 11 tomorrow?"

even longer pause

<answer>

Which is frankly so unnatural and just too annoying to be practical. And it feels like something that should be very straightforward to implement in terms of logic.


For the moment, no, nothing is cached until the hotword is recognized. We thought about it, but it would mean we have to store past sound input for a few seconds. While this wouldn't be a problem for the main device, satellites aren't powerful enough to run ASR themselves (Raspi Zero), so the sound is streamed to the main device after the hotword detection. This process wouldn't match perfectly with the storing of the data.

Another thing to keep in mind is that we use intermediate results for the ASR. That means the input is already being parsed while you are speaking. Only a few ms after you go silent, the parsing is finalized and NLU/TTS will start right away.

Of course with a bit of bias, I'd say it is more like: "Hey Alice" "Yes?.." "What's gonna be the weather at 11 tomorrow?" short pause (.2 seconds?) <answer>


This is entirely true, and I have a few solutions I could deploy with some work. The problem is that caching like this consumes power: you literally listen all the time, as you do for the wakeword, but you also cache the audio data in memory and use it ONLY if a wakeword is detected. Big companies do it in the cloud; we could do it locally, as an option. The path I chose to mitigate that unnatural feeling is to use a human-style answer. It's a bit like being at home: you are in the kitchen, your wife or kids are further away, and nobody is talking. At some point you'd call your wife, "Alice?", and you'd wait for her to reply "yes?" before talking, as you don't know whether she's focused on you at the moment or playing with the kids or whatever.


I haven't looked into the details, but when listening to a wakeword, surely it has to literally listen all the time anyway?

I mean, would it really consume that much extra power to just have a second sink that's just a N-second circular buffer, so you got the samples after the wakeword ready for speech recognition when the wakeword is detected?
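For a rough sense of scale, here is a minimal sketch of such an N-second circular buffer. This is not Project Alice code: `PreRollBuffer` and `buffer_bytes` are hypothetical names, and the audio format (16 kHz, 16-bit mono, 30 ms frames) is an assumption.

```python
from collections import deque
from typing import List


def buffer_bytes(seconds: int, sample_rate: int = 16_000, sample_width: int = 2) -> int:
    """Memory needed to hold `seconds` of mono PCM audio (assumed format)."""
    return seconds * sample_rate * sample_width


class PreRollBuffer:
    """Fixed-size circular buffer holding only the most recent audio frames."""

    def __init__(self, window_ms: int, frame_ms: int = 30) -> None:
        # A deque with maxlen silently discards the oldest frame when full.
        self.frames = deque(maxlen=window_ms // frame_ms)

    def push(self, frame: bytes) -> None:
        self.frames.append(frame)

    def drain(self) -> List[bytes]:
        """Return the buffered frames oldest-first and empty the buffer."""
        out = list(self.frames)
        self.frames.clear()
        return out
```

Under those assumptions, `buffer_bytes(5)` comes to 160,000 bytes, so memory for a 5-second window looks modest; the sketch says nothing about CPU or power cost.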


Yeah, that's what I said: "as for the wakeword", we listen all the time, looking for a specific wave pattern in the audio and not for words. The audio is literally always flowing in, on all your satellites and the main unit. The problem with prewarming is that, beyond analysing a wave pattern in the audio stream, we need to keep a much longer audio data dump in memory in some kind of FIFO pool. Don't get me wrong, it's easily doable; I just haven't taken the time to do something polished that doesn't overconsume resources on the device running it. Technically, we just need to pool the audio data, say 3-5 seconds depending on the hardware used (a Pi 3 is slow), trim the length of the detected wakeword from the beginning, and append the rest of the incoming data while already streaming to ASR, be it local or cloud based.
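The hand-off described here (drop the wakeword, replay what was pooled after it, keep streaming) could be sketched roughly as below. This is only an illustration: `frames_for_asr` is a hypothetical helper, and it assumes the wakeword detector can report the index of the first pooled frame after the wakeword, which a real detector may not expose.

```python
from itertools import chain, islice
from typing import Iterable, Iterator


def frames_for_asr(
    pooled: Iterable[bytes],
    wakeword_end: int,
    live: Iterable[bytes],
) -> Iterator[bytes]:
    """Yield the pooled frames recorded after the wakeword, then live frames.

    `wakeword_end` is the index in `pooled` of the first frame that follows
    the wakeword (an assumed detector output, not an existing API).
    """
    # islice also works on a deque, which does not support slicing.
    after_wakeword = islice(pooled, wakeword_end, None)
    # chain is lazy, so ASR can start consuming before `live` is exhausted.
    return chain(after_wakeword, live)
```

Because the result is a lazy iterator, the ASR side can begin on the pooled frames immediately while new frames keep arriving from the microphone or a satellite.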


My guess would be no. After all, if you're going through the trouble of setting up a home assistant it'll be mains powered anyway, and the Pis don't actually use that much more power at max load than when idle.

I think the ballpark figures for the Pi 4 are 0.5 A when doing nothing, 1 A when doing something intensive on a single core and 1.2 A when at full multicore load.
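To put those ballpark currents into watts and yearly energy, here is a small worked calculation. The 5 V supply is an assumption (the nominal Raspberry Pi input voltage), and the currents are simply the figures quoted above, not measurements.

```python
SUPPLY_VOLTS = 5.0  # assumption: nominal Raspberry Pi supply voltage


def watts(amps: float, volts: float = SUPPLY_VOLTS) -> float:
    """Electrical power drawn at a given current."""
    return amps * volts


def kwh_per_year(power_watts: float) -> float:
    """Energy used when running continuously for a 365-day year."""
    return power_watts * 24 * 365 / 1000


# Ballpark currents from the comment above, in amps.
for label, amps in [("idle", 0.5), ("single core", 1.0), ("all cores", 1.2)]:
    power = watts(amps)
    print(f"{label}: {power:.1f} W, {kwh_per_year(power):.1f} kWh/year")
```

At 5 V that works out to 2.5 W idle and 6 W at full multicore load, a gap of 3.5 W.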



They both seem to use Kaldi which I have tried before.

edit: Nope Vosk is doing much better than I expected. Thanks a lot for the recommendation!


Glad you came back to add the edit. I tried Vosk recently and was also very surprised. It restored some hope for these systems that I wasn't anticipating; with some work, I felt a nice hands-free dev environment is within reach.


Yes, but at the time I did, it wasn't really usable. I shall have a look at it again, thank you.


Hi all! I'm the main dev, although @philipp2310 is mostly steering the boat at the moment. Thank you for all the comments and interest in our project.


Hmm,

Any relation between this Alice project and an earlier Alice AI project, alicebot:

https://en.wikipedia.org/wiki/Artificial_Linguistic_Internet...


Not at all


Why do virtual assistants seem to always have female names?


In the 90s they had male(?) names like Clippy, Bonzi Buddy and Microsoft Bob. Now we still have Bixby, which sort-of sounds like an English butler.


Not exactly the same thing, but we had Jeeves also


dev here (quite overwhelmed by all the new visitors coming from here, thank you all!): My wife doesn't like female assistants either, for us Alice got a male voice and we just pretend it is Alice Cooper ;)


The local classic rock radio station in Boston used to have Alice Cooper hosting a certain night each week (I think Friday or Saturday, and presumably pre-recorded and remotely. Maybe he still does it? I haven't lived there for a while now). He was pretty funny! I'd honestly love to have a voice assistant that sounds like him; maybe the way to deal with both the question about what gender the assistant's voice should be and not having to worry about having generated pronunciations sound natural is just to pay some celebrity a bunch of money (and/or some royalties for the assistant) to spend a month in a studio recording a huge amount of stuff with varied pitch and tone and vocabulary and then use that to splice together the assistant's responses. Bonus is bringing them back after a year or so to record more stuff to smooth over any rough edges that are found!


I always wondered if Alice Cooper is doing that for multiple radio stations :) A local German radio station (Radio Bob) has a weekly show with him as well. I guess the same prerecorded stuff with a few custom "you are listening to..." mixed in!

Regarding "just to pay some celebrity".. well, that is currently way out of our budget, as everything offered so far is 100% free and there are no plans to change that :)


> I always wondered if Alice Cooper is doing that for multiple radio stations :) A local German radio station (Radio Bob) has a weekly show with him as well. I guess the same prerecorded stuff with a few custom "you are listening to..." mixed in!

That's super cool! It's probably all the same across the stations, yeah, although it still is quite fun to hear the songs he picks and why. One of the things he would always do was have a pair of songs that shared some theme that he would share afterwards, and it was fun to try to guess. The only two I remember offhand were some song called "Jack the Ripper" and then "Midnight Rambler" by the Rolling Stones with the theme "about serial killers", and then "God Gave Rock and Roll To You" by Argent along with "Since You Been Gone" by Rainbow, which were apparently written by the same guitarist.

> Regarding "just to pay some celebrity".. well, that is currently way out of our budget, as everything offered so far is 100% free and there are no plans to change that :)

Oh, definitely, I wasn't offering this as a completely serious suggestion so much as something that I could totally see one of the major tech companies doing to try to separate themselves from the pack.


Hijacking this for a straight answer, since you're the dev. There are mentions of multiple voice choices. Is one of them a sampling of the Red Queen in the RE movies? I loved her dialect/voice.


> I loved her dialect/voice.

Ah yes. If ever I want an AI assistant to inform me of my imminent death, I want it to be using that voice.

"Hey Alice, call an ambulance."

"You are going to die."

"Thanks Alice."

https://www.youtube.com/watch?v=wSmYSZGMZj0


Sadly no. Coqui TTS integration, which offers some voice cloning features, was in the making, but I had to delay it, as it required some newer versions (Python 3.9) and I think it was 64-bit only.

Currently you have the following choices: Pico, Mycroft, Google Standard/Wavenet, Amazon, IBM Watson

And while using one of the cloud variants might cause security concerns, keep in mind, they will only know what to speak, not why (e.g. what your input/request was)


I tried, long ago, using SSML to modify Google WaveNet, but got nothing conclusive.


I always assumed it stemmed from the telephone operators being primarily female, which led to a large corpus of audio recordings, which led to the training data for the first decent text-to-speech models being mostly women's voices.


It's also because the higher pitched voices are easier to understand when there is noise.

I believe the military did some research on this.


I did read about some military research along these lines. I don't recall noise resistance being mentioned, but that the young male air force pilots responded more quickly and accurately to female voices.


For this project, the readme has a section about the name. tl;dr its name is from Resident Evil.


In that game/movie world, it is the name of an AI.

Maybe some passdown influence from Star Trek's ship computer, which had a female voice?


> In that game/movie world, it is the name of an AI.

Alice is the name of the main character. The supercomputer/AI was called Red Queen.

Though it gets a bit muddled, not least because the Red Queen was created using the template of one Alicia Marcus, of whom Alice herself is a clone.


Thank you for the correct analysis of my thoughts at the moment I started Project Alice :-) Often people will think that mixing Alice and the Red Queen is an easy and wrong shortcut to Resident Evil. But indeed, the Red Queen, in the MOVIES, is made in the image of Alicia. When the last RE part came out, I wanted to include the White Queen in Project Alice's mood, but that's not done yet.


To note, names tend to follow the male/unisex -> female shift pattern, so any name chosen now can still end up sounding “female” at any point in time.

Unless it's a name with existing gendered variants (like Mario/Maria or Jean/Jeanne), I feel it's a losing proposition to read too much into a given name.


Alice / Алиса is the one from Yandex

https://yandex.com/support/m-yandex/alice/talking.html


Yandex's Alice is at least a year younger than Project Alice.


Yeah, I was quite shocked when I learned about Yandex's Alice..

To be fair: Project Alice is older than Yandex's Alice and there is nothing but the name connecting these two assistants!


I was already thinking about selling my soul (getting an internet-connected voice assistant). So I am super glad that there is an offline alternative.

I didn't even know it was possible, but now I feel kind of obligated to put those $30 I would pay otherwise towards this project instead.


I finally caved and got an Echo a few months ago. After a few weeks of my kids asking it to tell jokes it's now an alarm clock that tells me the weather once or twice a day.


Aside from home automation related requests, ours is basically voice controlled alarms, timers, and conversion calculators. In all of those cases they are superior to alternatives imo.


These are such basic operations that it cannot be true that an online Alexa or Google Home assistant is better than the alternatives.... If you're talking about getting information, routes and the like, maybe, but for a simple alarm or a timer, who would require online connectivity....


> but for a simple alarm or a timer, who would require online connectivity....

It's about the experience and ease of use. Sure, I can set timers on my phone all day long or look up conversions, but the experience sucks compared to a voice interface. Open solutions like Alice, Mycroft, and Ada all kinda suck unless you leverage the same online services powering Google, Siri, and Alexa.

We have Amazon Echos and these are some of the situations where a voice assistant just works better than anything else:

- You're standing at the kitchen counter multiplying a recipe. It is easier and faster to say "Alexa, how many tablespoons is 9 teaspoons?" than almost anything else.

- You're making Thanksgiving dinner. "Alexa, set a mashed potato timer for 20 minutes. How much time is left on the Turkey Basting Timer?"

- The dentist recommends brushing your teeth for 2 minutes. I set up a routine so when my kiddo is brushing their teeth they can say "Alexa, I'm brushing my teeth." And Alexa will say "Starting the tooth timer" then make a silly noise every 30 seconds to indicate they need to switch brushing top inside to top outside to bottom inside to bottom outside.

- I'm woodworking in the garage and need to multiply a dimension. You can't beat screaming "Alexa, what's 24 times 9/16ths?" and getting your answer without having to look up from what you're doing, find your phone, or do the math in your head.

- You're putting the kiddo to bed and got home late from practice. So you both decide to let them sleep in a little. It's easier to say "Alexa, cancel the 6:30 alarm and set an alarm for 7 am." and Alexa might ask "Do you want to cancel all alarms or just tomorrow's" and you say "Just tomorrow's". No fiddling with the Alarm clock or remembering to adjust it the next evening.


Project Alice can do all that, and so can other non-spying assistants, offline. We have conversion skills, and if one is missing, it's really simple to create; none of this requires online connectivity.
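As a rough illustration that the two conversions from the kitchen and garage examples need nothing beyond local arithmetic, here is a minimal sketch. It is not the Project Alice skill API; the function names are hypothetical, and the teaspoon ratio assumes US customary units.

```python
from fractions import Fraction

TSP_PER_TBSP = 3  # US customary: 1 tablespoon = 3 teaspoons


def teaspoons_to_tablespoons(tsp: int) -> Fraction:
    """Convert teaspoons to tablespoons exactly, without float rounding."""
    return Fraction(tsp, TSP_PER_TBSP)


def scale_dimension(factor: int, dimension: str) -> Fraction:
    """Multiply a fractional dimension such as "9/16" by a whole factor."""
    return factor * Fraction(dimension)


if __name__ == "__main__":
    print(teaspoons_to_tablespoons(9))           # 9 teaspoons in tablespoons
    print(float(scale_dimension(24, "9/16")))    # 24 times 9/16ths
```

Using `Fraction` keeps answers such as 27/2 exact, which suits workshop measurements quoted in sixteenths.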


A little bit of a misunderstanding. When I originally said alternatives, I was referring to alternatives to voice assistants, like phone apps, Google searches, or math in your head.

That being said, from watching demos of alternative voice assistants on Youtube and going down the rabbit hole of actually setting up Mycroft, Ada, and Project Alice; yes the experience is better with Alexa, Siri, and Google Assistant hands down. The setup experience is light years ahead of these projects and the performance/latency is also significantly faster.

That doesn't mean I don't fully support what you're doing. I'm just looking at the current state objectively.


It also spies on you.


Just to improve customer accommodation of course.


Fun fact: Yandex, the Russian search engine, e-commerce and taxi/delivery service, has a voice assistant named "Alisa", which is the Russian pronunciation of Alice.



