> Amid this news, a former Alexa colleague messaged me: “You’d think voice assistants would have been our forte at Alexa.”
I assume the goal of Alexa was never to be the top conversational system on the planet; it was to sell more stuff on Amazon. Apple's approach of making a friendly and helpful chat assistant helps keep people inside their ecosystem, but it's not clear how any skill beyond "Alexa, buy more soap" was going to contribute meaningfully to Alexa's success as a product from Amazon's perspective. I saw the part about them having a "how good at conversation is it" metric, but that cannot be the metric that leadership actually cared about; it was always going to be "how much stuff did we sell off Alexa". In other words, Amazon never appeared to be in the race to make the best voice assistant, and I'm not sure why they would want to be.
After years of raising 3 kids, you would think that if I ask to add diapers to the cart, it would know something. But no, it just goes with whatever is top-recommended, or first in a search, or something like that. Nothing using the brand or the most recent sizes we purchased.
There was no serious attempt to drive real commerce. Instead, Alexa became full of recommendation slots that PMs would battle over. "I set that timer for you. Do you want to try the Yoga skill?"
On the other hand, they have taken on messy problems and solved them well, just not with technology, and for no real financial gain. For example, if you ask for the score of the Tigers game, Alexa has to work out which "Tigers" you mean: your local team or one of many teams worldwide, at every level from professional to local, across every sport, any of which might have had a game of interest. People worked behind the scenes to manage this manually, tracking teams of interest and filling intent slots daily.
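The manual slot-filling described above hints at what an automated version would have to do. A toy sketch of such a disambiguation heuristic, where all team names, regions, and weights are invented for illustration:

```python
# Toy sketch of nickname disambiguation: score candidate teams that share
# a nickname, preferring the user's local and explicitly-followed teams.
# All data and weights here are invented for illustration.

CANDIDATES = {
    "tigers": [
        {"team": "Detroit Tigers", "sport": "MLB", "region": "US-MI"},
        {"team": "LSU Tigers", "sport": "NCAA", "region": "US-LA"},
        {"team": "Leicester Tigers", "sport": "Rugby", "region": "UK"},
    ]
}

def rank_teams(nickname, user_region, followed_teams):
    """Return candidate teams for a nickname, best match first."""
    def score(c):
        s = 0
        if c["region"] == user_region:
            s += 2   # local teams win ties
        if c["team"] in followed_teams:
            s += 3   # explicit interest beats geography
        return s
    return sorted(CANDIDATES.get(nickname.lower(), []), key=score, reverse=True)

best = rank_teams("Tigers", "US-MI", followed_teams={"Detroit Tigers"})[0]
print(best["team"])  # Detroit Tigers
```

The hard part, of course, isn't the scoring function; it's keeping the candidate table fresh across every sport and league, which is exactly the work people were doing by hand.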
The insane lack of basic heuristics in everyday apps to do very obvious things, like you mentioned, baffles me. They can build huge-scale fuzzy vector-search AI suggestion systems for a billion users, but can't think to do stuff like only suggesting things available in your size?
I'm actually working on an app that solves this for a specific use case, though it isn't in the retail space.
Voice assistants are particularly egregious - they've done all that work to correctly recognise the words I said - i.e. the hard part - but then it breaks because I said "set reminder" instead of "create reminder"??
My question is always: is it just the company I'm at, or is it how the whole industry is run? The more companies I work at, the more I realize it's the latter.
On the flip side, it's easy to take for granted what DOES work when you know how much better it could be. I was sitting at dinner yesterday with a 73-year-old man who couldn't stop talking about how amazing Siri is because it'll tell him the population of some country.
This only goes so far, trying to use your own head as a simulation or approximation of user experience. Some of us will be building software for people who we will never be in our lifetime.
They're hard in different ways (and ML helped with voice recognition to a degree that PhD linguists struggled to reach for years).
But to your example: OK, "set" and "create" probably mean the same thing in the context of a reminder. Probably "add" and a few other things too. Should this go on some running ToDo list app I use? Should it ask me for a due date? Should it go on my calendar? And that's a very simple example.
> Should this go on some running ToDo list app I use? Should it ask me for a due date? Should it go on my calendar? And that's a very simple example.
Yes. If it isn't obvious from the context, it should ask.
What it should not do is demand you issue all your commands in the format "${brand 1}, do ${something} with ${brand 2} in ${brand 3}". That's what makes current voice assistants cringeworthy.
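The "set" vs. "create" breakage is, at its core, a missing synonym table. A minimal sketch of verb normalization for a reminder intent (the verb set and grammar here are invented; real systems learn this rather than hard-coding it):

```python
# Minimal sketch of verb normalization for a "reminder" intent.
# Accepts "set reminder ..." and "create a reminder ..." alike.

REMINDER_VERBS = {"set", "create", "add", "make", "schedule"}

def parse_reminder(utterance: str):
    """Return the reminder text, or None if this isn't a reminder command."""
    words = utterance.lower().replace(",", "").split()
    if not words or words[0] not in REMINDER_VERBS:
        return None
    rest = words[1:]
    if rest and rest[0] == "a":          # optional article: "set a reminder"
        rest = rest[1:]
    if not rest or rest[0] != "reminder":
        return None
    return " ".join(rest[1:]) or None    # the payload after "reminder"

print(parse_reminder("set reminder buy milk"))       # buy milk
print(parse_reminder("create a reminder buy milk"))  # buy milk
```

It takes a dozen lines to stop punishing users for word choice; the genuinely hard questions (which app, what due date) start only after this step.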
> Voice assistants are particularly egregious - they've done all that work to correctly recognise the words I said - i.e. the hard part - but then it breaks because I said "set reminder" instead of "create reminder"??
They hardly even managed the hard part. What's surprising to me is that for a year now, the ChatGPT app has been miles ahead of the SOTA voice assistants in terms of speech-to-text with whatever the thing is they're using, and somehow none of the voice assistants managed to improve. OpenAI could blow them all out of the water today if they delegated a couple of engineers to spend a week integrating their app deeper into the Android intent system, and 90% of that wouldn't be because of GPT-4, but because of a speech-to-text model that doesn't suck donkey balls.
> somehow none of the voice assistants managed to improve.
No one has been working on the old generation of assistants for years now. They all basically came to the conclusion that the architecture that everyone had settled on was a dead end and wouldn't get any better, so they directed their attention elsewhere.
Now Google is working on it again, but just using an LLM for better intent parsing isn't exciting enough to warrant attention, so in classic Google fashion they launched a brand-new product (Gemini) that's going to run alongside Assistant for a few years, confusing everyone, until they yank Assistant (which will still have features that haven't been ported).
Apple seems to be working on improving Siri rather than starting fresh, but it's taken them a while to get it ready because Apple never moves on something fast.
Actually, speech-to-text benefits massively from a good language model. It's impossible to do speech to text if you don't understand the language. The better you understand the language and the context of what is being said, the better you will be at speech-to-text. So it's no surprise whatsoever to anyone that the best-in-class language model would have the best in class speech-to-text.
I think a lot of people underestimate how disconnected raw sound patterns are from human speech. It's hard, if not impossible, to even recognize word boundaries on a spectrogram of regular human speech, even for highly eloquent speakers in formal settings. And many sounds are entirely ambiguous; people are rarely aware of the exact phonemes they use in practice. For example, most native English speakers pronounce the "peech" part of "speech" more like "beach" than like "peach", if you look at a spectrogram [0]. Phonetics is really complicated, and varies far more between languages than people tend to assume.
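A crude noisy-channel sketch of the point: the decoder picks the word maximizing P(acoustics | word) * P(word | context), so a good language model can rescue acoustically ambiguous near-homophones. All probabilities below are invented for illustration:

```python
# Noisy-channel toy: choose the word maximizing
# P(acoustics | word) * P(word | context). All numbers are invented.

# Acoustically, these candidates are nearly indistinguishable.
acoustic = {"beach": 0.40, "peach": 0.35, "speech": 0.25}

def lm(word, context):
    """Stub language model: a real LM scores whole sentences;
    this hard-codes one context for illustration."""
    if context == "freedom of":
        return {"speech": 0.9, "beach": 0.05, "peach": 0.05}[word]
    return 1 / 3  # no context: uniform prior

def decode(context):
    return max(acoustic, key=lambda w: acoustic[w] * lm(w, context))

print(decode("freedom of"))  # speech
print(decode(""))            # beach (acoustics alone wins)
```

With no context the acoustically-best "beach" wins; given "freedom of", the language model overrules the acoustics. That's the sense in which the best language model tends to yield the best speech-to-text.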
I laughed at this line: "Today Alexa has sold 500M+ devices, which is a mind-boggling user data moat." Yes, recordings of 500 million people saying "set a timer for ten minutes," and "order more paper towels" over and over again, truly a treasure-trove.
I still love Alexa though, or more accurately, voice assistants in the home. It's just that the speech recognition and answering are brutally out of date.
Always been a mystery why they didn't make it any better in the last 5yrs.
I've long wondered what the Alexa team does all day, but I've had friends who worked at Amazon as devs and I'm pretty sure it's "find as many ways to do nothing as possible".
What they did in the last 5 years was deprecate the Alexa web interface to death and totally ruin the Alexa mobile app, such that there was really no good way to administer my condo full of Echos.
What if I were to tell you that the Echo team spent all their time building not one, not two, but three entirely separate embedded tech stacks, for which all features had to be ported over, that all had to be supported because there's 25+ different devices because people got promoted for shipping 4 different ones every year?
I always wondered (being a developer myself) how a team at a well financed FAANG could fuck this up so badly. It seemed to me as if they were intentionally trying to ruin the product. Changes for no benefit month to month and this month doesn't work like it did last month, etc. Every night all 6 devices get reset to the same broken configuration I just changed yet again that day. What you describe here would do it easily and I'd absolutely believe you. I envisioned there had to be some reason but I'd have never guessed this was it.
YUP - that's my experience as well, consistently too. The ONE thing we all use this kind of limited tech for, and the Alexa team couldn't even do that well.
I don't even really use it for that. I have a kitchen timer on my microwave and another one on the oven. Both are easy to set. I can see if they're running and how much time is left.
I'd guess that half our Alexa utterances are for setting timers (the other half being split between listening to music and summoning the family to dinner/other announcements). Having dirty hands or not having to walk over to frob buttons makes the voice-activated timers pretty compelling to me.
Both my kitchen timers are sealed panels, one above the stove and one on the oven so I guess I just got in the habit of using them years before I had an Alexa.
I don't really even use the small one in the kitchen except to ask for a weather report now and then. But I see how the timers can be useful. They're very handy for my dentist etc. and would be for timing anything when you really don't have a free hand or want to touch anything.
Bedroom is light and alarm. Sometimes music but I don't really listen to music in bed. Downstairs I play off the AppleTV connected to my stereo system.
Set timer, play music, turn lights on/off, get weather report accounts for 100% of my Alexa usage. My wife uses the summoning function occasionally and my kids used to ask it random questions when they were younger, but yeah, that's about it.
For a while they were pretty much giving away the little Alexa dots. Buy some soap, get a free Alexa. I remember being unable to give them away, and I think we ended up just trashing about half a dozen of them (vs. the five active Alexa units my ex-wife has; I personally have none in my home and plan to keep it that way).
This is the core tension at the heart of Alexa that the author didn't address at all. It's not that Alexa is a bad product or that it wasn't cutting-edge, it's that it contributed very little to Amazon's other lines of businesses to justify the investment.
"Shopping with your voice" never took off despite many attempts. The contribution towards subscription services like Audible and Amazon Music was not substantial enough to warrant the massive R&D investment. The business unit never found any other sources of convincing revenue.
Every other decision is downstream from that unresolved tension.
I've never used our Alexa for shopping. If I said something like "Alexa, buy more filters", even being very clever and looking at my order history, it would still get something wrong. And then I'd need to use another device to actually make the order.
While it seems to work fine on the speech recognition part, in that Alexa understands the words I say, it never seemed good enough to actually navigate a task like ordering the right kind of filter.
I knew there was some behind-the-scenes scripting going on, but I didn't realize just how much...
We mostly use our Alexa for kitchen timers, reminders, and video calls with family. Occasionally for playing music too. No, I don't want to subscribe to Amazon Music Unlimited.
> "Shopping with your voice" never took off despite many attempts.
We're seeing this more and more in tech: Company comes out with a feature that few people want. It doesn't gain adoption. They make many attempts to cajole and nudge users to use the feature. Users don't use the feature. They make more buttons and flows trigger the feature. Users ignore them. They start tricking users into using the feature, with dark patterns and misleading buttons. Users deliberately learn and avoid these. Exasperated, they declare "Why, oh why, won't users just use this feature!? They're just uninformed or don't know what's good for them!"
Whatever happened to starting with what the user actually wants and then working backwards from that to the actual feature? More and more, companies are more interested in serving their own metrics than serving their users.
Lots of companies still work like you said. With a company like Amazon, though, you have two things at play.
The first is that it takes a massive amount of dollars to make an impact on revenue. You're just not going to move the needle by selling a new product or something. It's a mature business and to meaningfully increase your revenues you need to alter people's current behavior so they spend more.
But, the second thing, is that if you manage to increase revenue by a percentage point, that is a huge amount of money, and that potential payoff can justify a huge investment. And once you've made that investment, there's a lot of incentive to try to make it work (between sunk cost fallacy and potential payoff).
Lots and lots and lots of companies fail because they build something that doesn't solve a need. It's like the number one thing you learn. This is nothing new. It's just that in the age of hyperscaled tech companies, the payoff for unlocking a new market or changing user behavior is huge, so you end up with lots of attempts to develop some technology and then figure out how to use it to change the world.
> The previous method was to use focus groups to make new products. That was far worse.
I actually don't think that was worse at all, and focus groups should come back into the equation. Not as the only measure, of course, but they were never the only measure.
The problem with focus groups is 1) people say they want different things than they actually do, and 2) it's rarely representative, you'd need thousands of people. Then, once scaled, you lose details because it has to be more survey-like. Add in the leadership team's "vision" and you have a recipe for only making products for e.g. white men and ignoring the black woman haircare market.
It all has a place - but actual real data is probably the best way to figure out product market fit in my experience.
They want you to shop with your voice because there is a lot of friction if you then decide that's not what you really wanted. Most people who mistakenly order something with Alexa will just let it be delivered, not cancel it.
Same with using a mobile app to shop. You have less ability to cross-shop quickly because the interface is inherently slower than a mouse and keyboard, plus the OS multitasking features are horrible. Plus they get a lot more information about you through their app.
So it's not always that they think the feature is better for you; it may just be that it's better for them.
I used Alexa for Audible a few times and it was a nightmare. It wouldn't start where I stopped listening on my phone, and then when I stopped listening with the Alexa, it wouldn't sync the correct position back to my phone. So I either had to _only_ listen with Alexa, or listen with everything else on my phone.
Shopping for household goods on Amazon is a minefield to begin with. Prices vary wildly from day to day, even on identical (same ASIN) products. And descriptions tend to be vague (is it Mediterranean oregano, or is it Mexican oregano? They're not the same plant, but it doesn't fucking say!). Per-unit pricing is often broken (is it 24 units of single cans, or is it 4 units of 6 cans each?).
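To make the per-unit ambiguity concrete (the listing numbers here are invented), the same price reads very differently depending on what a "unit" is:

```python
# One hypothetical $36 listing, under two readings of "24 count":
# 24 single cans, or 4 units of 6 cans each. The per-can price is
# identical either way, but the displayed "per unit" figure differs
# by 6x, which makes comparison shopping on that figure meaningless.
price = 36.00

per_can_if_singles = price / 24           # $1.50 per can (unit = can)
per_unit_if_packs = price / 4             # $9.00 per unit (unit = 6-pack)
per_can_if_packs = per_unit_if_packs / 6  # $1.50 per can, same as above

print(per_can_if_singles, per_unit_if_packs, per_can_if_packs)  # 1.5 9.0 1.5
```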
Even when sitting in front of a real computer, it often takes a fair amount of effort to find a product that represents the kind of value I'm interested in at the moment.
Comparative shopping with this mess on the back end doesn't work with the current state of Alexa. There are details that are important to me, as a consumer, that can't be boiled down to a price and an 8-word summary.
If the back-end data weren't broken, buying with Alexa could be made to work if it could get a grasp (using ML or some other buzzword) of how a buyer's proclivities tended to be shaped. For instance, some people want the best per-volume price, and some others want the highest quality at any expense, with a huge range in between. I myself don't have a ton of room for bulk buying, so I often aim for a medium volume of moderate-high quality, tempered by a price that is low today.
But, again: The back-end data is broken, and Alexa is too stupid to make what I think are good decisions. When I can't trust the talking computer on my countertop to make good decisions for me, and if my hands are already full, I don't have time to have a drawn-out conversation with a bot, so I won't ever actually buy stuff that way.
It's not functionally better than Amazon's abortive Dash Buttons[0] from 8-ish years ago, which were also untrustworthy for many of the same (or related) reasons.
---
But if I'm cooking in the kitchen and I notice that I'm low on oregano, I do have time to say "Alexa, add oregano to my cart." And I'll also invariably make time to interrupt its misguided response with a quick "Alexa, shut the fuck up" once it starts prattling on about the useless summary from the bad back-end data (GIGO), so I can get back to doing what I'm doing.
This is important to mention because if I weren't already busy with my hands, I wouldn't bother with using Alexa at all for this task.
Eventually, I'll find myself in front of a real computer again and I'll go through and true up the things I've used Alexa to put in my cart, so they match my actual expectations, and actually buy some things. And while this is useful to me, it's obviously pretty far removed from the target goal of the system.
And it can't ever get better until they fix their data.
100% agree. I don't want to converse with Alexa. I think the miss was not thinking about it as a media delivery platform with some utility. I use Alexa in the Kitchen and Bathroom when I'm doing other things. Hands free timers while cooking is great. But mostly it is listening to music. But they should have tried to be a #1 podcast source from day 1. I'd also like local weather and news from a local source. They could have made that another focus. Audio books and a lot of the other Amazon controlled content translates well to an Alexa. I think they focused too much on the smart house side and not enough on hands-free content consumption.
For sales, they gave up a good brand, what, 8 years ago, when they stopped caring whether what they were selling was even a real product and started taking part in the sale of products they know to be fake and/or rip-offs.
They cornered so many markets and, surprise, used that position to let everything go to shit for a profit. Still, at least Bezos got to wave his wang at the world by going to space.
AMZN execs are watching AAPL stock go bonkers today on yesterdays AI announcement. APPLE IS DOING AI AND WE ALL NEED NEW PHONES TO USE IT!
I expect a similar thing to happen when AMZN announces some AI consumer product. Never mind they were in a Prime (ahahah - get it - "PRIME") position to be the first mover here.
Is Alexa really that primed to be useful? For home devices, maybe. For mobile? The Goog seems like it would be better primed with its Android devices.
What I saw from Apple this week shows me that they've been much more focused on making the assistant useful for anything/everything someone could do on their devices. I have not seen that kind of focus from anyone else.
I'll argue that business is a lot faster-paced in the "age of enshittification" than it was in the past, so that today a company can decline as much in 4 years as would previously have taken 40.
4 years ago I bought something from AMZN roughly weekly; today I buy something from them every few months. They'll put up a banner that says "You paid $30 for shipping in the last year" and I'm like, yeah, you want a lot more than that for Prime. They've got the data to show that, at best, I watched the equivalent of about $30 worth of Blu-Ray discs of content on Prime Video per year. Add it up and Prime makes no sense for me.
The fact that they have a highly profitable AWS business makes it worse instead of better since they can maintain the perception of normality, even growth, and not have to pay attention to the rest of their business.
Their forced insertion of ads into their streaming content was it for me. So many shows are listed as available with Prime without being labeled ad-supported, yet as soon as something I'm watching hits an unexpected ad break, I stop it right there. I doubt there's anyone looking at the specific metric of how many shows stopped being viewed at the ad break, but I can't imagine I'm the only one.
I always suspected YouTube pays attention at ad breaks, because every time they dare start a video with 40-50 seconds of unskippable ads (it happens on the TV app only; they never dared on desktop/laptop), I just exit the video, reopen it (or another), and I'm served 15-to-25-second ads at most for a good hour.
And yes, I eventually went for Premium, which I find outrageously expensive.
Agree. I feel they are getting high on their own supply.
The management bullshit usually fed to 2nd-tier companies, citing Amazon as an example of how the best in the industry operates, is now actually believed by Amazon itself.
Absolutely! I’ve commented about it a few times on HN[1], as I was at Amazon at the beginning of the Alexa investment.
As with other projects, Amazon’s plan seems to have been get big fast and figure out monetization later. I’m sure ZIRP played some role in it, and if not for the rate hikes they might have kept it going for a few more years.
But their aim from day 1 was to get millions of devices into customers’ homes and then use that to boost e-commerce sales. When the second part didn’t materialize, the initiative suddenly became a white elephant, as it costs non-trivial server capacity to keep the backend infrastructure running.
I feel like you are correct, and that Apple's stance is the smarter play. My family ditched the entire Amazon ecosystem because Alexa was so utterly useless. Keeping people generally happy and entertained creates a positive mentality toward the entire company as a whole. Let me tell you, having the kids on board is huge, and probably a missed opportunity. You could literally be training kids to be little Amazon shills if this were done right. Mine loved talking to Alexa, and then she got less and less intelligent, and we swapped to Google's products; same story there. Now it is pretty much a weather-bot and that is all.
Is that actually true? I cannot imagine that they are even marginally successful at that. In fact, I can’t identify what exactly Alexa succeeded at, beyond being a voice activated kitchen timer.
> that cannot be the metric that leadership actually cared about
I think the metric was promotions for Alexa employees, sort of like a lot of projects at Google.
It's a networked, voice-activated kitchen timer, but it's a shitty one.
Suppose I put a roast in the oven and retire to my office to do something completely unrelated to cooking, where I cannot hear what happens in the kitchen.
One would think that I could set a timer in the kitchen and have it notify me wherever I am -- in the office, in the living room, on my pocket computer, on my desktop PC, or maybe even all of these things.
"Alexa, set a timer for two hours and notify me everywhere" seems like a perfectly cromulent thing to do.
But it isn't that way. Timers follow Vegas rules: timers that start in the kitchen stay in the kitchen; they cannot be heard anywhere else.
It's not superior in any functional way to the old dumb digital timer on my oven, which has a VFD and a rotary encoder to set a timer.
(Which, by the way, has really marvelous ramps and responsiveness for that encoder -- it's silly-fast and efficient to give that knob a twist and dial in exactly what I want for a timer. Adjusting the clock for DST or whatever is equally fast and straightforward.)
Except, fucking perplexingly: Alexa can notify me in the office when my oven timer beeps in the kitchen. This works fine.
All that is clear is that there is nobody steering this fucking ship.
> Which, by the way, has really marvelous ramps and responsiveness for that encoder -- it's silly-fast and efficient to give that knob a twist and dial in exactly what I want for a timer.
Omg, I am so jealous. All the appliances I use, at home or at family's places, have horrendous UIs.
And I am really trying hard to find appliances with passable UIs. It's impossible these days.
And for the few that have a rotary encoder, they are barely usable, and the encoder is used for anything but selecting the time.
I have contacted companies and offered to write the code for free to fix their shitty knobs. But they all refused or didn't bother replying.
It's old -- I'd guess from the mid 80s. Branded Kenmore and made by Frigidaire/Electrolux. It was free. The controls themselves were basic electromechanical things -- the clock/timer doesn't do anything but display time/timer and beep.
When I first got it, I assumed it was driven by AC as many clock ICs once were back in those simpler times.
But then I discovered a small low voltage DC power supply when I was in there installing a PID controller for the oven, so it may actually have something resembling software. Maybe.
And yeah, unresponsive physical controls are a pox. I'm not sure what method you used, but perhaps contacting the companies that design and build the boards might be better than contacting the companies that buy the boards and install them.
Alexa had a whole platform push, giving away clothing if you made an app (I have a nice hoodie from it). The thing they lacked was a way to get payments done through it. This is similar to Vine at Twitter, where the core problem was monetization.
I am literally laughing out loud, because Amazon had decided to fire the majority of the Amazon Pay team, which is why there was no longer a clear way to handle payments.
I turned off every ad setting they had, and it would often end an answer with "by the way, did you know..." and then some Amazon promotion. I asked if it's raining; you don't need to tell me about an increase in Amazon Photos storage sizes, especially given that the promo takes longer than answering my original question.
I got frustrated with that and tossed all my Alexas.
One thing really worth addressing from the post that I don't think the author accepted, and I see this a lot with engineers:
> "That did introduce tension for our team because we were supposed to be taking experimental bets for the platform’s future. These bets couldn’t be baked into product without hacks or shortcuts in the typical quarter as was the expectation."
If I can pump one learning into engineers' and PMs' heads it's this: intermediate deliverables are not optional no matter how cutting-edge your team is.
You will never succeed if your pitch to leadership is "give us a budget for the next N years and expect no shippable products until the end of N years". Even if you get approved somehow at the beginning, there's a 99.5% chance your team/project will be killed before you get to N years.
Again, once again for the audience in the back: there is no such thing as a multi-year project without convincing, meaningful intermediate deliverables.
To clarify, that doesn't mean "don't have multi-year roadmaps", it means "your multi-year roadmaps must deliver wins at a consistent cadence".
Understanding this will carry you a lot further in the industry.
As a fairly cutting-edge R&D team part of your job is to figure out what slice of this is shippable (and worth shipping). If you're coming up empty you are not ready to pitch this to execs.
As an engineer, it's much easier to "polish" (or work on useless stuff) than to deliver real products. I see that all the time, especially with "platform" teams that have no external product requirements. They waste a lot of time trying to figure out the most amazing way to deliver something instead of delivering something in a timely fashion.
If you push in any way, they start to scream "tech debt" and everyone just accepts it. I've been through a migration mandated by an infrastructure team where there were zero improvements for the teams that used the platform; all benefits were for the platform team only, and this was green-lighted and forced upon everyone without a second thought. It's unbelievable.
It's also impossible: even the "amazing" solution will be found to have holes once it hits the real world. The sooner you can get some real feedback on what you're building, the easier it is to course-correct and move toward real user needs.
> To clarify, that doesn't mean "don't have multi-year roadmaps", it means "your multi-year roadmaps must deliver wins at a consistent cadence".
What you describe is exactly the opposite of research, which is collecting never-ending failures.
An environment that lives by such logic cannot really lead to major technological breakthroughs. And in fact, Amazon has very few of those to show compared to the rest of SV.
Not all places are doing research. In fact most are not! Knowing whether you're at a company willing to bet money on R&D, or at a company that wants you to come up with actionable deliverables, is pretty crucial. As you said, Amazon is not really out there doing research, so engineers working there would do well to assume they need deliverables to keep managers happy.
I agree, but in the case of the news we're commenting on, this feels an awful lot like an effort that would've benefitted from a proper R&D team, where only the leadership would've had to be involved in company politics, goal-setting, etc.
I think what you say is true, yet some base research takes time. If you have a “ship it” attitude, you might push teams towards taking smaller bets that they know are within reach? I don’t know how the transformer breakthroughs at Google/DeepMind happened, and it’s likely they were “shipping” things internally, but it seems clear that the people on those teams were working in a very different environment than what the author is describing.
If you look at all the defining products of Apple, they also took years from the “germ of an idea” until they could be launched, and though they might have “shipped” internally, they gained a lot by not having pressure to ship things piecemeal to customers.
Google Brain spent oodles of money developing that tech only to watch other people capitalize on the research and potentially make Google Search (one of the stickiest products I've ever seen) obsolete. Freeform, self-directed, open-ended research labs are certainly a great approach from a technology breakthrough PoV, if you have 2010s Google margins.
But it's not obvious to me that approach was even a net win for Google as a business. Did Google Brain invent the technology that killed Google? TBD I think.
I wonder what would be the alternative, though? Other companies/universities would eventually have made the same breakthroughs, and I don’t think the answer to the innovators dilemma is to do less ambitious innovation?
In the case of Google there’s a lot of internal reasons why they didn’t leverage this opportunity, but if they had done that they might have ended up making their main product even more sticky.
This _highly_ depends on your field and what you're working on.
Working on the latest and greatest social media website? Sure, ship early, ship often.
Working on medical devices? You better not ship a prototype.
Working on hardware? Too expensive to pivot from learnings, better get it right the first time.
Working for NASA? You better get it right the first time and predict all future issues that might be possible, and you better document it 9 ways to sunday.
The parent's point, though, is that it actually doesn't. For example, with NASA there are thousands, if not hundreds of thousands, of intermediate goals on a project like SLS. For example, IDK, make sure the engine hits 95% of the thrust spec. Hardware? You design each IP block in isolation, test in isolation, build up step by step. A0 tapeout, A1 tapeout... etc.
It's a catch-22 with ML, though. What you wrote is completely true; however, with ML you cannot say "We will get to 98% precision and 92% recall by Q4". You do not know how long it will take (see self-driving cars). However, of course, you can always lie.
> "It's a catch-22 with ML, though. What you wrote is completely true; however, with ML you cannot say 'We will get to 98% precision and 92% recall by Q4'."
This applies to all tech projects, not just ML - though yeah, it's harder in ML. But not figuring out the intermediate products isn't an option - your stuff will get killed prematurely if you don't.
The trick with ML is not to promise "98% precision and 92% recall by Q4", it's to figure out what kind of product is shippable with lower precision and recall. Or perhaps a stepping-stone model that allows some simpler use case, but gives you progress towards the greater goal.
It's always case-specific, but as a ML team you do need to figure out what your intermediate checkpoints are. You need to demonstrate not only progress, but that your progress is contributing to the company's goals.
I think of it as building a staircase. If we need to hit 98% eventually, we need to make certain progress now. It is possible that a new breakthrough could appear in 2 years to spike that, but the safer assumption is that our near term progress will harvest the low hanging fruit.
Therefore, intermediate and measurable milestones are to derisk something. Even if you are doing a moonshot, there are still steps you can identify. The namesake, Apollo, didn’t try for the moon on the first launch!
Exactly this. Intermediate shippables are derisking. There are a few things that this strategy represents to executives:
- "You have already received benefit X for your investment in our team, and X has been well worth it. Continued investment in us will yield benefit Y."
- "Even if the project is terminated early, or later stages become blocked due to business or technical impracticalities, the company will have gained a tangible benefit already in the form of X. There is a payout, even if it is not the full thing we wanted."
- "The intermediate products validate the technology/product/business model, such that every incremental deliverable means an overall reduction in risk to the final goal."
100%. And the sooner low level folks like me can provide this signal, the sooner the execs can make decisions. That saves cycle time and is the real secret sauce for growth.
That’s why my favorite interview question (I’ve done hundreds of interviews) is “tell me about a time you cut corners on a project”. My role as a data person is to provide enough signal as early as possible - not necessarily to give a precise answer to everything we want to know.
Right but the "how" is the tricky part. For example, wrt to ML, what is the milestone and how do you know you will hit it by a particular date? Very difficult to say with ML.
This is where smart analytics teams are super useful. It's a really hard estimation problem! Even understanding what the F1 score needs to be to count as "good enough" is nearly impossible, much less understanding what it takes to go from "where we are" to "good enough".
What you can do is estimate something like "we think we need $measurement_value on $metric by $date, we are at $current_value. If we're not at 75% of the gap in 50% of the project time (when we have low hanging fruit), we cancel."
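That cancel rule can be sketched as a tiny gate function. This is a minimal sketch of the heuristic described above; the threshold values (75% of the gap by 50% of the timeline) come from the comment, but the function name and metric scale are illustrative:

```python
def on_track(current, baseline, target, elapsed_frac,
             gap_frac=0.75, time_frac=0.5):
    """Cancel-gate heuristic: by time_frac of the project timeline we
    expect gap_frac of the metric gap to be closed (front-loaded,
    because early progress harvests the low-hanging fruit)."""
    if elapsed_frac < time_frac:
        return True  # too early to judge; keep going
    progress = (current - baseline) / (target - baseline)
    return progress >= gap_frac
```

For example, a team that started at 0.80 F1 aiming for 0.98 and sits at 0.90 halfway through has closed only ~56% of the gap, so the gate says cancel.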
Agreed, it's very hard, I'd like to see how many orgs have actually done this successfully and what those shippable intermediate products actually were.
1000 times this - you need to deliver something. Even very long projects such as medical research have intermediate stages and goals to meet.
Very experienced people tend to forget this from time to time too and get excited or convince themselves "big risk big reward"... I've never seen that work out.
That's true, and it justifies why some things are still better done in a university environment (or as part of an academic collaboration) - it's not possible for everything to carve out something useful in the middle.
Executive patience, focus, and planning horizons cover the immediately next 1-4 quarters, years 2 and 3, and perhaps year 5 if you are lucky - and that's it; the executives themselves might not even be around in five years.
In academia, if you are stubborn and tenured and don't care about your short term success (publications, citations, awards) you can actually decide to implement a very long-term vision, depending on how much additional funding you need (if that is a lot, you will need to also convince funding agencies or philanthropists of your vision).
Big tech isn't able to do certain innovations. It's not that intermediate wins don't matter, but that innovative capabilities don't come out fully formed with a business model. And I don't know of any big techs that effectively do business model and tech innovations at the same time.
If your goal is only promotions inside of big tech then definitely throw most innovative ideas out the window. But if you're interested in innovation, then big tech either needs to get a lot more entrepreneurial or less big.
I agree with this. Said more concisely: have a long-term vision and an incremental path to get there. With each step or two, re-evaluate the long-term vision. It may change, it may not. If it changes, update your incremental steps; if it doesn't, you have more confidence you're building the right thing.
IMO voice assistants keep underwhelming because customers don't want to learn their language--we want them to understand our language. It's frustrating to have to know the exact magic spell wording to get an assistant to do something. They need to be like "Computer" in Star Trek TNG or they're dead in the water. 80% of the way there still isn't good enough--I'd rather just use a mouse and keyboard than endure an awkward trial-and-error with an "assistant" that is supposed to be smart.
Although ironically, the biggest complaints I hear about the current crop of assistants is the lack of consistency. People are fine learning a command their assistant can do "turn on the living room lights", and then about 20% of the time, the assistant refuses to perform the action with the exact same command as yesterday and needs some slightly different variant ("turn on the light in the living room"), just that one time. The assistants would actually be BETTER if there was a documented, precise, syntax you could learn.
So you pick up the mouse, and talk into it?
....
Seriously though, I don't understand why the state of voice interaction is so poor.
In the '90s we had voice commands (early Dragon?) available to tell our computer what to do. It was limited, but it worked extremely well, even in busy environments.
I remember my Thinkpad 486dx2(?), at a party - opening software and choosing music to play from a list, and controlling volume, all by voice. Thinking to ourselves, imagine what this will be like when we have a stronger, faster computer, in five years.
It's truly gone nowhere. Still, the most advanced thing you can reliably get it to do is "Set a timer for 10 minutes."
I wonder if these "SmartAssistant" programmers ever actually had a human personal assistant. For most of what you need them to do, you don't even ask them to do it, they just know you and do it. An actually good computerized SmartAssistant would know that it's been a year, so it's time to book my physical with my doctor. It would have contacted the doctor's office for me, checked my calendar, scheduled the appointment, and then proactively reminded me a few days in advance. I shouldn't have to say "Hey, Assistant: Please schedule a physical for Doctor X at Clinic Y on July 1 of this year." (by the way SmartAssistants can't even currently do that).
The voice interaction should only be for exceptional cases: "Hey, Assistant: My trip to the Paris office needs to be delayed by one week." The assistant should then go and re-book flights, hotels, and rental cars, and then when finished, merely say "Done."
Until they can do this, tech companies might as well stop bothering releasing incremental crap products that can barely understand a task I'd expect a 4 year-old to be able to do.
In general, I wouldn't want to pass off that level of control. Maybe if I'm really busy and an assistant knew me really well... And there are certainly sometimes heavily scheduled trips where your "handlers" pretty much just tell you where to be and when.
But, especially if it's my money on a trip I want at least some "me" time, I probably want to take at least a cursory look at flight and hotel options and lots of other details.
That's also a detail a human assistant would know about you, and would know to pass the information on to you for confirmation before they took action. I would expect a SmartAssistant to do the same.
The point is "Alexa play music" is a huge distance away from what the product should be.
I think it was scalability to languages, dialects, idioms, etc. Super easy to have high quality American English with a few commands. Much harder to support any language, any syntax, any accent. The brute force optimizations just don't scale.
Modern ML and embeddings models are the discontinuity that was needed to get from "massively complex hack that can't scale" to "even more complex but principled approach that scales pretty well".
Hmm, I'm an early Alexa PM (2016) that left Alexa before the OP joined it (2019).
Alexa's main failure was that the tech wasn't ready - it was basically an ASR + NLU + rule-engine stack. If we had had 2023 LLM tech, we might have "won" the assistants market.
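As a caricature of that ASR + NLU + rule-engine stack: the ASR transcript goes through hand-written NLU that guesses an intent and fills slots, then a rule engine maps the intent to a canned action. Every intent, slot, and response string below is invented for illustration - this is not Alexa's actual code:

```python
import re

def nlu(text):
    """Brittle, hand-written NLU stage: keyword matching plus regex slots."""
    text = text.lower()
    if "timer" in text:
        m = re.search(r"(\d+)\s*minute", text)
        return "SetTimer", {"minutes": int(m.group(1)) if m else None}
    if "weather" in text:
        return "GetWeather", {}
    return "Unknown", {}

# Rule-engine stage: one handler per intent. In a real system each rule
# might be owned by a different team's service, which is exactly where
# the latency and coordination problems crept in.
RULES = {
    "SetTimer": lambda slots: f"Timer set for {slots['minutes']} minutes",
    "GetWeather": lambda slots: "It's 72F and sunny",
    "Unknown": lambda slots: "Sorry, I don't know that one",
}

def handle(transcript):
    intent, slots = nlu(transcript)
    return RULES[intent](slots)
```

Anything the keyword rules don't anticipate falls straight into "Unknown", which is why such systems feel like a phone menu rather than a conversation.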
Yes, organizational bloat and politics were a problem, but OP was hired as part of the mass hiring spree, so he was a beneficiary of that.
Also ex-Alexa in the early years. While that would have helped, I really don't think it was just the stage of the tech. As we grew, it was very hard to work on the core tech at all, since so much time went to organizational issues and to working with people who 1) didn't get how ML worked or pay attention to how the tech was changing (most of the "domain" teams) and just wanted to ship new integrations or demo features, and 2) never found a monetization channel that would have allowed investment into deeper tech (and less senior leadership/exec churn).
Though I also very much agree with the other point of OP that privacy paranoia also blocked development. The privacy team seemed like they would have been most happy if we couldn't ship.
I wouldn't really call OP a beneficiary of it. As much as I like drawing a paycheck I don't want to work for an organization that's set up to keep me from succeeding. I imagine OP feels that way too.
Honestly, I think the only reason you shipped as many units as you did was blocking other voice assistants from interacting with Audible. I've had an Alexa/Echo since the early beta, and it has literally always been worse than other options like Google Home. It's gotten progressively worse over time as well - not just comparatively, but vs. its own experience several years ago.
Yep, can confirm that it was better in 2017 than 2021.
Mainly because when the org chart grows, more "rules" are added to the rules engine, each managed by another service... which all adds to end-to-end user-perceived latency, etc. That's why rule engines don't work.
The hardware is just a microphone, a speaker, a tiny computer, and a WiFi chip. It's easy to keep providing updates for old hardware when all the brains live in the cloud.
They don't seem to have updated Alexa to keep pace with the public's access to LLMs, though? Presumably they'd have just let that market lead slide in the same way.
They are actually planning to do that - it's just a software update - though they are doing it with the Amazon Titan model, which might be kind of behind.
Also, they are trying to sell it as a subscription, which is interesting since Siri is free.
I worked on a research team in Alexa. Almost all projects were focused on short-term delivery; innovation without an immediate need was highly discouraged. Most models were just direct imports of open-source models created by Meta/Google. Extremely short delivery timelines for incremental improvement. Alexa employees were often highly political, tenured Amazon employees. Minimal room for growth, as they sucked the oxygen out of the room.
The Amazon philosophy of constant execution is at odds with large-leap technical innovation. It works very well for ops-heavy AWS orgs and supply-chain optimization problems. The company has a cultural problem.
Regardless of the above, ChatGPT made almost all NLP technologies across all companies obsolete.
Your order of "Boost Oxygen Pocket Size Natural Aroma 3 Liter Portable Oxygen Canister, Respiratory Support for Aerobic Recovery, Altitude, Performance and Health" is $23.91 and will be delivered tomorrow.
To check the status of your order, use the Alexa app.
My understanding of Alexa is that the thing doesn't support a business of its own, so it's beholden to showing value to the enterprise through soft or derivative metrics like engagement or LTV. I believe that when a venture doesn't stand on its own value within a larger enterprise, it's doomed to corporate politics and external optics. That gives it an ultimately shorter shelf life than something that makes money, though in some companies the shelf life might be longer than in others.
- Why pay for Alexa 2.0 when you can get an alternative for free?
- How to get people to do more than the bare minimum of music, notifications, and home automation? People can't even use lists anymore, because it's being deprecated. Same with Routines.
Interesting given how many products out there now are negatively described as ChatGPT wrappers. Sounds like Alexa was just an open source model wrapper.
> Regardless of the above, ChatGPT made almost all NLP technologies across all companies obsolete.
That's overstated, because you accidentally lumped speech recognition in, and I imagine Nuance (https://nuance.com) and others are like "hold my beer".
The first paragraph of the first section of reasons given is:
> Alexa put a huge emphasis on protecting customer data with guardrails in place to prevent leakage and access. Definitely a crucial practice, but one consequence was that the internal infrastructure for developers was agonizingly painful to work with.
I really don't want this to be a message companies are hearing right now -- that being conscientious about customer data is a lethal barrier to progress, in the "AI" gold rush.
Also, without knowing anything about the organization, I'd expect it to probably have a high level of dysfunction, being at a company known for being excessively metrics-driven from the top, and for ruthless stack-ranking and related HR practices... trying to organize a large coherent cutting-edge R&D effort against that cultural backdrop. Like suggested by this bit elsewhere in the section:
> And most importantly, there was no immediate story for the team’s PM to make a promotion case through fixing this issue other than “it’s scientifically the right thing to do and could lead to better models for some other team.” No incentive meant no action taken.
We use both Azure and AWS at my current org. We recently had an internal 'hackathon' to try LLMs from both vendors (Claude on AWS, GPT-4 on Azure) on our knowledge bases.
We couldn't really differentiate them on response quality - not in 3 days - but on how easy the LLM was to integrate, AWS was superior, even for our mostly-Azure teams, weirdly.
I don't know if that's really true - I've heard people who spin up successful teams can get bonuses (which basically don't exist at Amazon) for doing so. In that sense, incremental improvement isn't really rewarded at all - there's very little to no pay-for-performance as a concept at Amazon.
Yeah, we do yearly comp reviews not with stock grant numbers, but with total comp targets.
So if you do mediocre one year and get a mediocre grant, then kick it into high gear next year, but the stock grew in the meantime, you're told "tough luck" and not rewarded for the extra effort. Or at least stock growth is deducted from your effort reward.
It's absolutely insane, but Amazon is the easiest FAANG to get into.
The truly insane part was that if the stock went down, they just said "too bad that's the risk you took with stock comp". They didn't true up if the stock price went down unless you were lucky enough to get part of the small pool of stock your director got to spread around for top performers.
They basically set aside your stock when you start, and you the employee absorb all the risk of downside and get punished for upside.
> Alexa put a huge emphasis on protecting customer data with guardrails in place to prevent leakage and access. Definitely a crucial practice, but one consequence was that the internal infrastructure for developers was agonizingly painful to work with.
I was in Alexa and this rings painfully true. So many workarounds and endless classification escalations. The customer-data certified compute environments were extremely painful to use (though later improved but still annoying) and getting data in or out, even for anodyne reasons, was nigh impossible. For a long period, even getting access to this system (called Hoverboard) took months. During my internship I spent about half of it waiting for access to be granted and had to spend a big chunk of it testing out my training system on CPU...not fun.
This was a common complaint in Alexa and one interesting bit of insider information I found out from people that either left for Google Assistant or came from there is that their tools did not have this level of pain around data security. As a result I'm guessing there's a middle ground between customer data security and making it impossible to work with - either that, or Google just didn't care.
To be honest, I'm just going off the other commenter's testimony; I would have presumed that their internal data access control policies would be... minimal, to be generous. The fact that developers couldn't get access to data to the point that they noticed and it was painful sounds like a genuine good step in the right direction, though.
There's a fair point that this should be gatekept in a maximally efficient and accountable way, but setting the standard of "access control" as there being real friction to gain access seems like a no-brainer.
Amazon invested $4 billion into Anthropic [1], which makes Claude AI, the AI I use almost exclusively. Claude is being trained to be very human-like, safe and friendly and has improved very much in that area lately [2], so I assume the idea is that Claude will be Alexa 2.0. In my experience, even employees maintaining the current version of a product can be out-of-the-loop on the future of the product.
In fact I think you sometimes have to ensure the old guard is gone to let the new guard take over. Google and Apple have been in the news with cuts to their voice assistant teams in recent times too, similar to the Alexa cuts.
It's obvious current AI can handle context reasonably well, which is something the previous voice assistants failed badly at. The next step is to write all the APIs so the AI can reasonably act on that context. It's such a new way of doing things that it's probably best to hard-cut development of the previous approach, which was seemingly hard-coded triggers->actions and always failed badly once you wanted to add context. E.g. "open the house blinds when I pick up the phone in the morning or when the alarm goes off, whichever comes first" would never work with the old voice assistants. It might just work with the new AI systems, but it'll be a completely different system - not even a rewrite at that point.
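That "whichever comes first" condition is actually easy to express with modern async primitives; the hard part was never the racing logic but wiring real-world triggers into it. A minimal sketch (the trigger names and timing here are hypothetical):

```python
import asyncio

async def first_trigger(**events):
    """Wait on several named triggers and return the name of whichever
    fires first - e.g. the blinds rule would be
    first_trigger(phone_pickup=..., alarm=...)."""
    tasks = {asyncio.create_task(ev.wait()): name
             for name, ev in events.items()}
    done, pending = await asyncio.wait(tasks,
                                       return_when=asyncio.FIRST_COMPLETED)
    for task in pending:
        task.cancel()  # the race is decided; drop the losers
    return tasks[done.pop()]
```

The action ("open the blinds") then runs once on whichever event wins, instead of being duplicated across two separate hard-coded trigger rules.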
Pity Google, Microsoft, Meta didn't have the foresight to name their companies in preparation for the AI boom. Adobe, Autodesk, Alibaba, and AirBnB are looking pretty smart.
I suppose this might explain the Google -> Alphabet thing but they haven't really embraced the new corporate name enough for "Alphabet Intelligence" to make any sense.
The infra part particularly resonated with me. When leadership only cares about high-level metrics, "in-the-weeds" dev experience gets deprioritized. At <redacted applied ML place>, I was working on data infra, but pressure from above made it impossible to find time to focus; everything went to short-term deliverables. It was hard to convince others to adjust their priorities for "the greater good".
The only way to make things better (in my mind) was to use my own time to improve the infra, and because the metrics didn't track these infra improvements, I didn't get rewarded - so I just burned out.
Part of me thinks this is the reason you want bloat in orgs: so that motivated people, with enough redundancy, will actually feel comfortable chasing longer-term incentives.
I think we overstate how useful such a system is. Visual UIs with buttons are so predictable and efficient to use that any time you can reach for a screen and hand-operated input for anything complex, you will.
I've stopped using home devices like the Echo (because of privacy concerns, esp. with hotword mistriggers); now I use voice only when driving. Maybe multimodal LLMs like GPT-4o will spawn new useful use-cases, but I think they're unlikely to be the same use-cases the Alexa product+brand is known for.
Exactly - Amazon wasn't the only company gutting their conversational UI teams because they couldn't figure out how to make money with them. My guess is 99% of conversational UIs are some combination of:
1. Set my alarm/timer.
2. "What's the weather?"
3. "Turn on/off my lights" for those with connected lights.
etc.
We've had voice calling for over a century, yet it feels like the majority of us prefer to text most of the time these days.
> We've had voice calling for over a century, yet it feels like the majority of us prefer to text most of the time these days.
It depends what culture you’re from. Many cultures around the world prefer voice. If you live in a fair large city just look around for folks on the phone. There are still many people who need a voice plan.
It also depends greatly on the conversation you plan to have. If I'm hitting up my wife to see what she wants for dinner, that's perfect as a text - it's brief and benefits from being asynchronous. But if I want to talk about something time sensitive (perhaps I'm at the restaurant already so I need an answer now, or maybe an emergency has happened) I call. I don't think people prefer to use text most of the time so much as they use the most appropriate medium for the discussion.
I honestly really think this depends on the age/generation of the participants (or perhaps it's just personality and I'm wrongly ascribing it to generation).
I'm with you 100% (and am later Gen X FWIW). But I have some friends born late 80s who will have these drawn out, intricate, sensitive conversations for literally hours over text and all I can think is "jeezus, just call them on the damn phone!"
The problem with all the original iterations of these assistants is that they fail so often that people stop using them for anything except the few things they can do reliably - which for most people is basically setting timers, playing music, and turning the lights on and off.
They're annoying to use, because the interface sort of implies affordances (like, you know, just talking to it like a person) that aren't actually available, and really it's just a menu tree that's barely more sophisticated than a customer support call tree.
In a situation with such a low margin of value for the user, there's an even finer point on reducing friction. I know Alexa stopped being used in my household shortly after the "by the way…" addendums were introduced. This seemed to be a response to the trend of reduction to memorized phrases, but it was a bad approach. If the other comments are true, and incremental progress was what got rewarded, it stands to reason that such a major problem with the service was underappreciated. Had it been recognized as the dead end of the product, maybe more sophisticated models would have been pursued.
My guess is that there is going to be all this investment and research poured into AI voice assistants and in the end the result is going to be the same.
Good read, and it really just highlights the complexity and tension involved in huge corporate organizations. I would never have guessed that Alexa alone had so many teams and engineers involved, because from the outside it seems like the only iterations were on the physical models. The voice assistant didn't seem to change in any meaningful way for a very long time. It even seems like Amazon employs some form of internal start-up model, but that still struggles because of the internal politics. Maybe when it comes to individual products, it's best to keep the teams small and nimble.
Yeah I'm impressed with how well it does a straight shot paragraph but I can't use it comprehensively. It's the little things that make it unworkable, like moving the cursor and selection to do rapid nonlinear edits, getting it to use your personal capitalization style, and I'd imagine it's a nightmare trying to write code with voice dictation, with how nonlinear and dependent on symbols and spacing it is.
GPT voice is almost always exactly what I want - I think it uses Whisper - and yet every app on both Android and iOS can't transcribe a single sentence correctly without errors. Pathetic, really.
I accidentally did this to a vacation home we stayed in once, like weeks after we checked out. It took me a solid minute to piece together 'where the music was going'. I felt so bad; hopefully the house was vacant.
About a month or two ago Alexa started telling us it didn't understand "stop" while it played the audio we didn't ask for (Alexa's voice recognition is really poor). I had to unplug it.
I probably could have used the app to stop it but I didn't think about that at the time.
Always surprised how little Alexa changed over the years from the initial version.
On both software and hardware there were so many possibilities, and years of user feedback to learn from.
The app store and skills were deserted after years of broken-to-unusable functionality.
The hardware remained stuck at the initial speaker version, when it could have replaced so many dedicated boxes around the house, each connected to its own power supply and providing a singular function: access point, router, smart-device gateway, streaming device, NAS for backup or a private cloud, etc.
That last one is a failure for all of Big Tech, where the desire to maintain control prevented any form of standardization or interoperability, to the point where hobbyist open-source solutions now lead the way on how to do a smart home the right way without abandoning the user base six months after release.
They haven't lost yet. Alexa is still the top conversational appliance for most households right now. A smart speaker is almost an essential device among my group of friends. Siri on HomePod is still subpar for basic kitchen tasks like setting timers, adding items to shopping lists, and creating reminders. I don't even think Google has a current device that competes in this space.
I really want HomePod to be better at household tasks such as managing shopping lists, timers, and reminders, but it's not there yet. As soon as the HomePod can replace my Alexa devices, I'll be all in. I have a HomePod right next to every Alexa device in my house, and I'm just waiting for Apple to turn on their "Apple intelligence."
Does Siri/HomePod not just directly connect to your iPhone's calendar/notes/reminders?
I honestly ask this because I never tried though… I use my homepod as a glorified timer, alarm clock, and speaker. I’m just sitting here in the apple ecosystem hoping one day things will actually feel connected.
It doesn't "directly" connect - it changes things on iCloud - it just doesn't work well. It takes more than a few seconds for it to add things to reminders. It has to verify your voice to add things to reminders - god forbid if you are sharing a shopping list with another person in your household. That nearly doubles the time it takes to add something to a shopping list. With Alexa, adding things to a shopping list is instant, it doesn't verify if you are authorized to add things to the shopping list, Alexa just adds them.
I'm not sure I would have a lot of use for voice-controlled conversational AI, as odd as that may sound.
Like most people, I use Alexa for _commands_: home automation, timers, tell me the weather, ask a specific question looking for a specific answer, play this music. That's not "conversational", and I don't want it to be.
I use generative AI for other things, mostly writing code for me, or telling me about code problems in general. It's rare that I want output that I'm _not_ going to copy/paste somewhere.
Alexa isn't a failure, it just didn't sell more stuff for Amazon. And, well, it costs an awful lot for them to keep running. So maybe it is.
I don't know if my voice is weird or what, but these things - Alexa, Siri, etc. - don't work for me at all. They can't understand anything I say unless I repeat it slowly half a dozen times, yet regular people understand me just fine.
My s/o talks weird to devices... she pitches her voice oddly and it's hard for even me to understand her. Devices do not understand. Maybe record yourself or ask someone to listen to you talking to a device and see if you're unconsciously changing delivery?
My voice is extremely, unusually deep. I actually suspect that is the problem itself, it is deep enough that it doesn’t even register as speech to these systems.
I had a manager who came from Amazon and shared some of the horror stories from their time on Alexa. It seemed like a lot of senior folks were only using it for career advancement because it was where all the R&D money was flowing into. So a bunch of hard working folks building things, leadership playing political games with each other, and an org that had no idea what problems they were actually trying to solve.
It definitely captured the market, but without a top down vision, the whole thing was just a huge letdown.
The problem with Alexa isn't technology or product design; it's organizational. It was a bloated org of over 10,000 people. After a decade of investment, they had no real business model. They may have had decent engineers and scientists, but lots of managers and executives got promoted to very high levels despite showing zero results, mostly based on the size of their teams. Those empire builders then moved around the company and the industry, leaving behind a wasteland.
This is an industry-wide problem. The solution is to not overhire and to crack down on these empire builders, but that means fewer FAANG jobs. Engineers have to pick their poison.
I set up our home with smart shades, lights, iAqua, Somfy, etc., but my family and I now just use the remotes out of frustration with Amazon Alexa.
I was always under the impression that Amazon uploads all our data, because I notice data transfers whenever I use voice commands, which makes me doubt their privacy claims.
It seems like Alexa was designed more to learn from us than to genuinely assist us. Its primary goal appears to be gathering data rather than helping users.
> Alexa put a huge emphasis on protecting customer data with guardrails in place to prevent leakage and access
For what it's worth, we were working on a conversational health app, and this is why we picked Alexa over alternatives (if you're big enough to get on GPT enterprise you can probably implement HIPAA safeguards, but we never got replies).
I remember talking to Alexa about eight years ago. I was a bit skeptical and asked, "what color is a red car?". I was seriously impressed that it knew that a red car is red.
I threw my Alexa in the garbage after I asked it to turn off my house lights and it tried to recommend something (a schedule or something) for what must have been the third time.
I've never used smart speakers, but do they allow third-party apps, and if so, how developed is the ecosystem? I see so many use cases: voice-commanded apps, voice games, etc.
I've always thought it was Apple who dropped the ball with Siri.
When Siri came out in 2011 -- three years before Alexa -- all my coworkers and I had iPhones. I remember sitting in my office as people yelled at Siri all day, trying to get her to be useful. "Hey Siri, what's the weather tomorrow? No... No SIRI, WHAT'S -- THE -- WEATHER -- TOMORROW!"
Even though it sucked, it seemed every hardcore Apple user was ready to jump onboard. Who cares if I'm in a crowded office with people trying to get work done while I spend 10x longer to perform a function in the noisiest possible way? I'm using this thing!!
The voice recognition has improved since then. But the functionality still sucks.
When I'm in private, there are a couple commands I'll use.
- "Hey Siri, call xyz" where xyz is someone in my contact list I have tested with Siri and is known to work. Not recommended to try without testing first.
- While cooking, "Hey Siri, set a timer for 10 minutes." Works great.
- While driving and navigating: "Hey Siri, take me to the nearest gas station." That one is pretty good, except the maps themselves aren't smart enough, so sometimes you'll be turned around to head in the direction opposite your travel, since technically that's where the nearest gas station is.
I never understood why they couldn't make this tool better, even before LLMs and without any AI at all. Just hard-code a bunch of phrases, and ways to translate those phrases into some action.
"Hey Siri, how close is my UPS delivery?"
"Hey Siri, where can I get the best price on xyz cat food?"
"Hey Siri, what's my bank balance?"
"Hey Siri, how much is a Lyft to xyz?"
I bet if they had a single developer working on adding Siri commands full-time, they could announce something like 20-50 new Siri functions at every WWDC.
But it seems the goal now is just "Make it an LLM," instead of focusing on recognizing the task that the user wants to do, and connecting it to APIs that can do those tasks.
They could've dominated the "conversational system" market 13 years ago.
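The hard-coded approach described above really is that simple to prototype. A minimal sketch (the handlers and phrasings are hypothetical stand-ins for real service integrations, not anything Siri actually ships):

```python
import re

# Hypothetical handlers standing in for real service integrations.
def track_ups():
    return "Checking your UPS deliveries..."

def set_timer(minutes):
    return f"Timer set for {minutes} minutes."

def call_contact(name):
    return f"Calling {name}..."

# Each command is a regex paired with an action. Adding a new
# assistant function is just adding one row to this table.
COMMANDS = [
    (re.compile(r"set a timer for (\d+) minutes?"),
     lambda m: set_timer(int(m.group(1)))),
    (re.compile(r"\bcall (\w+)"),
     lambda m: call_contact(m.group(1))),
    (re.compile(r"how close is my ups delivery"),
     lambda m: track_ups()),
]

def handle(utterance: str) -> str:
    """Match a (lowercased) utterance against the command table."""
    text = utterance.lower().strip()
    for pattern, action in COMMANDS:
        m = pattern.search(text)
        if m:
            return action(m)
    return "Sorry, I don't know that one."
```

The obvious weakness, and the reason the table never scales to the long tail, is that every new phrasing needs a new row; but for 20-50 high-value commands per year, a table like this is cheap.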
> But it seems the goal now is just "Make it an LLM," instead of focusing on recognizing the task that the user wants to do, and connecting it to APIs that can do those tasks.
I almost completely agreed with you, but this is not true! Apple is trying to solve the task-and-API problem with App Intents, which they go into in more detail outside of the keynote: https://youtu.be/Lb89T7ybCBE
The new Siri models are trained on a large number of schemas. Apps can implement those schemas to say “I provide this action” (aka, the user intends to do this action). Siri can use the more advanced NLP that comes with GenAI to match what you say to a schema, and send that to an app.
These App Intents are also available to Spotlight and Shortcuts, making them more powerful than Siri actions alone.
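As a toy illustration of the schema idea (names here are hypothetical, not Apple's actual App Intents API): apps register the actions they provide, the assistant's NLP layer maps an utterance to a schema plus slot values, and a dispatcher routes that to whichever app registered the schema.

```python
from dataclasses import dataclass
from typing import Callable, Dict

@dataclass
class Intent:
    """A resolved user intent: a schema name plus filled slots."""
    schema: str   # e.g. "media.playTrack"
    slots: dict

# Hypothetical registry mapping schema names to app handlers.
registry: Dict[str, Callable[[dict], str]] = {}

def provides(schema: str):
    """Decorator an app would use to advertise 'I provide this action'."""
    def register(fn):
        registry[schema] = fn
        return fn
    return register

@provides("media.playTrack")
def play_track(slots):
    return f"Playing {slots['track']}"

def dispatch(intent: Intent) -> str:
    """Route a matched intent to the app that registered its schema."""
    handler = registry.get(intent.schema)
    if handler is None:
        return "No app provides that action."
    return handler(intent.slots)
```

The key design point is the split: the hard NLP problem (utterance to `Intent`) lives in the assistant, while apps only implement well-defined schemas.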
Non-LLM conversational agents are a dead end, and that one programmer's time would be wasted now that we have an imperfect but pretty good solution. There is zero discoverability in voice commands: if you can't actually ask the agent anything, the best you'll do is remember 3 to 10 commands. Better to have that person work on improving the LLM.
Come home to find a yellow ring on the Echo. "Alexa, tell me my messages." "You don't have any messages. You have one notification; would you like me to read it?" <silently> "Jesus H Christ, what do you think?!" <aloud> "Yes"
My opinion is that data access restrictions did not cause Alexa to fail, and neither did a lack of machine learning. Alexa attempted to solve the long tail of customer requests with the equivalent of spaghetti if-statements: rule engines. This was never going to scale. Alexa did not have an approach generic enough to cover the long tail of customer requests (that would take something closer to AGI). With rule engines there was always a tension between latency and functionality. Alexa solved this with bureaucracy: monitor latency, monitor customer request types, and make business decisions about how to evolve the rule engines. But it was never fundamentally able to scale beyond the most basic requests or to solve the chicken-and-egg problem: customers don't make complicated requests because Alexa isn't capable of handling them, so those requests never show up as use cases large enough to optimize for. The top use cases remained playing music and setting timers.
A more fundamental issue was monetization. Early on, Bezos liked the idea of a small, essentially free device that would reduce the friction of buying things. If you remember the Dash Buttons, Amazon floated many ideas like this. In practice, building a voice assistant robust enough to purchase items proved challenging for a myriad of reasons, so the business looked for other ways to monetize. Advertising kept coming up, but there was rank-and-file pushback because it could break customer expectations and raise privacy concerns. Alexa considered pivoting into various B2B ventures (hospitality, healthcare, business) and other customer scenarios (smart home, automotive) but took half-measures into each of them rather than committing to an opportunity. It felt like a solution looking for a problem.
Alexa would have benefited (and could still benefit?) from modern LLM technology. However, to be truly useful it would need to do more than chat: it would need some layer to take actions. That layer would have to be carefully considered and designed so that it scales, so that it isn't a bureaucracy trying to measure what people want to do and if-statement-ing a rules engine to enable it. OpenAI and others appear to be poised with the machine learning expertise to do this.
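A toy sketch of what such an action layer could look like. Everything here is hypothetical: the model is stubbed with keyword matching, whereas a real system would call an actual LLM with tool schemas. The point is the shape, a model that emits a structured tool call and a thin dispatcher that executes it, with no per-request rule engine:

```python
import json

# Hypothetical tool table: the only hand-written code is the tools
# themselves, not the mapping from phrasings to tools.
TOOLS = {
    "lights.set":  lambda args: f"Lights {args['state']} in {args['room']}",
    "timer.start": lambda args: f"Timer for {args['minutes']} min",
}

def fake_llm(utterance: str) -> str:
    """Stand-in for a real model that maps free text to a tool call.
    A real LLM would generalize over phrasings instead of keyword-matching."""
    if "light" in utterance:
        return json.dumps({"tool": "lights.set",
                           "args": {"state": "off", "room": "house"}})
    return json.dumps({"tool": "timer.start", "args": {"minutes": 10}})

def act(utterance: str) -> str:
    """Parse the model's structured output and execute the named tool."""
    call = json.loads(fake_llm(utterance))
    return TOOLS[call["tool"]](call["args"])
```

The contrast with the rule-engine approach above is that the long tail of phrasings is absorbed by the model, while the action layer stays a small, auditable dispatch table.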
Finally, it's my opinion that Alexa's machine learning scientists were very good; however, as a population they did not appear to care much about the business or product use case. Many of them worked on research for publication, on problems like distance estimation. The expertise was very heavy on voice transcription and audio processing, with much less on "reasoning". I hypothesize this contributed to the approach of iterated rule engines, with the science community focused primarily on improving transcription accuracy by small numbers of basis points.