Amazon virtually kills efforts to develop Alexa Skills (arstechnica.com)
172 points by Stratoscope 5 months ago | 203 comments



I was an early Alexa PM for Skills back in 2017. Even back then, when Alexa had all the hype and first-mover advantage, it was extremely difficult to drive any meaningful traction for “Skills developers”, for various reasons (mainly friction in the VUI and discoverability issues).

Fast forward to 2023 when OpenAI came up with GPT Store, it felt like a deja vu of Alexa skills in some sense. My understanding is that GPTs are also struggling to gain good traction.

There are some fundamental reasons why conversational 3rd party platforms are hard. If there’s any interest I will write about it in the Enterprise AI substack (nextword.substack.com)


Please do write about that.

I've seen so many Alexa projects at hackathons that were exciting at first glance but didn't go anywhere.

My favourite is event recommendations: lots of people try building an Alexa thing to help you find events, but it turns out listening to a bot read out a list of 20 things happening this weekend is way less useful than browsing a web page with the same information all on a screen at once.


Amazon's failure is not realising that Alexa's best use is as a universal remote control.

The interface with smart devices is good but clunky. While I can say "Alexa, open my curtains", and I can go to the slow and bloated Alexa app to schedule the curtains to open at a certain hour, I cannot say "Alexa, open my curtains tomorrow at 6:45am".

I would love an app that would allow better automation, like doing an action at a certain time and playing an alarm X minutes later. However, skills are voice-only and nearly useless.


> I cannot say "Alexa open my curtains tomorrow at 6:45am".

Works for me; however I often say the time before the action, so something like: “Alexa, at 8pm turn on the living room lights”, but “Turn the heating off in 15 minutes” also works for me.

It then creates a timer to activate said device/group at the time requested. (Edit: I also use commands like "Alexa, turn on the lighthouse lamp until 11:30pm", which turns on that lamp and sets a timer to turn it off at 11:30pm)

However, all my devices use the native smart home stuff in Alexa and are exposed to it via Home Assistant (so I dunno if HA is doing some magic sauce to expose the devices in a certain way that allows such voice commands), so your mileage may (and indeed appears to) vary.

Granted sometimes it mistakes my intentions and I need to repeat myself.

I’ve also been able to create routines via voice; I first discovered that when Alexa suggested it could create the automation for me after it performed a request.

Another Edit: When saying "At X do Y", Alexa can only do that action if it's within the next 24 hours; guessing that's its timer limit. I also tested saying "Alexa, every day at 6pm turn on living room lights" and it created a routine for that action. I was then able to disable that routine by saying "Alexa, delete the living room lights routine", however it just disabled it rather than deleting it like I had requested (and checking the transcript in the app, it did pick up that I said the word delete and hadn't misheard me).


> I cannot say "Alexa open my curtains tomorrow at 6:45am"

The "funny" thing is Google Home could do this (Turn on/off device after 8 hours) but recently they've removed the option & the assistant refuses. I assume it's for liability purposes, but it feels stupid.


Maybe it's my imagination, but Google Home could recognize a lot of things that it just can't do or can't do well anymore. It gives the impression that they downgraded the whole infra behind it to run an inferior, cheaper model, hoping people will eventually stop using it so they can pull the plug altogether.


If I asked my Echo to close my curtains, it would respond that it doesn’t know about a device with that name. If I then ask it what devices it does know about, it won’t give me an answer.

Will the Google Home tell you the device names it can see?


I don't think it can do that. Asking variations of "what devices/controls/lights can you see/access/control" just gives a Google search result. Quite a stupid design flaw imo.


Don't know about opening curtains but I have a Deepbot connected to Google Home (now Gemini) and saying "Hey Google, start cleaning at 5.30 PM" works.


If 'Alexa, open my curtains' already works for you, then creating a routine containing the same command to be triggered daily at the needed time should solve your problem.


But I cannot create a routine or change its time with my voice.

I can say "Alexa, set my alarm tomorrow at 7 in the morning". I wish I could say "Alexa, open my curtains at 6:45 in the morning tomorroe" instead of having to go to my cellphone and use the clunky Alexa app.

It just seems like a wasted opportunity for a routine. I'm sure a lot more things would be available if Amazon let developers use more features.


Seems like it'd get unwieldy fast. You'd need an efficient interface for listing and editing registered tasks, or you might get stuck with a couple hundred tasks of the type 'Alexa, say fart fart fart at three/four/five in the morning' if a kid ever were to pass through the house.


>Seems like it'd get unwieldy fast.

We're not asking to construct a complex schedule via voice command; the app is sufficient for that. Sometimes you just want your curtains opened at 6:45 tomorrow because something is happening and you just thought of it. That would actually be convenient.


This works for me. I just said "turn my office lights on at 6:45 pm" and it did just fine.


And it's not even possible to enable this with Alexa routines, which should work as a "hobbyist/enthusiast" version of skills development, but are in reality extremely limited.

e.g. As a routine writer, you could include a command within a routine to turn on lights at 6am. However, it would be fixed at 6am whenever the routine runs, with no flexibility, since routines have no concept of variables or voice input. A more sophisticated platform would offer more flexible commands, e.g. "turn on lights at {voice_input}".
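
To make that concrete, here's a rough sketch of what a parameterized routine could look like. This is not real Alexa routine syntax; the field names and the {time} slot are invented for illustration:

    # Hypothetical schema, not Amazon's: a routine whose trigger time
    # is captured from speech instead of being hard-coded when authored.
    routine = {
        "name": "open_curtains_at_spoken_time",
        "trigger": {"utterance": "open my curtains at {time}"},  # {time} filled from voice
        "actions": [
            {"device": "bedroom_curtains", "command": "open", "at": "{time}"},
        ],
    }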


The core problem lies in latency and information bandwidth, for which eyes (which see the “gestalt”) are just far superior to listening to streaming text (which is linear).

The one main situation where NL interfaces are superior is when you are mobile (like driving) or your hands are tied up.

I think this affects GPTs just like it did Alexa. Which means that GPTs aren’t the final UI. The real innovation will be in the right AI UX.


Not sure I agree on the core problems.

The core problem is that these systems are just so incorrect in fundamental ways that they're effectively useless.

Imagine a buddy of yours tells you about an event he's pretty sure you'll be interested in. Why does he tell you about this event? Well, he knows your interests, what kind of things you enjoy, when you're free, who you might want to go to the event with, how much money you're willing to spend, how far you're willing to travel, when you like to go out... So when you're on the receiving end of such a suggestion it often feels great! It's like you've struck gold.

Now imagine your average 'AI' powered recommendation engine reading you a list of events. It doesn't feel magical. It doesn't even feel like it knows what the hell you enjoy doing half the time. Forget about knowing about your free time, budgetary restrictions, family restrictions, who you'd be able to go with; None of that stuff is even sort of in the picture. And it's all delivered to you in a voice that sounds like it would be as happy to kill you as give you advice. There's no lively back and forth on the logistics of the event. No feeling of discovery as you two talk it out, honing the plan that brings it from an abstract concept to reality.

It's just dead and lifeless and shitty.


I agree with you, and I cringed a little when I read the following from the OP:

> There are some fundamental reasons why conversational 3rd party platforms are hard.

In my mind the big fundamental problem here is the "3rd party". I'd love to have an "AI assistant" or an "AI buddy" that could watch everything I do and say and write and really get to know me super well... as long as I can be confident that I own and control everything it observes and learns. I sure as hell don't want a 3rd party involved! But alas, I don't see a way we get there that doesn't involve Amazon or Meta or Google or OpenAI sitting between me and my "AI" tools, at least in the short run.


Fwiw this is what I assume Apple's long-term AI strategy is.

Let the hype-funded unicorns fight to develop (and end up commodifying) the tech, and then design/sell devices that can support it locally. In that world, the AI assistant that you buy is a discrete piece of hardware, rather than a software treadmill.

Of course, this could mean that you end up on a hardware treadmill, but I think that's probably less bad, granted we can do something about the e-waste.


You're comparing a person recommending a single event versus an AI providing a list. In other words, proving OP's point.

GUIs provide information in 2D, letting eyes skim and bypass information that's not useful.

VUIs provide information in 1D, forcing you to take information in a linear stream and providing weak at best controls for skipping around without losing context or place.

Not coincidentally, this is why I absolutely hate watching videos for programming and news. If it's an article or post or thread, I can quickly evaluate if it has what I want and bypass the fluff. Videos simply suck at being usable, unless it's for something physical like how to work on a particular motor or carve a particular pattern into wood.


The person you are replying to is arguing the opposite - a future VUI could know your interests and just read you out the one relevant event rather than reading a list.

Alternatively it could summarise into “there’s a standup comedy gig, a few bands and various classes - what sort of thing are you looking for?” and then discuss with you to find the 2-3 events that are most relevant rather than reel off one big list.


And if it were a visual display of those things, I would have homed in on what I wanted and gotten my answer in the time it took me to ask what I was looking for.

It may be fantastic as an aid for low and no sighted people, but so long as I can read, a VUI is strictly inferior.


Disagree - both can complement each other in different circumstances.

If I’m ironing and want to know when my first meeting is, chances are a VUI is better.

If I want to see my whole day’s itinerary, a GUI is probably better.


> a future VUI could know your interests and just read you out the one relevant event rather than reading a list.

This may require a level of accuracy and intelligence that is unobtainable.


I would still prefer it to send it to me on my mobile device display. Voice interaction is nice for accessibility, but the first method of control (whatever it is) is faster.


The point I'm trying to make is that this thing we're calling a 'VUI' is shit. There's no reason speech has to be this boring one dimensional thing. It's like the people that designed these things have never had a real conversation in their lives. When you're speaking with another person, or multiple other people, you're constantly exchanging cues that allow the other person to understand and re-calibrate what they're saying. These are verbal sounds, non-verbal sounds and physical movements. A crinkle of the forehead, a shake of the head, an uttered 'aaaaah' or a quiet verbal affirmation in support of what's being stated. It's not a single uni-directional stream of information, it's a multi-directional stream coming from multiple multi-modal sources at the same time.

None of these basic realities are accounted for in current technology. Instead we have these dumb robot voices reading us results from a preprocessed script that it thinks answers our question. No wonder the monkey part of our brain immediately picks up on the fact that this whole facade isn't just a lie, but an excruciating lie. It's excruciating because it's immediately obvious that there's nothing else 'there' to interact with. Even when speaking to another person over the phone, there's a huge amount of nuance you can pick up on. Are they happy? Are they sad? Are they frazzled? Are they in a rush? Are they relaxed? And you automatically calibrate your responses and what you say in the conversation based on all of these perfectly obvious things. Normal humans automatically calibrate what they say, how they sound, what they suggest based on these cues. It works really well!

There's no reason voice stuff has to suck. It has worked pretty great for humans for thousands of years. We're evolutionarily tuned to it. It's just that all the technology we've created around it totally sucks and people are delusional if they think it's anywhere near prime time.


This is all technically possible, but then there's also the privacy/security aspect. Many who would actually be into a solution like this won't be too hot to share the necessary information in the first place. And with good reason: companies with the resources to provide a decent experience don't have the best track record of protecting user data, and ensuring only the user has control of it. The privacy conscious would rather self-host, and end up losing out on capabilities in the process. So it's a sort of catch-22.


Part of the big challenge here, in my mind, is that companies are reluctant to put data into the world for others to consume in a friendly way. If, say, event organisers put data out as open APIs, there would be the opportunity for a self-hosted or "convenient" third party (a la Google, Amazon) to create conversational experiences on top of it; both the private user and the privacy-uninterested user are well served (as is the "event seller", as it's exposed to more people). But as long as we're stuck with systems having to pull data by web scraping, no one can build a good solution that could work for either scenario.


In the case of events, I'm not so sure. Ticketing platforms want to drive sales via easy discovery and wide distribution. Take Eventbrite as an example (disclaimer: I used to work there). They:

1. Have an API (https://www.eventbrite.com/platform/api)

2. Include JSON-LD Events data on their events pages (https://developers.google.com/search/docs/appearance/structu...)
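
That JSON-LD is straightforward to consume programmatically. A minimal sketch (the URL is a placeholder, real pages can embed several ld+json blocks, and regex-scraping HTML like this is brittle, so treat it as illustration only):

    import json
    import re
    import urllib.request

    # Fetch an event page and print any schema.org Event objects
    # embedded as JSON-LD in <script> tags.
    html = urllib.request.urlopen("https://www.eventbrite.com/e/example-event").read().decode()
    for block in re.findall(r'<script type="application/ld\+json">(.*?)</script>', html, re.S):
        data = json.loads(block)
        if isinstance(data, dict) and data.get("@type") == "Event":
            print(data.get("name"), data.get("startDate"))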


> Now imagine your average 'AI' powered recommendation engine reading you a list of events.

I think the issue is that having AI just fetch information catered towards any human is not using AI at all. I'm sure the hackathon groups pitching it all started with an idea of building a highly trained AI system whereby the recommendations are meaningful reflections of whatever information it has about you. Unfortunately for them, the most lucrative part of their plan neglected both the problem of how to create an AI pipeline that takes many piecemeal inputs (millions of them with missing values represented in some way) and renders meaningful outputs, and the fact that success would only reap a massive backlash from privacy advocates.

In the end, their plan for a super-intelligent life assistant turns into just fetching event lists from facebook (or elsewhere) without even using the demographic data it has.


I agree, the real deal would be an AI assistant that actively learns and remembers everything important about you, and can access and utilize all that information when having a conversation, while also having access to external services, your calendar and so on.

I wouldn't be comfortable running that as a cloud-service though. Should be open source and run at home on my own machine.


Right, because if someone else is running it, you can be sure you're getting served thinly veiled ads, something someone else wishes you'd want, rather than something you really like.


To a point yes, but also the comprehension isn't there. I have alexa and a goodly amount of automation in my home, and its job 99% of the time? Let me turn the lights off after I'm cozy in bed, set timers for the kitchen, and tell me the weather forecast for the day. Things where its interface is the appeal: I can do all those things without pulling out my phone, getting up for a switch, or stopping from getting dressed.

That is the value, not in having conversations with the bot -- google got closer with its assistants, but only because it has a creepily deep profile on its users, so conversations with google had more 'context' than alexa's ever could.

I tried to ask alexa for the weather yesterday (I wanted to know if it had rained overnight to decide which shoes to wear to walk the dog), and it first gave me today's weather, then told me it only knew weather for the next 14 days. Yesterday was too hard to predict, I guess?

But that's the point - simple tasks where the interface is circumstantially superior? Awesome. But if I want to just chat with my computer, that novelty wore out with Eliza & Dr. Sbaitso. The conversations with ChatGPT are deeper, but no more meaningful.

So what can alexa do to make my life simpler? Don't read me wikipedia pages in response to -anything-; if you can't summarize it in two sentences, say it's a long answer and offer to send it to my phone if I'd prefer. Make the interactions short and sweet. Control the lights, make me coffee, walk my dog, order more coke. I don't need it to have "new skills" - I just want it to be better at the ones I actually want to use.


> ... set timers for the kitchen ...

One of their best devices appears to be in the "Temporarily out of stock" category now: the Echo Wall Clock. It pairs with an Echo device and provides a visual representation of the countdown timer.

Its design philosophy reminds me of Ambient Devices.

I fondly remember the Ambient Orb ( http://www.ambientdevices.com/about/consumer-devices ). They had an umbrella at one time where the handle would glow if it would rain that day. They had a LCD weather/clock ( https://ambientdevices.myshopify.com/collections/vendors?q=A... ) that I really liked.

With push (notifications), it's interrupting. Alexa isn't too bad about that since it's a ring color / icon on a screen. With pull, it's "I need to fetch this data". With ambient, it's there if you want it when you want it.

I don't need to ask how much longer on the timer with the clock - it's there at a glance.

Unfortunately, Amazon has been making the devices (especially ones with a screen) into an advertising channel to the point I'm looking at replacing the various echo show devices with just echos (and a clock if they ever come back into stock).

On reading Wikipedia... Alexa used to have a knowledge engine somewhere in its code. You could ask it what color a black cat was and get back "a black cat is black." You could ask "what color is a light red flower" and get back "a pink flower is pink." Asking "what color is a blue bird" gets back "a blue bird is blue, brown, and white." That hinted at a deeper knowledge engine. There was also an inventor <-> invention knowledge base. One time I even had it return back part of the query language by asking it if two people (who were born on the same day) were born on the same day. The knowledge base functionality appears to have been delegated to "search Alexa answers".

It has gotten decidedly worse over the years as useful functionality that had no revenue associated with it got removed while "revenue enacting" features (pushing fire tv, product advertisements and such) have been prioritized.

Many of my echo shows are now "face down" because the cycle of stuff I do not care about flashing in the corner of my eye is distractingly fast. Time and weather are better served by my watch now.

It still does timers and reminders acceptably well.


>The core problem lies in latency and information bandwidth

I actually think the core problem is that Amazon just isn't competent at deriving insights from consumer behaviour data.

If I buy a vacuum cleaner on Amazon, based on the Amazon web store recommendations I fully and without a shred of sarcasm expect Alexa to think that I've developed a vacuum collecting habit and recommend a vacuum conference if I ask it for events.


The problem with that is that this is the objectively "correct" recommendation. Someone who just bought one obviously had an interest in one, and people do return products to get different/better ones.

If it looks stupid but it works.... (Amazon has all the data to check that it indeed works)


For soft goods, maybe. But for durable goods it is objectively the wrong decision. They ought to know if you have initiated a return, and act on it, but they don't.

What percentage of Amazon shoppers make repeated, back-to-back purchases of washing machines or kitchenaid mixers? I'm certain it's vanishingly small.

What percentage of people who just bought a kitchenaid mixer would be interested in baking pans, or whatever? Probably more. But if you buy a kitchenaid mixer, tHe AlGoRiThM just sends you ads for more kitchenaid mixers.


Conversational interfaces are superior... when the counterparties have context.

In the parent situation, I don't want a GPT spoken interface to give me the top 20 events: I want it to give me 1-2 events that I am most likely to enjoy.

In the same way that actual conversations take into account tone, facial expression, etc. to jump straight to important information.

I thought that's where Google was going with their "we have all your data because we run all your services", but it seems they Microsoftified before they could get services cooperating for the larger good.


> Conversational interfaces are superior... when the counterparties have context.

Why? I find conversational interfaces poor for common data retrieval. I can read faster than you can speak. I can type faster than I can speak. I'm staring at a screen 14 hours a day anyway. Just show me the list of 20 events and sort it by what I am most likely to enjoy. Provide links for more information. Show me visual promotional materials. If I need to cross reference it with my calendar it's easier if all the information is visual.


> Show me visual promotional materials.

When I got to this point in your comment, I remembered the Seinfeld episode where Kramer was recreating moviefone and tried to speak a trailer, with mouth-sound-effects.



“I want it to give me 1-2 events that I am most likely to enjoy.“

That's the core property of voice interfaces, and I see surprisingly little awareness of it: it does not matter if it's a phone menu beep-code tree or a GPT or a Star Trek ship's computer, the low-bandwidth linearity of the readout will never go away.

This is what makes voice interfaces so hugely attractive for the "searchy advertisial complex": if you haven't bought enough ads that the almighty relevance algorithm (1) puts you in the top spot, you're out. What used to be the first page on the web is the top spot in voice; second place is first loser. No amount of intelligence can ever change that. Voice interfaces imply handing over control in an unprecedented way.

((1) Technically, claims that result ranks are not sold aren't exactly lies when result ranks don't go to the highest bidder. But that does not mean ad spend isn't a contributing signal in any number of deeper layers, so in the end results can appear indistinguishable from "highest bidder"; it's only that buyers don't get any contractually guaranteed result-list exposure for their money.)


I don't want a GPT spoken interface to give me the top 20 events: I want it to give me 1-2 events that I am most likely to enjoy.

This is fundamentally impossible for a computer though, because even if a computer has perfect historical information about you it can't know some random things that would change your mind in the moment. For example, if you've been to every gig a band has done for years, but at the last one your girlfriend dumped you, a recommendation engine is still going to suggest that band's gigs even though it's unlikely you want to be reminded about them. To most users that immediately looks like a bad recommendation. If the system is only suggesting 1 thing then the whole system looks broken.

The only way around that is to increase the number of recommendations. Hence every system giving out 10+ options - because the people who make it want it to be slightly better than useless.


Not fundamentally impossible. Just as a friend may know not to recommend the gig as they're aware of the breakup, so can an AI. Heck after conversing with Pi a few weeks ago, I'm fairly convinced it'd at this point be able to handle that kind of nuance if it had access to the relevant data.


>> but at the last one your girlfriend dumped you

"I've recognized you removed all future calendar events related to {girlfriend} and your recent text messages concerning her had a negative sentiment. Did you break up?"

Not the world I'd want to live in... but for people less concerned about their data, I can't say it wouldn't be useful!


> For example, if you've been to every gig a band has done for years, but at the last one your girlfriend dumped you, a recommendation engine is still going to suggest that band's gigs even though it's unlikely you want to be reminded about them.

I mean sure, but just think about it: wouldn't the same happen if you had a friend telling you about the event? Or if you had an attentive concierge trying to organise programs for you? How would you like them to handle it? Not by blindly listing more programs, that's for sure.

"Hey you love Blasted Monkeys. Did you know they are having a gig this weekend?"

"Nah, man. We had a bad breakup with Samantha at their last one. And besides it was really her thing and I was just tagging along."

"Oh, that's rough. I didn't know. Blast those monkeys then. How about a day a the beach then? There will be a surf class at ..."

This is the kind of interface a spoken event recommender should have. Is this much harder than just listing events? Yes, it is much harder. The problem is that if you don't go all the way then it falls into a weird uncanny valley. It feels like you are talking with a human, but a very stupid one.


Conversational interfaces are superior as the input method, but text and images are always superior as the output method (except when driving without FSD, or in other rare scenarios).


"Hey Jamie, pull up that video of that moose hitting the car... "


Imagine a sufficiently advanced LLM enriched with the context Facebook or TikTok have about a typical user. That’d be spooky.


Yes the latency was annoying, but the "AI" part was severely lacking. You couldn't just say _anything_, it had to be formatted the way Alexa expected.

You also couldn't really interrupt it, it wasn't a "natural" conversation. I knew what it was going to say and just wanted to reply already, but it only listens when it finishes speaking.

It also talks incredibly slowly, you could 5x the playback speed and it would probably still be understandable.

If all those things were fixed, I'd say it could make an okay product. Not great, but at least okay.


the "right" "AI" UX will, in all likelihood, be something like a BCI.


With a list on paper / screen I can compare things, go back and forth, do other stuff (like check my calendar), and the list is exactly where it was, trivial to pick up again.

Listening to it? What a waste of my time and focus on something trivially and already solved.

And use this while driving, as some others mention? Sorry, not a fan, to say the least; it definitely impairs whoever is driving to a certain extent. Humans simply don't have efficient parallel processing of things that require focus in this way.


Aren't the GPTs consumed in a text-based UI? Lots of avenues for discoverability there.


How?

“Discoverability” here means the ability for the user to learn the system's abilities. Historically, you could read a menu or a list of app icons.

A chat box is empty - it tells the user nothing about the system's abilities, just its interface. A smart speaker is even worse - there are minimal UX hints. Asking a chat box “what can you do” is not likely to be exhaustive, and will likely require a series of “can you do X” queries.


We need to get to the point where the very least likely response is "I am sorry Dave, I can't do that". Obviously I am not going to ask Alexa to take out the rubbish, but any reasonable request needs to have a >>50% chance of working, rather than a <<1% chance.


Show suggested messages that mention specific GPTs. After each message, show suggested follow-ups.


> After each message show suggested follow-ups

Ask people how they feel about Alexa's "follow up" suggestions ;)

It's not easy to do discoverability in general, especially not through an intentionally limited modality. People study this stuff; many businesses and product researchers have spent years workshopping ideas. If it were a quick and easy idea like "just tell the user", it wouldn't still be a challenge in 2024.

The problem with "show messages... suggest followups" is that you can't teach the user about new features that are unrelated to their current interaction, because it feels like advertising, and it's distracting from their current task at hand.


The problem is not discoverability. The problem is there is nothing there to be discovered that is useful or entertaining enough for people to pay money for. The only thing surprising here is that it took so long for Amazon to admit it's a failure.


I disagree. I have used Alexa and Google Assistant a lot and also developed an Alexa skill for controlling Dyson's robot vacuum cleaners, so I'm pretty familiar.

The problem is that the set of supported operations is always MUCH smaller than the set of operations people randomly try. You might develop a skill with 200 commands or whatever and think you've covered everything, but people come up with thousands of possible commands just by guessing.

This means if people just do "I'll try asking this..." then probably 80% of the time it won't work. That's an incredibly frustrating experience. You quickly give up and just stick to the features that you know work, and never try to find any new ones.

But I also disagree that OpenAI has the same problem, because LLMs means you don't need to manually add thousands of possible commands, so any random request that people make is MUCH more likely to work.


Honestly I'm disappointed Amazon hasn't married these two technologies either, but I think it must just be too expensive. I would think you could even prompt a midrange LLM to process the request and generate structured data for the limited list of supported prompts for a skill.
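
Something like this is what I mean (a sketch only: the llm callable is abstract and the intent names are invented, but it shows the shape of using an LLM as a normalizer in front of a skill's fixed intent list):

    # Map a free-form utterance onto one of a skill's fixed intents,
    # instead of hand-enumerating thousands of phrasings.
    INTENTS = ["StartCleaning", "PauseCleaning", "DockRobot", "NONE"]

    def normalize(utterance, llm):
        prompt = ("Map the user request to exactly one intent from this list: "
                  + ", ".join(INTENTS) + "\nRequest: " + utterance + "\nIntent:")
        answer = llm(prompt).strip()
        return answer if answer in INTENTS else "NONE"  # reject hallucinated intents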


Too expensive and they're probably scared about how uncontrollable and unreliable LLMs are. Can't say I blame them really. Every time a company deploys an LLM you get a news story like "hurr durr I tricked it into saying poopy pants" or whatever.


You need discoverability to incentivize innovators to add more value, which itself feeds back into the value of the overall ecosystem, etc.

But yes, natural language as an interface requires higher activation energy versus GUI which hampers how much value Alexa Skills (and GPT) can provide.


You do need discoverability, but you also need a capable platform and things worth discovering.

When I look at Amazon’s skills page, it’s a wasteland. Until there’s something worth surfacing, discoverability doesn’t really matter.

That said, I do have two Echos in my house (and one Home Pod) and use them all the time for basic things.


Alexa's value has been dropping for me a lot lately since they follow nearly every interaction with some sort of commercial. '10 minute timer set, while you wait I can....' This appears to be one more nail in the coffin for something that could be very promising.

I think that for this segment to really take off, Amazon/Google/Apple all need to start releasing usable stuff to the open source world. A good open source home server voice back-end that can control open-source devices would get people creating apps and drive exploration of the space. Eventually it would find the killer apps that will truly kick-start things. The number of sales lost to people setting up their own back-end is likely very small, and the returns could be very big.


I bought the one with the screen. It constantly shows advertisements so I can't have it in my line of sight while doing anything else in my home. If you buy any personal items be sure to take them out of your amazon "recently viewed" queue immediately.

It shows me triumphant notifications (!) that turn out to just be a request for me to review a product. Yes, that's why I _bought_ this device, so the company I bought it from can use it as a portal to ask me to do free work for them.

It has failed to add any value other than as a device I can use to ensure that our Alexa skills are in fact working or not when I take bug reports out of our queue.

For the amount of counter space this thing swallows I constantly wonder why they couldn't have put a few USB charging ports somewhere on the side of it.

Speaking of USB and/or Bluetooth, why aren't there any "peripherals" for this device? It's just a useless screen connected to a cloud of corporate garbage that no one would ask for and it has zero ability to step outside of this sphere. So, yea, jam it full of "generative" content, how could you possibly make it worse?


I got one as a welcome gift for work and my wife ended up smashing it with a hammer because it gave our seven year old unrestricted access to YouTube. She pulled it out of the garbage twice and hid it from us.

Thing is durable, she couldn’t actually crack the casing but broke something inside so it stopped working.

(To be clear we do let our kid watch YouTube supervised after she’s finished lessons, although not very often)


Twist: hiding the Alexa in the trash was a sophisticated game to get the kid to develop puzzle solving skills and a rulebreaking ethos.


Hah, I’ll admit we didn’t punish her further because we thought “wow she’s very resourceful!”


I have an Echo Show 8 that I wanted to watch YouTube on, but it doesn't work. Which model can access YouTube?


I just wanted it to show a clock in a huge font so I could see when I needed to rush back into the "office" when taking a break or making coffee. The Alexa Show became useless after a while because of all the content you can't disable, and it barely even shows the time anymore.

I swapped it back to my old Echo Spot (the round screen one). It's great at being a clock.


The Echo Wall Clock is also fantastic at being a clock!

Although sadly now discontinued, when mine breaks there doesn't seem to be any comparable replacement from any brand at any price.


The Echo Wall Clock is great! (if a bit ugly) Setting kitchen timers is a lot of what we use Alexa for, and the wall clock in the kitchen works well. Disappointing that it is discontinued!


I beta tested many of these products and found them so dystopian and useless I refused to ever participate in a beta again.


I bought one with a 10" screen because I needed a new picture frame. I have mine only showing pictures that I loaded into an Amazon album. I didn't find it too hard to turn off all other stuff, although sometimes it turns back on some other feature which requires me to disable it again from the interface.


> why aren't there any "peripherals" for this device

They made a bunch of peripherals but they never caught on.

They made media remotes, game pads, a sticky note printer, scales, and other things.


> Alexa's value has been dropping for me a lot lately since they follow nearly every interaction with some sort of commercial.

This is every Amazon hardware product unfortunately. Kindles are the least bad, but they're not great. They sell really cheap hardware (TVs, Alexas, tablets), but you pay in data and advertisements. It's probably a choice many consumers are happy with.

I bought a FireTV (in the US the entire TV runs it, not just the stick) because it was incredibly cheap. $150 for a 42" 4K television, shipped to my house in two days. That's absurd. It has ads that like to autoplay on the home screen, and its OS (I hate that TVs have OSs now) runs like trash. I got what I paid for.

I'd pay a serious premium for a product like an Alexa, but with local, in-house processing of voice. Maybe I heard that the Apple version did this, but I heard enough negative things about it to not care enough to look into it.


> I hate TVs have OSs now

I've been able to avoid this by just hooking up a computer to a big screen TV with HDMI. That's it. I watch stuff via the computer. Slightly clunkier for some stuff, but it has worked well enough. Might not be the best interface for kids or folks not comfortable with a full keyboard/trackpad though.


They make Android devices for the purpose of HDMI-ing to a TV.


What happens if you just disconnect the TV from the internet? Not requiring the TV to connect to the internet is the most important thing to me at this point when shopping, specifically because of ads.


So it's also adware... as if I didn't need another reason not to have an always-on Internet-connected microphone in my house.


Just wait until you see what neuralink turns out to be


The internet really will turn into that episode of futurama


Have you browsed the web without an adblocker? It's been like the Futurama episode since it aired


My April Fools Day blog post of in-brain advertisements https://deumbra.com/2022/04/introducing-thoughtlog-the-free-...


From Elon Musk’s other products, it doesn’t seem he is a fan of the advertising model.

Trapped in a 3 ton metal box for hours seems the perfect time to force feed ads, and yet… nothing


Nothing /yet/


Given Tesla has not done this for years, I think you have to give them credit for not being stupid. Not every organization wants to create an ad infested product.


I think their aggressive anti-repairability is pretty stupid, so I don't think I will give them blanket credit. I would bet a sum of money that is fairly large to me that by 2034 they will be showing ads in some form, if they are not defunct as a company.


I also think they absolutely will, once they get the actual full-self-driving figured out.


don't worry, the industry is getting there!!

https://old.reddit.com/r/Justrolledintotheshop/comments/1c1g...


That is not a Tesla.


I had dots throughout the house back in 2017/2018. One day I had an issue and asked Alexa to put me in contact with customer support. She read off a definition of customer support from the dictionary. I asked a few more times until I realized customer support would not be available via Alexa. I unplugged them all and threw them in a bin...where they remain.


The best is a message notification in the bedroom at 2AM, or maybe it’s an update. I’m in the process of replacing it with HA Voice on the Atom boxes.

It probably won’t be as good, but it’s time to jump ship.



Interesting. I use HA as my hub but haven’t been following closely with what’s possible. I’m going to have to grab one of these Atom Voice boxes and try it out.

I’m increasingly infuriated with the Alexa Echos, specifically the “while you wait..” or “did you know…”. The only things I trust the Echos to do are turn devices on/off, set timers, and set reminders (same as what I trust Siri for). I imagine with Atom you can probably hook up an LLM somewhere in the chain, which would be awesome, especially if I can use a local model.

Another thing I want to banish from my life is “An item on your subscribe and save is not available”, that’s it, that’s the message. What item? Doesn’t matter. What about if you ask? Alexa has no clue what you are talking about and the app is useless for this specific purpose and in general. In fact the app is horrible to work with, it’s slow and buggy. I want to find the developers of both the app and this “feature” and just ask them why they hate users.

Alexa was neat for a while but I’ve replaced the music playing ability with Sonos which does a way better job and now I really wish Sonos had a HA Voice integration. I can stomach setting up these Atom boxes everywhere instead though.


> I think that for this segment to really take off Amazon/Google/Apple all need to start releasing usable stuff to the open source world. A good open source home server voice based back-end that can control open-source devices will start creating apps and drive exploration the space.

That commoditizes the frontend, which is the part they would actually control. There's no lock-in if you can swap your Amazon Echos for Google Homes and everything still works.

It would drive interest in the space, but probably not money into their pocket.

I don't think there's any inherent advantage for Amazon/Google/Apple to make all the homes smart. It's only to their advantage if they're the ones selling the devices.


This was the same reason I got rid of my Alexas; those "by the way..." ads were infuriating. Instead, I just use Siri, and while Siri does get flack, it has yet to advertise to me inane things when I just want to set a timer; it is good enough.


Does this happen with ads turned off?

https://www.amazon.com/b?node=98592480011


It does, because Amazon does this insidious thing where they keep adding new categories of what are, effectively, advertisements and sponsored content.

There used to be something like 3-4 checkboxes I turned off on my Alexa Show to disable stuff like that, but they gradually break out stuff into its own, separate category that is enabled by default and you have to turn off again. I went through my Show again last month and had to disable more than 20(!) new checkboxes to get it to stop showing me things like "New Prime Shows", "Popular Amazon Buys", "Trending Pop Culture News" and other crud like that.

Every few months, a new category enabled by default and a new checkbox to turn off.


Thanks for the details. Another dark pattern for FTC to monitor, https://www.ftc.gov/news-events/news/press-releases/2022/09/...


> 10 minute timer set, while you wait I can....

Sounds like they enshittified where they eat.


It's not surprising to me. Alexa was entirely botched IMO.

For example, I would ask Alexa "can my dog eat <X>". In the beginning it was not clear to me this was a third-party skill. Then one day Alexa would respond with something like "nay, Dr. Wolf says ..." (I don't quite remember). I was surprised, because who speaks like that? I then realized it was an Alexa Skill made by a third-party developer. The developer later attempted to monetize this skill so you could only ask the question 3 times and after that it would bill you.

Alexa has a lot of other issues. Every room in the house has an Alexa Echo Show. But the screen real estate is entirely wasted with useless stuff. Very limited customization.

I'm in the process of replacing my home automation with Lutron products, and the Alexa will soon be gone from my home. It seems like everything Amazon does is half-baked, but enough to get out the door. There is no polish at that company.


> And the Alexa Echo Shows in every room will soon become ewaste.

Please try to donate them to a local hacker space or a nephew, or give them away on craigslist; this just makes me depressed every time. :-(

> It seems like everything Amazon does is half-baked, but enough to get out the door. There is no polish at that company.

Isn't that more and more becoming the norm? Google immediately comes to mind too, and the only exception is probably Apple, but at times even they throw shit on the market that just doesn't feel polished at all. It's just that more often than not they actually try to make it right then, instead of letting it linger...


"Thank you. I will consider donating. I did not mean to imply that I would throw them in the garbage. I meant more that I would take them to my city's electronics recycling drop-off.


Apple HomePods are pretty half-baked with how shitty Siri is; they have a very niche set of things they do, but even that's not reliable because Siri likes to misinterpret.


They work reasonably well for today's weather and music - which is all I want them for. Not once have I gotten an ad from my HomePod which is in and of itself the killer feature.


I used it for grocery lists until a few months ago, when they a) started putting in ads while I was shopping and b) discontinued the web interface, which was far more convenient for me.

I literally went back to pen and paper because of this endless closing off and injecting ads garbage.


I agree, I had to disable notifications because it kept putting on the yellow ring and then I ask what's up, and it's an ad.

Amazon, if you really want to make money from Alexa, just charge me a couple bucks a month, it's worth it. It was never going to be a profit center.


i know, i know, M$!!1!111!!!! but check out To Do for this and other list making purposes.

shared, synced family grocery list is a must in our household now. yes, items must be added 'manually.' but they can simply be checked/unchecked rather than deleting (which also provides a convenient pantry inventory review just before shopping trips).


Is this satire?


no, it's actually a suggestion for adults that grocery shop for themselves and their families.


That was the killer feature for me. Being able to yell "Alexa, put milk on the list" right as I poured out the last drop was very convenient. For larger lists like recipe ingredients I used the web interface. I still use it via the Windows app (which insists on grouping my items even though I keep turning it off) and my phone, but the experience and interface has gotten worse over time. I wonder if I can get similar functionality from Google.


You definitely can get Google to mark one of your Keep notes as the designated "shopping list" and then add items at the top of it when you ask it to, it works pretty well.


I use AnyList for that. It isn’t very expensive and I can say “Alexa, add milk” and it puts milk on the list in the right category. It’s pretty much the only thing I use my Echo for these days.


Seeing the fast development of AI in the last couple of years, it's incredible how stuck AI assistants are. Basic things like "Hey Google, turn up the volume" don't work.

Just imagine what a good AI assistant could do with today's LLM tech. Actual things I would have used in the last two days:

- Hey Google, what's the song that goes "da da da da da daaa da"? There are skeletons and elephants in the video?

- Hey Siri, can you move all the windows of project XY to a fresh desktop?

- Move all the shopping tabs to a new window.

- Find the last contract draft for customer ABC.


I suspect Amazon has connected Alexa to some sort of LLM in recent weeks because her responses to the insane stream of consciousness questions I ask basically endlessly have become both more certain and more unhinged.

For instance, I recently asked her “How many cigarettes does the average jazz band smoke per night.” and instead of the prior usual “I’m sorry, I can’t answer that” she replied a very confident “15 cigarettes”.

I was honestly really surprised and had to check the log in the app to be sure she had heard what I had said correctly. She had indeed.

Another one I have asked her many times is “How many goats are in a goat boat?” (I really just ask endlessly insane crap every time I am in my kitchen, it’s some sort of addiction) and again where she used to reply the “I’m sorry” response, she now replies with the confidence of an LLM “2 goats”


Many of those have nothing to do with AI, and everything to do with access points and integrations in applications and OS.

But for everything else they do suck.


Giving AI access to OS APIs is trivial for Apple and Google. The problem has been that AI assistant were always too dumb to understand user requests.

LLMs changed everything.


First, it's not as trivial to give access to OS APIs as you think.

Second, giving access to APIs... gives you access to APIs, and that's it. Even the most advanced LLMs are dumb as shit when it comes to anything non-trivial in programming. What makes you think that a modern LLM will manage to translate "move tab X to a different window" into a dozen or more API calls with proper structures filled in, and in the correct order?


It's trivial. For example, most macOS apps can already get these permissions. Apple can grant special permissions for Siri quite easily.

I don't sense that GPT4 is "dumb as shit". I sense that it's extremely capable and very close to changing everything if, for example, macOS completely integrates GPT4.


> It's trivial. For example, most macOS apps can already get these permissions. Apple can grant special permissions for Siri quite easily.

Of course it's not trivial if you think for more than a second about it.

LLMs produce output in exactly three ways:

- text

- images

- video

What you think is trivial is to convert that output into an arbitrary function call for any arbitrary OS-level API. And that is _before_ we start thinking about things like "what do we do with incomplete LLM output" and "what to do when LLM hallucinates".

You can literally try and implement this yourself, today, to see how trivial it is. You already have access to tens of thousands OS APIs, so you can try and implement a very small subset of what you're thinking about.

BTW, if your answer is "but function calls", they are not function calls. They are structured JSON responses that you have to manually convert to actual function calls.
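
To make the point concrete, here is a toy version of that glue layer (the function name and arguments are made up): even with perfectly formed JSON, someone still has to validate it and dispatch to a real function, and handle everything that can go wrong in between.

    import json

    def move_tab(tab_id, window_id):
        print("moving tab", tab_id, "to window", window_id)

    DISPATCH = {"move_tab": move_tab}

    def handle(llm_output):
        call = json.loads(llm_output)             # raises on malformed output
        fn = DISPATCH.get(call["function_name"])
        if fn is None:
            raise ValueError("LLM asked for an unknown function")
        fn(**call["parameters"])                  # raises on wrong/missing args

    handle('{"function_name": "move_tab", "parameters": {"tab_id": 3, "window_id": 1}}')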


> BTW, if your answer is "but function calls", they are not function calls. They are structured JSON responses that you have to manually convert to actual function calls.

You could in theory send ASTs as JSON.


In theory, yes. Someone still needs to compile them and match them to functions etc.


Mapping JSON responses to OS APIs is the easiest part. It's trivial compared to understanding what users actually want.


> Mapping JSON responses to OS APIs is the easiest part.

Please show me how you will do that for the tens of thousands of OS APIs and data structures.

Edit: because it's not just "a function call" is it? It's often:

   struct1 = set up/get hold of a complex struct 1
   struct2 = set up/get hold of a complex struct 2
   struct3 = set up/get hold of a complex struct 3 using one or both of the previous ones
   call some specific function 1
   call some specific function 2 using some or all of the structs above
   free the structs above, often in a specific order


So the question becomes how can this:

   { "function_name": "X", parameters: [...] }
be easily converted to all that?

Don't forget about failure modes, where you have to check for the validity of some but not all structs passed around.

Repeat that for any combination of any of the 10k+ APIs and structures.


If Apple and Google wants to give access to the OS to an LLM such as Siri, they'd make it easy.

It's the trivial part of all this. You're getting too much into the details of what is available to developers today. Instead, you should focus on what Apple and Google would do internally to make things as easy for an LLM as possible.

Up until now, the hardest part was always understanding what exactly users want. GPT4 is a thousand times better than what Siri and Google Assistant are currently doing.

Again, mapping OS APIs is the easiest part. By far.


> If Apple and Google wants to give access to the OS to an LLM such as Siri, they'd make it easy.

You keep skipping the question of how.

> It's the trivial part of all this. You're getting too much into the details of what is available to developers today.

Somehow you have this mystical magical idea of "oh, it's just this small insignificant little thing".

Literally this:

   LLMs understand human requests
   * magic *
   Things happen in the OS
You, yes you already have basically the same access as developers of the OS have. You already have access to tens of thousands of OS APIs and to the LLMs.

And yet we haven't seen a single implementation that does what you want.

> Again, mapping OS APIs is the easiest part. By far.

If it is, then it would be trivially easy to do this for a small subset of those APIs, wouldn't it? Can you show me how you would do it?


You should try it. Go ask ChatGPT to generate a program that finds all windows in XYZ project and moves them to a new workspace, and I guarantee it'll fail. Maybe it'll get it right if you feed it back the error messages, but you can't do that in production.


The problem is that Google walked back their integrations for anti-trust reasons, and Apple is now being sued by the US government for having too many apple only integrations.

AI assistants only barely had enough data to understand your requests, and now have even less.


> Basic things like "Hey Google, turn up the volume" don't work.

Just an FYI this type of instruction has worked for about a decade, actually.


Why does it seem like there is so much "fad-like appeal" even in the rungs of upper management at large companies? You see it now with AI: diverting resources away from projects that, only a few years ago, were yielding good results and could benefit from a refresh now. As recently as 2022, Amazon was giving away very nice AWS promotional credits for skills, and there are some very good ones. You see it with Google's new buildings. Stopping construction after a bad quarter?


Upper management are just average people with better networking and less empathy. They fall for fads and FOMO just like everyone else.


Generously you could say they're reacting to the "real danger". The danger to them personally is that someone else does better at a currently fancy project and appears more capable than them to others.


> Upper management are just average people with better networking and less empathy

Very concise and to the point. I might print and hang this!


Average is downplaying their IQ IMHO, but I agree with the rest. And at the end, same result in both cases.


I don't know why you're up-playing their IQ. There may not be a ton of Forrest Gump-tier dummies, but they're often profoundly average people.

I've done consulting at F500 companies and was consistently unimpressed by director level and above: dudes in tech for 30 years who didn't have an understanding of what "production" meant, outsourcing literally their basic day-to-day job responsibilities to Accenture and McKinsey to the point where pretty much anyone reading HN could have been in their role.

A lot was "flash" -- looking good, speaking grandly, and sticking to broad approaches that they could assign to a technical senior manager or contributor. And a lot of the time, to be honest, that worked: sometimes you just gotta motivate people with big gestures. But once it became time to actually get outside of the box and generate new ideas they were stuffed shirts.


I'd suggest working in the pits of blue collar work for a few years before deciding on what "average intelligence" is.


Alexa never yielded good results. It was a bad business and a bad product. Just like every other voice assistant.

Meanwhile, AI is the only thing that could possibly make a voice assistant product decent. But it will require a complete change in approach and these "skills" made the old way will be irrelevant.


And it will still be a bad "product", because AI + voice will be more expensive than ads can support and less useful than what most consumers will pay for.


I called it out last year [1]: Amazon failed to monetize Alexa, and it costs them quite something to keep its backend hardware & software up and running.

Going by the rumours, the layoff axe fell the hardest on the Alexa division. And now we see this news about 3rd party apps. Next, Alexa will be put on life support and then stopped completely 3-5 years from now. It's good that the devices do have Bluetooth and an audio cable interface, so the hardware won't be bricked.

[1] https://news.ycombinator.com/item?id=35042440


I don't see amazon killing off Alexa given how incredible it could become with LLMs taking off.


Am I alone in thinking assistants were useful when they had a constrained set of operations that fit in our heads? Now Google Maps tells Spotify to play a song called "Whats the name of this road?"


It's one of my great fascinations that assistants worked better in 2010, in my opinion, than they do now. I remember setting timers, adding reminders, and asking for the weather back then, and it was amazing: it could interpret very flexibly worded versions of my requests almost perfectly every time ("Add a reminder a week from now", etc.). But steadily over the years this functionality seems to have degraded. It's gotten more and more hit-and-miss, and finally I've just stopped using it. I don't really understand why. I assume that as they've tried to make the systems more flexible and accommodate wider inputs (different accents, etc.), it's made them worse at the more constrained use cases.


You're not alone. Maybe not 2010 though, hah.

2015-2017 was the golden age, at least for Google Assistant (or was it still Google Now?): fast, responsive, and accurate.

Now in 2024 it's grindingly slow and unreliable.


I believe the hope is that the nuanced language understanding of an LLM could correctly parse "What's the name of this road?" and know it's not a request to pass to Spotify.


The issue with Alexa is no one successfully monetized it. How will LLMs improve monetization?


The article mentioned it: Alexa Pro or something which is a subscription to use the LLM-powered Alexa.


It's highly unlikely that a typical end consumer will pay a subscription fee just to have a conversation with a device. I'm not saying there won't be niche domains where this is useful, but it won't reach the mass adoption needed to justify the costs of keeping Alexa up and running. Maybe they can charge for LLM + smart home features; perhaps there would be more takers?

Either way, they need to figure it out soon, because the costs will keep accumulating, and they can write them off for only so many quarters.

It seems they had sold 500 million Alexa devices as of last year [1]. Just the hardware infrastructure to keep servicing them is massive, not to speak of developer cost.

[1] https://finance.yahoo.com/news/amazon-has-sold-more-than-500...


I think people are getting a little weary of paying a company for physical hardware, paying a subscription fee for SaaS, paying with their private data being harvested and sold off to third parties, and still getting served ads.

It just starts to feel egregious.


I wish people cared about this more. But from my own personal sampling of average non-tech people in my life, they kind of just accept it without flinching.

Drives me up the wall that people pay monthly for an otherwise static service just to keep the hardware functioning.


You should sample younger. There aren't nearly as many 'non-tech people' aged 16-25.


If Amazon launched an LLM similar to ChatGPT, I guess I could pay for it, and then access through Alexa would be a bonus. That could make me choose Amazon instead of OpenAI, Google, etc.


I've only seen people use Alexa to control the music player. Why do you think they would be keen on paying for pro features, when all they wanted from the very beginning was just not having to handle a remote or grab their phone to play music?


It always had tremendous potential and never went anywhere.

LLMs require either more local power (= way more expensive devices) or more server power, which someone has to pay for. Power users aren't paying, as they'll get better results through their computer or phone to access the LLM of their choice. Casual users aren't paying either, as what they want to do is too low-stakes to warrant higher costs.

Amazon will probably never completely kill Alexa, but I don't see them investing much more to advance the product in any significant ways.


I just read an article about a 19-year-old making $10,000/month from the Alexa developer incentive program. It's sad that this will be ripped away from him and other developers in a few months.

I stopped trusting Amazon years ago when they banned my seller account and my business was shut down overnight. After 15 years, I finally got my account back with no explanation. At this point, I no longer care and have moved on.


It's unconscionable. The least Amazon could do for this poor kid is write a check for $5.5M, roughly the amount he would make if he kept doing the same thing until age 65.


> 19-year old making $10,000/month from the Alexa developer incentive program. It's sad that this will be ripped out from him and other developers in a few months.

"Sad" feels a little strong here


Does it? I'd be inconsolable if I were that 19-year-old or his girlfriend.


Based on the article, the dude seems like he'd be less beat up about it than the comments here are on his behalf. He got $100k+ during his college years, and it helped him land his first full-time job.

Hell, if most people's biggest problem were that a developer incentive program didn't live up to being a lifetime career replacement, I doubt they'd consider it an upset at all.


It was never a sustainable business. Amazon was burning cash in the hopes of somehow making Alexa profitable, but they have now realized that won't happen and are turning the faucet off. People should be happy that they were able to get in and make some cash while they could. What would you have Amazon do instead? Make these charity payments forever?


It's a subsidy; personally, I feel like that shouldn't be your model for a sustainable business. This is barely Amazon's fault: they did their part, but there doesn't seem to be a market around it.


The same exact thing happened to me. How did you find out you got it back? Did they send you an email or something?


> It's sad that this will be ripped out from him and other developers in a few months.

Diversify. Never rely on a single large monopolist for your only revenue stream.


I get that Alexa Skills weren't successful and aren't getting a lot of usage, but I am always surprised that these services are cited as being very expensive to maintain. For Amazon, what's the realistic cost of hosting these? Or is it that they need teams of people covering security, oversight, etc.?

At this juncture, I have to suspect that the reason they now want to kill it is that they effectively want to Sherlock them all by shipping LLM-based AI functionality instead, and they want to ensure that nobody else does it on their platform before they can.


In my experience there are many reasonable incentives that shut these things down: execs don't want even a small maintenance burden on already stretched-thin dev teams (low ROI and bad opportunity costs), and individual devs want to work on the most impactful projects.


Apologies if this is OT.

For a very long time (probably more than a year) I have been seeing the same link at the bottom of every single Ars Technica article; it is a link to a video titled "SITREP: F-16 replacement search a signal of F-35 fail?"

Are other readers experiencing the same? If not, what could be the reason for targeting me? I am not involved in military tech or aviation, and I usually block tracking cookies so I don't expect the site to know much about who I am or what my interests are. I just find it odd that the footer of a publication like Ars Technica would remain unchanged for such a long time.


I see it too. Seems to just be an article they like. Perhaps they condition advertisements on referrers.

Edit: I see now it's a video, meaning it's really just a flashy headline that gets you to click and immediately watch through an ad. They probably make quite a pretty penny off of that.


Here, it's always the same discussion about the TV series Quantum Leap.


Oh yes, that's right, I see that as well; I notice it less because of the more benign topic. I just clicked on a few links from the home page, and it seems to depend on the section the article is in (science, tech, gaming, etc.): in the "tech" and "gadgets" sections I get the military plane video; when the article is categorised as "gaming" or "information technology" I get the Quantum Leap video; and when I'm reading articles in the "science" category I get a third video, about biomarkers in medical diagnosis.


I wouldn't be surprised if the feature was just broken; the Ars website works really poorly.


It probably converts well and they see no reason to update it ... or the intern turned off the cron job that was written in 2015 that is supposed to update the cross-links every day ...


Yeah, I see the same and have thought it strange.

I think it's just that the footer was designed to promote their video content, and Ars doesn't produce much video, so that's the most recent thing they have to put there.


I've been waiting for a while for the tech to catch up, but it should be more or less possible now to use Whisper and an LLM to run a virtual assistant locally. Virtual assistants always seemed like a privacy nightmare to me so I never pulled the trigger.

But if it's all local? It would be a great product. No internet unless I turn on a physical switch, no ads, no monthly fee. Just do the thing I want you to do. Maybe at the moment it will still need to talk to my GPU. But we're getting there.


You can duct-tape together Whisper and GPT4All in a couple dozen lines of Python.

It wasn't bad; I'm on a 3070, so it was slow but tolerable (a couple of seconds of latency, up to 5 or so for the full Whisper + Mistral-7B pipeline).

I named mine Jeeves, so at any point I could just "Ask Jeeves" (lol) and it would talk back to me. The TTS spoke slower than I would prefer, but I probably could have fixed it. I also should have prepended the prompt with something to encourage it to be brief. It often started reading out a couple of paragraphs of text.
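
For anyone who wants to try it, here's a minimal sketch of that kind of loop (my own illustration, not the setup described above; the model filenames and the fixed 5-second recording window are assumptions):

    # pip install openai-whisper gpt4all pyttsx3 sounddevice scipy
    import whisper
    import pyttsx3
    import sounddevice as sd
    from gpt4all import GPT4All
    from scipy.io.wavfile import write

    stt = whisper.load_model("base.en")  # local speech-to-text
    llm = GPT4All("mistral-7b-instruct-v0.1.Q4_0.gguf")  # example model file
    tts = pyttsx3.init()
    tts.setProperty("rate", 220)  # default is ~200; bump it so it speaks faster

    # Record 5 seconds from the default microphone
    fs = 16000
    audio = sd.rec(int(5 * fs), samplerate=fs, channels=1, dtype="int16")
    sd.wait()
    write("question.wav", fs, audio)

    text = stt.transcribe("question.wav")["text"]
    # Asking for brevity in the prompt keeps it from reading out paragraphs
    reply = llm.generate(f"Answer briefly: {text}", max_tokens=80)

    tts.say(reply)
    tts.runAndWait()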


Unfortunately you'd probably want a subscription to newer models trained on more recent, relevant data. It's basically the difference between a standalone GPS unit and an internet-connected maps app: at first it doesn't seem like you need anything new, but slowly the world around it changes.

But yea this seems like a great vision generally. I think that we’ll progress past a chat LLM to one that autogenerates UIs - several companies have already demoed this.


I was quite bought into the Alexa ecosystem, but that has changed over the past few years. I removed and sold every Alexa device with a screen that I had in my house, since it was such a distraction and did not simplify my day at all or bring value. The non-screen devices have also really degraded in quality. I have the feeling that the speech recognition got worse, and when I ask questions, the answers I get are now more often miss than hit. Sometimes I ask something country-specific, and the moment it starts with "this is something I found and translated..." I know the answer is not going to be relevant.

The only reason I didn't get rid of them entirely is that they're an easy way for my kids to play their music. Otherwise I would have ditched them all.


I've never seen any value for me in the Amazon spy devices. They listen to your home and play ads. Amazon profits, not me. For me, the device has less utility than an electric can opener.


I wrote Alexa skills for personal, household use, but it was hard to find use cases where audio was faster than scanning a list of things, as others have mentioned.

My wife and I are NASCAR fans, so I did have a skill where you could just ask "what's the weather at the track?" and, based on today's date, it would know where the next race was and give back the weather for that location. But beyond the things mentioned here, many providers have scaled back their free APIs for things like weather and other useful services.

What really killed the interest was the nagging feeling we had that Alexa was listening more than it should, so we keep it off most of the time: you have to go to it, turn listening on, ask your question, then turn listening back off.


I am willing to pay Amazon if it can give me an Echo Dot that can talk to me like ChatGPT can.



It's not hard to make an Alexa skill that hooks up to an OpenAI API (or compatible) endpoint. This repo is a good starting point: https://github.com/k4l1sh/alexa-gpt
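
For reference, the glue code is roughly this shape (a hypothetical sketch, not code from that repo; the "ChatIntent" intent and "query" slot are made-up names, and it assumes the ASK SDK for Python plus the openai client running in the skill's Lambda):

    from ask_sdk_core.skill_builder import SkillBuilder
    from ask_sdk_core.dispatch_components import AbstractRequestHandler
    from ask_sdk_core.utils import is_intent_name
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the Lambda environment

    class ChatIntentHandler(AbstractRequestHandler):
        def can_handle(self, handler_input):
            return is_intent_name("ChatIntent")(handler_input)

        def handle(self, handler_input):
            # "query" is a hypothetical AMAZON.SearchQuery slot holding the utterance
            slots = handler_input.request_envelope.request.intent.slots
            user_text = slots["query"].value or "Hello"
            resp = client.chat.completions.create(
                model="gpt-4o-mini",
                messages=[
                    {"role": "system", "content": "Answer in one short spoken sentence."},
                    {"role": "user", "content": user_text},
                ],
            )
            speech = resp.choices[0].message.content
            return handler_input.response_builder.speak(speech).ask("Anything else?").response

    sb = SkillBuilder()
    sb.add_request_handler(ChatIntentHandler())
    lambda_handler = sb.lambda_handler()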


> The news has left dozens of Alexa Skills developers wondering if they have a future with Alexa, especially as Amazon preps a generative AI

It's here now; they just haven't announced it. The Alexa devices at my home can answer arbitrary questions: "What is the round shape on top of the World Trade Center?" "That is the One World Observatory," etc.

There's a weird thing where some questions get picked up by the non-LLM path: "How much will it rain today?" gets answered by the old weather app with a nonsensical answer like "it will rain today at 4 PM."


Normal people don't want to have a conversation with their Alexa device. They want simple, reliable commands to do the common things in their household. Anyone who's tried to order something off Amazon through a screenless Echo knows the pain.

Skills were always dead in the water the minute they required an AWS account and a Lambda deployment to get going. That's a miserable thing to develop and maintain (and an AWS account is itself a looming financial liability), even if you're familiar with all the inner workings of Lambda and IAM.


Isn't that just like Apple? You host the app on your own servers, right? And you need a developer account to do all of that.


I have a question: why has it taken this long for a replacement for Alexa to show up? All we need is something hooked up to, say, ChatGPT that lets me control my house stuff, set timers, add to a shopping list, etc. It can generate JSON for all of that, which can then get picked up and relayed over the wire.


You can semi-DIY it already with Rhasspy + your choice of home automation stack (HomeAssistant or what-have-you). Rhasspy is probably what you're looking for.

https://github.com/rhasspy/rhasspy

Was discussed last year: https://news.ycombinator.com/item?id=33705938

My only real issue in setting it up and running it has been finding appropriate microphones.
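
For anyone wiring this up, here's a rough sketch of the glue layer, under stated assumptions: Rhasspy publishing recognized intents over its Hermes MQTT topics, Home Assistant reached via its REST API. The intent name, entity ID, and hostnames are illustrative, and it uses the paho-mqtt 1.x callback style.

    import json
    import requests
    import paho.mqtt.client as mqtt

    HA_URL = "http://homeassistant.local:8123"   # assumed Home Assistant address
    HA_TOKEN = "YOUR_LONG_LIVED_ACCESS_TOKEN"    # created in the HA user profile

    def on_message(client, userdata, msg):
        intent = json.loads(msg.payload)
        name = intent["intent"]["intentName"]    # e.g. "TurnOnLight", defined in Rhasspy
        if name == "TurnOnLight":
            requests.post(
                f"{HA_URL}/api/services/light/turn_on",
                headers={"Authorization": f"Bearer {HA_TOKEN}"},
                json={"entity_id": "light.living_room"},  # illustrative entity
            )

    client = mqtt.Client()
    client.on_message = on_message
    client.connect("homeassistant.local", 1883)  # the broker Rhasspy publishes to
    client.subscribe("hermes/intent/#")          # all recognized intents
    client.loop_forever()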


You throw audio into an LLM and get a function call back. Okay.

An API call to the OpenAI Whisper endpoint costs about half a cent per minute, so it's really cheap. Feeding the transcribed text and the possible function calls, with their JSON schemas, into the LLM costs another penny. Everything works.
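
Concretely, the round trip is only a few lines with the openai Python client (a sketch; "set_device_state" is a made-up tool for illustration, not a real smart-home API):

    from openai import OpenAI

    client = OpenAI()

    # Hypothetical smart-home tool; the name and schema are illustrative
    tools = [{
        "type": "function",
        "function": {
            "name": "set_device_state",
            "description": "Turn a named smart-home device on or off",
            "parameters": {
                "type": "object",
                "properties": {
                    "device": {"type": "string"},
                    "state": {"type": "string", "enum": ["on", "off"]},
                },
                "required": ["device", "state"],
            },
        },
    }]

    # Step 1: speech to text via the Whisper API
    with open("command.wav", "rb") as f:
        text = client.audio.transcriptions.create(model="whisper-1", file=f).text

    # Step 2: text + tool schema -> a structured function call
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": text}],
        tools=tools,
    )
    call = resp.choices[0].message.tool_calls[0]
    print(call.function.name, call.function.arguments)
    # e.g. set_device_state {"device": "living room lights", "state": "off"}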

But now, where's your business model?


Alexa is perfectly fine for that even without ChatGPT-like capabilities. The Skills situation might be a disaster, but if you just want to control devices in your home, it's great. The level of integration with pretty much anything is amazing.


I would really like it to be far more intelligent. For example, if I give it a request like “decorate all the lights in my house with Christmas colors“, it should know what to do.


Other than Siri and Google Home?


ChatGPT would be terrible at adding stuff to a shopping list.


What makes you think that? It’s all about the prompting.


But entertaining for a few minutes!


Contrary to the article, I'd say Amazon does have some incentive, and definitely the money to keep this going as long as they want. What's missing is mindshare, because voice assistants are fundamentally uninteresting beyond queries and basic automation.


From everything I’ve heard, Alexa was an absolutely massive boondoggle inside Amazon, with literally thousands of engineers and hundreds of managers. It’s incredible to think that so many of them spent years there, growing through the ranks of Amazon, without ever building anything resembling a real business. This really shows exactly how unfair the playing field is for anyone competing with big tech - there’s no pressure for them to make money for years at a time, when everyone else has to actually fight to survive.


I use Alexa exclusively as a voice control feature for my Home Assistant server. If they allowed Alexa to process commands locally, the product would be a lot more attractive.


Makes sense. I'm sure they're working on a next gen Alexa based on LLM.


It is sad. I like Alexa and wish it would get more love to become more useful.


Alexa's platform is really, really bad in 4 dimensions: it's antiquated with limited fixed action patterns, it just doesn't work correctly, its skills aren't adequately maintained, and it's developer-hostile.

On the first point, if a skill doesn't support a precise sentence structure, it becomes difficult to use, or it simply doesn't support features users expect to invoke with natural language.

On the second point, not all Alexa devices work with IPv6. I have an Echo Show 15 here that simply refuses to play BBC News and other "Flash Briefing" items when IPv6 is enabled. Really? It's 2024 now. I did the whole factory-reset, reboot, and so forth dance.

An LLM startup with enough partner biz-dev/sales talent could combine IoT API integrations from enough manufacturers to side-step both the boring, limited Apple Siri / Google Assistant / Amazon Alexa oligopoly and self-destructive platforms like IFTTT, and execute on a compelling, competing replacement that makes good on all 4 points. However, I don't think giving away computing resources for free is a scalable business model; it's fair to charge a reasonable micropayment subscription to host and run your code and data.


>"Dozens" may sound like a dig at Alexa's ecosystem, but it's an estimation based on a podcast from Skills developers Mark Tucker and Allen Firstenberg, who, in a recent podcast, agreed that "dozens" of third-party devs were contemplating if it's still worthwhile to develop Alexa skills. The casual summary wasn't stated as a hard fact or confirmed by Amazon but, rather, seemed like a rough and quick estimation based on the developers' familiarity with the Skills community. But with such minimal interest and money associated with Skills, dozens isn't an implausible figure either.

lol ya, I thought the title was a joke, but apparently their best guess is that there were dozens of developers. This is like a punchline from XKCD.


“dozens”

lol, is this a burn?


Answered in the 3rd paragraph of TFA:

> The news has left dozens of Alexa Skills developers wondering if they have a future with Alexa, especially as Amazon preps a generative AI and subscription-based version of Alexa. "Dozens" may sound like a dig at Alexa's ecosystem, but it's an estimation based on a podcast from Skills developers Mark Tucker and Allen Firstenberg, who, in a recent podcast, agreed that "dozens" of third-party devs were contemplating if it's still worthwhile to develop Alexa skills


I think the author may also have been making an Arrested Development reference. In which case, bravo Scharon!



