Amazon’s Echo Brims with Groundbreaking Promise (nytimes.com)
112 points by ilamont on March 9, 2016 | 91 comments



Writing a bit about the Echo has been on my todo list but this article touches on a lot of points I was thinking about. I got it thinking it would be a bit of a gimmick—and to an extent, it still is at times. But, as the author found, it has snuck into facets of my life. Shopping lists, weather, home automation, etc.

The thing is, it's a natural interface with a decent interaction model. Keyboards, mice, etc, have all become second nature to most of us but they're still not truly natural. The Echo feels like one of the first voice-based interfaces that isn't a complete gimmick. And the hardware does an amazing job of perking up when I say the wake word and picking me out of a room full of noise.

What really convinced me of its value was when I started digging into writing a skill for it. My grandfather passed recently and my grandmother has horrible eyesight. I wrote a skill that enables my grandmother to ask to start a phone call. She is then asked who she would like to call. Once she says a name, she receives a call on her landline, is greeted, and the call is forwarded to the intended recipient. There was a small learning curve ("you can't just say, 'call John,' you have to ask her to 'start a phone call'") but it has worked and has provided an often confused elderly woman with poor eyesight a straightforward way of connecting with her loved ones.
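
For anyone curious what that two-step flow looks like in code, here is a minimal sketch in plain Python. It is not the actual skill: the `CONTACTS` table, the phone numbers, and the `place_call` hook (standing in for whatever telephony service rings the landline and forwards the call) are all hypothetical, and no Alexa or telephony library is used.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# Hypothetical contact book: spoken name -> number the call is forwarded to.
CONTACTS: Dict[str, str] = {"john": "+15550100", "mary": "+15550101"}


@dataclass
class CallSkill:
    """Two-turn dialog: 'start a phone call' -> 'who?' -> ring the landline."""

    landline: str
    # Hypothetical telephony hook: ring `landline`, then forward to `number`.
    place_call: Callable[[str, str], None]
    awaiting_name: bool = False

    def handle(self, utterance: str) -> str:
        text = utterance.strip().lower()
        if not self.awaiting_name:
            # First turn: only the exact opening phrase is understood.
            if text == "start a phone call":
                self.awaiting_name = True
                return "Who would you like to call?"
            return "Sorry, I didn't catch that. Try: start a phone call."
        # Second turn: the whole utterance is treated as the contact's name.
        self.awaiting_name = False
        number = CONTACTS.get(text)
        if number is None:
            return f"I don't have a number for {utterance.strip()}."
        self.place_call(self.landline, number)
        return f"Calling {utterance.strip()}. Your phone will ring in a moment."
```

The exact-match check on the opening phrase is deliberate here: it reproduces the learning curve described above, where "call John" is not understood but "start a phone call" is.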


That's a touching use of Echo and wouldn't it be even easier on iPhone or Android? "Ok Moto X, call John." Instantly connected.

I have an Echo and I love it. The skills SDK is one of the best parts about it. The most common argument against Echo, though, is that it's just your phone's voice service, but with fewer features and in a fixed location.


In this case, the fixed location plus the far field mic are great. When she's alone, she spends all of her time in her home. She lives in a pretty small place which means she can basically ask to start a phone call from anywhere and she'll be able to interact with the Echo. It's as much the hardware as it is the platform that makes it all work in this case.

I have no doubts the tech could facilitate something similar on Google and Apple's platforms but, afaik, building comparable functionality isn't currently possible (at least with Siri)—can you add arbitrary commands to either?

EDIT: the quickest interaction I could get away with was, "Alexa, start a phone call with {name}". No back and forth. One statement and the echo simply confirms (which was optional).
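
A rough sketch of how a one-shot phrase with a `{name}` slot could be matched, assuming the wake word has already been stripped by the device. This uses a plain regex and is only an illustration of the idea; the helper names are made up and this is not how Amazon's service resolves utterances.

```python
import re
from typing import Dict, Optional


def compile_utterance(template: str) -> "re.Pattern[str]":
    """Turn 'start a phone call with {name}' into a regex with one named group per slot."""
    # re.split with a capturing group alternates literal text and slot names.
    parts = re.split(r"\{(\w+)\}", template)
    pattern = ""
    for i, part in enumerate(parts):
        if i % 2 == 0:
            pattern += re.escape(part)       # literal text
        else:
            pattern += f"(?P<{part}>.+?)"    # slot, captured lazily
    return re.compile(rf"^\s*{pattern}\s*$", re.IGNORECASE)


def match_slots(template: str, spoken: str) -> Optional[Dict[str, str]]:
    """Return the slot values if the spoken text fits the template, else None."""
    m = compile_utterance(template).match(spoken)
    return m.groupdict() if m else None
```

With the template `start a phone call with {name}`, the spoken text "Start a phone call with John" yields a `name` slot of "John", while "call John" does not match at all.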


She can initiate the call from anywhere, and based on the description given she still needs to go over to and pick up her phone to complete the interaction.

I agree the ability to add arbitrary commands into Echo is a great feature, reminds me of ifttt quite a bit, which is usually a bit clunky yet very powerful.


Yeah, that's one quirk I wish didn't exist. If I could make the Echo work as a speakerphone of sorts, that'd be amazing. That's one of my two asks (ability to get access to the microphone), the other being server-side interaction initiation but I recognize the massive potential for abuse of both so I'm not holding my breath.


Do we even know if grandma has a smartphone?


For an elderly woman with poor eyesight, a fixed location landline that never needs updating or maintenance is probably preferred over an easily lost phone that needs to get plugged in, updated, etc.


As a grandson of two, nope. When one of them fell down, having a cellphone in her pocket allowed her to call for help, which she couldn't have otherwise. She still has a landline, but it has become an expensive decorative object.


> expensive

I pay ~$0.50/day for the peace of mind that I can call the police from any major room in my house, whether or not my cell phone is near me, or even has a charged battery for that matter. Also, that emergency services would immediately know my address when calling from my landline, so I could do as little as just call and hang up to get them there in an emergency. There are many benefits to maintaining a landline, and (cheap) cost is one of them.


>I pay ~$0.50/day for the peace of mind that I can call the police from any major room in my house,

You know you can cancel your phone service and still call 911 in many states (including the most populous ones)?


I did not know that, but I will certainly check into it. Thank you.


You can also do this with any working cell phone. I don't even think the phone has to have a working SIM card in it to call 911.


We're hungry for better design, and unfortunately technologically complex products are where the progress is being made. Case in point, Nest. So we take the bad with the good.

But yeah, in this case, there are big-button landline phones with speed dial.


True, and I totally agree that the simplest solution is generally the best and that not every "problem" needs a technologically complex solution. Nothing worse than entering a "smart home" and being told not to use the light switches[^1]. :)

In this particular case, we've tried a big button phone. The problems with it are twofold: first, cataracts impact her straight-ahead vision and she's not able to remember button placement. Second, it's not uncommon for her to forget what she intended to do before getting there. The simple act of walking to the phone to initiate a call is enough time for her to forget who she wanted to call in the first place. At least in this case, she can act on the thought immediately and be reminded when she gets to the ringing phone.

[^1]: full disclosure: I can toggle my lights with my Echo. I use z-wave light switches though so people that aren't nerds like me don't have to personify the appliances in my apartment.


And yet, sisk's grandmother prefers this method. Even removing a small amount of friction can have big changes in how people behave with technology.


    > Shopping lists, weather, home automation
Funny, I felt the article gave unusually short shrift to Siri, which I use for all of those things, in addition to "What time is it in X?", alarms, and so on.

Siri doesn't work so well when music is playing, and I wish I could easily add skills to it, so if they've cracked that then it's something.


Those are the two biggest differences, for sure.

All the big players have done a good job of speech to text (at least for USA English—saw an Englishman have to break into his best John Wayne impression for an on-stage demonstration about a month ago). But the Echo hardware does a great job of isolating your voice and the platform allows us to write skills for it quite easily.

I wouldn't be surprised to see the other guys make steps in a similar direction if the Echo continues to succeed.


It's impressive how some Google services are worse for other languages. It is better at understanding my poor spoken English than my native language. Even the simple dictionary lacks a lot of basic conjugations that don't exist in English, like the one for the pronoun "we". These subtle differences in technology are a great lever for English-speaking countries.


There can be music playing and I can shout across the room to my Echo to add something to a shopping list and it gets it right most of the time. I use an old iPhone to play music on my stereo system. I'm lucky if I can get "Hey Siri, STOP" to work when music is playing even if I walk right up to it.

The Echo is still pretty "dumb" as soon as you deviate from fairly precise commands that it understands but the voice recognition itself is far and away the best I've seen.

The platform aspect is interesting even if I find it mostly still pretty gimmicky. But it's got potential.


I haven't used an iPhone in a while, does Siri have a wake word? Does it work when the phone is in standby? Android has "Ok Google" but it doesn't work when in standby and doesn't work from the distance that Echo does.

I think that's really its killer feature. You are in the pantry and notice you're out of basil, you say "Alexa, add basil to my shopping list" and it's done. No pulling out your phone, no unlock screens, you just say what you want to happen and it works.


As of the latest iPhone (the 6s), you don't even need to have the iPhone plugged in. You train Siri by saying a few sentences, and later you can say "hey Siri" and it will start listening. Apple added a new chip that listens for that key phrase and only reacts to your voice, not someone else's. So no more funny jokes from co-workers or podcasts, and low battery impact as well.


As much as I want to like this feature, I've found it less than impressive. It rarely picks me up on the first try, so I find myself shouting a very clearly-enunciated "HEY SIRI" a few times, which kind of kills the magic. About 50% of the time I just end up walking over to the phone and using my hands to accomplish whatever I was trying to do. I'm not trying this from far away or in a noisy environment, either. Admittedly an anecdote, but still frustrating.


If it's plugged in, you can enable an iPhone to respond to "Hey Siri" as a wake phrase.

[EDIT: I have an iPhone 6. Apparently the latest models don't need to be plugged in.]


It does ("Hey Siri"), but it doesn't work from far away or when it's in your pocket. Echo definitely has the edge in that respect.


"Hey Siri" on iPhone 6s according to http://apple.stackexchange.com/questions/206240/why-is-alway...

Haven't tried it myself, so can't comment on the range.


Prior to the 6S the phone had to be plugged in for the wake word ("Hey Siri") to work, but as of the iPhone 6S you can do it when running from battery. As you say, the microphone is not particularly good at distance.


That sounds fantastic. We got my grandfather an Echo for the same reason. How can I get a hold of that skill for him?


It's currently unpublished because there is an associated cost (Twilio for making the call) and it's not OSS because 1) needs cleaning, 2) was thinking of building a public service out of it.

That said, that's enough of a kick for me to actually clean it up and publish some stuff. I'll email you when I throw pieces of it online (in the next day or two). At the very least, I wasn't able to find a lib that performs all the steps Amazon asks of you to validate the request so I wrote a utility library and hapi auth plugin for that. I'll release those.
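
To give a flavor of those validation steps, here is a rough Python sketch of two of them as I understand Amazon's documented requirements: checking the certificate chain URL from the request header and rejecting stale timestamps. The specific rules (HTTPS, host `s3.amazonaws.com`, path under `/echo.api/`, port 443, 150-second tolerance) are from memory and should be treated as assumptions to check against the current docs. The function names are made up, and the actual signature verification against the downloaded certificate is deliberately left out.

```python
import posixpath
from datetime import datetime, timezone
from typing import Optional
from urllib.parse import urlparse

MAX_SKEW_SECONDS = 150  # assumed tolerance; confirm against Amazon's documentation


def cert_url_is_valid(url: str) -> bool:
    """Check the certificate chain URL before downloading anything from it."""
    parsed = urlparse(url)
    try:
        port = parsed.port
    except ValueError:  # malformed port
        return False
    # Collapse '..' segments so '/echo.api/../x' cannot sneak past the prefix check.
    path = posixpath.normpath(parsed.path)
    return (
        parsed.scheme.lower() == "https"
        and (parsed.hostname or "").lower() == "s3.amazonaws.com"
        and path.startswith("/echo.api/")  # case-sensitive on purpose
        and port in (None, 443)
    )


def timestamp_is_fresh(timestamp: str, now: Optional[datetime] = None) -> bool:
    """Reject requests whose UTC timestamp is too far from the current time."""
    sent = datetime.strptime(timestamp, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return abs((now - sent).total_seconds()) <= MAX_SKEW_SECONDS
```

Both checks are cheap, so it makes sense to run them before fetching the certificate or touching the request body.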

Made a quick demonstration video: https://youtu.be/TkVnhp4F1xk


What a great idea. I would like to do this for my grandmother too, but she doesn't speak English. It will probably take another decade before devices like these can understand other languages; by then the whole world will have started speaking English anyhow.


Surprised that no one in this thread brought up any concern about putting an always-listening internet-connected device into their home... That is connected to other things, like phones and even electrical devices, like garage openers.

My wife likes to glue a sticky-note over the camera on my iMac.


This is my concern as well.

My privacy threshold is pretty low (my username across most of the web is my real name, for example). This, though, is an always-listening device in your home, that connects directly to a company whose stated goal is to sell you more things. That, combined with the potential for abuse/hacks/subpoenas, crosses a line for me.

It feels a lot like the GPS trackers that insurance companies want to put in your car, which I also think is a bridge too far.


Echo only transmits the audio command to Amazon after you invoke the wake word "Alexa." It's "always-listening" but not "always-recording/transmitting." You can easily verify this by looking at your network activity.
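
As a toy model of the "always listening, not always transmitting" distinction (purely illustrative, with words standing in for audio frames, and in no way a description of Amazon's actual firmware):

```python
from typing import Iterable, List

WAKE_WORD = "alexa"


def frames_to_upload(frames: Iterable[str], command_frames: int = 3) -> List[str]:
    """Return only the frames that would leave the device.

    Every frame is inspected locally, but nothing is kept or uploaded
    until the wake word opens a short command window.
    """
    uploaded: List[str] = []
    remaining = 0
    for frame in frames:
        if remaining > 0:
            uploaded.append(frame)
            remaining -= 1
        elif frame.lower().strip(",.!?") == WAKE_WORD:
            remaining = command_frames  # open the gate for the command
        # Otherwise the frame is dropped on the device.
    return uploaded
```

The `command_frames` window is an invented simplification; a real device would presumably detect the end of speech instead of counting frames.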


... and there's absolutely no way it could ever be changed without your knowledge or consent to do anything other than described!

Right?


It seems to be some sort of not-quite-hardware thing where they download a neural net, based on what I remember hearing about the implementation and that it takes ~10min and flashes a lot of warnings when you switch from "Alexa" to "Amazon" as the available watchwords.

Not that this is completely secure, of course, but it does suggest it'd be non-trivial, at least.


Couldn't the same be said about the phone in your pocket as well?


It could, yes. Perhaps everyone could be a little more suspicious of phones as well.


I feel uncomfortable with the always-listening devices, too. At the same time I wonder if it's any different than carrying a device with a microphone in your pocket at all times. Is Echo more likely to become a privacy invasion than your cell phone is?


I suspect no one else mentioned it because it comes up in every single thread about Echo, and the exact same conversation plays out each time.


Ditto, although if you own a mobile phone you already have one of those - pretty much anything can be put into passive listen mode remotely, and you can be eavesdropped upon.


I don't understand why it matters whether it is "always listening"... any device with a microphone and an internet connection has the same attack surface. To me there's no difference between a device that says "always listening" on the box and one that simply has the latent capability to do so.


If you have a cellphone, I'm pretty sure that it can do everything that Echo can do with the proper app...


> the F.B.I.’s battle with Apple over encryption should prompt deep questions about a future of Internet-connected devices spread around our homes. Amazon has strong privacy protections in the Echo. It doesn’t stream anything without the wake word and it has a physical mute button that electrically disconnects the microphone but, as with all groundbreaking technology, there is no doubt we are entering new territory here.

At least there's minimal thought given to this issue...even if it is at the end of an article claiming the Echo is the next best thing since anything.


I bought an Echo a couple of weeks ago, and I love controlling my Philips Hue lights with it. If I'm on the couch watching a movie, I can just tell Alexa in the kitchen to dim or turn off the living room lights. Setting a timer and having the lights blink when it goes off is neat too. My kids love asking it questions and playing music.

Right now that's pretty much all I use it for... I always forget the specific commands to order an Uber, pizza, etc. Especially the IFTTT recipes (that require you to say "trigger" first), or any of the downloadable "skills" -- there's a specific syntax you have to use. I wish there were multiple, more natural phrases for these actions.
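
What I have in mind is something like a phrase table where several spoken variants resolve to the same action. A minimal sketch, with a made-up `ALIASES` table and action name (this is not an Echo or IFTTT feature as far as I know):

```python
from typing import Dict, Optional

# Hypothetical phrase table: several spoken variants map to one action name.
ALIASES: Dict[str, str] = {
    "trigger find my phone": "find_phone",
    "where's my phone": "find_phone",
    "find my phone": "find_phone",
}


def normalize(spoken: str) -> str:
    """Lowercase, drop punctuation and the leading wake word so variants compare equal."""
    words = [w.strip(",.:!?") for w in spoken.lower().split()]
    if words and words[0] == "alexa":
        words = words[1:]
    return " ".join(w for w in words if w)


def resolve(spoken: str) -> Optional[str]:
    """Look up the action for a spoken phrase, or None if no variant matches."""
    return ALIASES.get(normalize(spoken))
```

An exact-match table like this only helps with phrasings someone thought to list, so it eases the memory problem without solving it.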

"Alexa, where's my phone?" would be much more memorable than "Alexa, trigger: find my phone."


That's sort of the problem with voice interfaces. Unless it's smart enough to actually have a degree of "understanding" about what you're trying to do--as opposed to largely responding to a scripted command--there's a real issue around discoverability and feedback that makes it less useful than one might hope.

As was being discussed last week or so in some thread, it's a lot more like a CLI (without any man pages or help available) than a GUI.


Maybe it was mostly smoke and mirrors (I never actually used it), but for some reason I feel like we still haven't reached the level of SHRDLU, which was written in the 60s on a PDP-6! Just this excerpt sounds miles ahead of Siri et al.:

    Person: DOES A STEEPLE 
    Computer (interrupting): SORRY, I DON'T KNOW THE WORD "STEEPLE". 
    Person: A "STEEPLE" IS A STACK WHICH CONTAINS TWO GREEN CUBES AND A PYRAMID. 
    Computer: I UNDERSTAND. 
    Person: ARE THERE ANY STEEPLES NOW? 
    Computer: NO. 
    Person: BUILD ONE. 
    Computer: OK. (does it) 
https://en.wikipedia.org/wiki/SHRDLU

http://hci.stanford.edu/winograd/shrdlu/


I assume there was also some connection between this and the parser in the Infocom games (which was written in MDL, essentially a variant of LISP and also came out of the MIT AI lab a decade+ later).

They're all pretty constrained environments though. I used to play the Infocom games quite a bit (and was even a tester at one point) and they were great but you'd also frequently run into situations where you went through a lot of trial and error to hit the right magical incantation.

This was around the same period that Lotus unveiled Hal to great fanfare. It was an add-on to Lotus 1-2-3 that was supposed to let you interact with 1-2-3 more naturally. It was actually good for some things but it wasn't any kind of breakthrough. I think because of GUIs among other things, a lot of the focus on interacting with computers in natural language went away. It's only with raw speech recognition finally becoming generally usable that it seems to be getting some real attention again.


Totally agree. I just think a possible way around this would be to have multiple phrases for an action.

"Alexa, ask Uber to request a ride." is the current phrase for Uber. But if you haven't ordered an Uber via Alexa in a while, the ability to guess an alternative command would be handy.

"Alexa, order an Uber." "Alexa, request an Uber." "Alexa, UBER!"


The big reason behind the "ask X for Y" is namespacing. Otherwise it's who gets there first for a lot of things. This also allows them to sell "namespace free" triggers to corporate sponsors (I don't know if they do, but it makes sense).
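
A small sketch of that namespacing idea, assuming a registry keyed by invocation name. The class, the regex, and the "Pizza Palace" skill are all hypothetical; this only illustrates why "ask X to/for Y" avoids collisions between skills.

```python
import re
from typing import Callable, Dict, Optional

Handler = Callable[[str], str]


class SkillRegistry:
    """Route 'ask <skill> to/for <request>' to the skill that owns that name."""

    def __init__(self) -> None:
        self._skills: Dict[str, Handler] = {}

    def register(self, invocation_name: str, handler: Handler) -> None:
        key = invocation_name.lower()
        if key in self._skills:
            # Only the name has to be unique; phrases inside a skill can overlap freely.
            raise ValueError(f"invocation name already taken: {invocation_name}")
        self._skills[key] = handler

    def route(self, spoken: str) -> Optional[str]:
        m = re.match(
            r"^ask (?P<skill>.+?) (?:to|for) (?P<request>.+)$",
            spoken.strip(),
            re.IGNORECASE,
        )
        if not m:
            return None  # no namespace given, so no skill is invoked
        handler = self._skills.get(m.group("skill").lower())
        return handler(m.group("request")) if handler else None
```

Two skills can both understand "request a ride" without conflict, because the invocation name decides which one hears it.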


> I bought an Echo a couple of weeks ago, and I love controlling my Philips Hue lights with it. If I'm on the couch watching a movie, I can just tell Alexa in the kitchen to dim or turn off the living room lights. Setting a timer and having the lights blink when it goes off is neat too. My kids love asking it questions and playing music.

Every time I hear about the Echo I can't help but think it's a solution looking for a problem. A light switch costs $0 and really isn't any effort to use. If I'm sitting on the couch, I'll... get up, turn the lights off, and sit back down. I can't imagine spending $180 on a glorified light switch, weather app, and a kitchen timer. To each their own, I suppose.


The single funniest post I've ever read on HN was this response to the original Echo promo video:

https://news.ycombinator.com/item?id=8569219

Original promo video here: https://www.youtube.com/watch?v=KkOCeAtKHIc

That aside, I like the idea of the Echo, just not the idea of giving Amazon permanent eavesdropping rights in my home. Any home automation enthusiasts know if there's open source software that provides similar services, but which I could run myself?


Short answer: no. Long answer: You can cobble something together, using either https://wit.ai (sending your voice to the cloud) or CMU Sphinx a.k.a. PocketSphinx (https://wolfpaulus.com/journal/embedded/raspberrypi2-sr/) (keeping it all in-house)


I built something like this at home. CMU Sphinx is a popular open source speech recognition toolkit that you can run locally. But it was kind of clumsy to set up and train. Also, I couldn't get high accuracy.

Other cloud options include Google Voice and wit.ai. Google Voice AFAIK doesn't allow you to train it with custom models. I finally went with wit.ai which turned out to be pretty good.

Blog post with code and schematics describing my setup: http://arunpn.com/projects/voice-activated-light-switch/


That post reminds me very much of the story "The Twonky" by Lewis Padgett.

http://my.fit.edu/~rosiene/twonky.pdf


“Early on in the product, to play music took eight or nine seconds, and it’s just unusable when it’s like that,” Mr. Limp said. “Now it’s often 1,000 milliseconds

If only there was another way to say 1,000 milliseconds...


1e6 microseconds? ;)

The quote came from Dave Limp, Amazon’s senior vice president for devices. I feel that people in senior positions such as the person quoted are far more focused on metrics and tangible results (e.g. "reduced Q4 defect ppm by X%") so it probably becomes second nature to alter the phrases used favorably. To a passive listener or someone skimming the article, milliseconds can seem far better.

Besides, I wouldn't be surprised if a lot of people need to think for a second before realizing 1000 milliseconds = 1 second.


aka, marketing-speak


I think you're making him look unnecessarily bad by not giving the full quote: "Now it’s often 1,000 milliseconds or 1,200 milliseconds."


It often annoys me that articles switch between units in the same sentence. I often read things like "Less than 40% of people did this but an amazing two fifths of participants did that". Most people have no idea how to compare the two. I bet a lot of people don't know whether 1,000 milliseconds is more or less than eight seconds.
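
For the record, the arithmetic behind both comparisons, using only the figures quoted in this thread (the variable names are just for illustration):

```python
from fractions import Fraction


def ms_to_seconds(ms: float) -> float:
    """Convert milliseconds to seconds."""
    return ms / 1000


before_s = 8.0                     # "eight or nine seconds", lower bound
after_s = ms_to_seconds(1_000)     # "1,000 milliseconds" is 1 second
speedup = before_s / after_s       # at least an 8x improvement

# "Two fifths" and "40%" are the same quantity in different clothes.
two_fifths_pct = float(Fraction(2, 5) * 100)
```

Put in one unit, the quote says playback went from roughly 8 to 9 seconds down to about 1 to 1.2 seconds.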


Echo is set apart from other voice platforms in that voice is the only way you are supposed to interact with it. The vast majority of things that Siri or Google Now can do, can also be done through the touch screen, and it often seems strange to talk to your phone.

Echo doesn't have a screen so you have to talk to it. It responds quickly and more naturally. Part of its contribution may be making it more acceptable to have conversations with machines.


and here is the open-source version: https://mycroft.ai/projects/ with kickstarter project: https://www.kickstarter.com/projects/aiforeveryone/mycroft-a...


What open source ASR do they use? Kaldi? Do they have recommendations for far field mic arrays?


How much does it cost to buy a New York Times article these days?


You're talking about the same nytimes that was critical of Amazon's work culture as recently as a few months back.


Probably less than the entire Washington Post.


It's even less to buy/coopt/promise a Blue Origin ride to a single tech columnist who is lightly edited.


TIL that Jeff Bezos owns the Washington Post.


I know my comment did not add too much to the conversation but it shouldn't have gotten so many negative votes. We're talking about a CEO who bought a newspaper, the definition of buying media.


So your implication here is that the New York Times was excited to help out a competitor? And so they got a prominent tech writer, one who has written for a variety of publications over the years, to write a puff piece because he'll just write anything for money?


It's Farhad, so nothing


Well, if you are blind, you can use any computer. If you are mute, don't speak fluent English, or, let's say, have a bad accent or the like, how do you use Echo, Siri, et al.? What is the alternative?


Why wouldn't you just use your native language?


Not supported. Probably never will.


How does any blind person use any computer?


So where do you put it... in your bedroom? Or kitchen? One per room?

It seems like a smartphone app with a host of "skills" plugins would render this obsolete.


Mine is in my kitchen/eating area and I imagine that's a common location. I could certainly imagine having one in my bedroom as well.

Smartphones are certainly a significant competition to this. One difference is that, today, Alexa's voice recognition is a lot better--especially at a distance in a noisy environment. I don't always have my smartphone at hand especially when I'm doing things like cooking.

It also works as a music source to play over internal or external speakers in a way that your smartphone doesn't. Of course, that assumes you're using Amazon Music.

Of course, there's nothing that says it has to be either/or.


How does it work if you have multiple people in the same house with different Amazon accounts? Is there any sort of voice authentication?


You can apparently have multiple profiles but they're all connected to a single Amazon account. (So you can have different playlists etc. but they're all connected to a single credit card.)

https://www.amazon.com/gp/help/customer/display.html?nodeId=...


Echo is a great utility. I've had mine for just a week, and I already feel attached. However, I feel it needs more AI. For example, I can ask it "What is the temperature today?", and I get a straightforward answer. But I cannot ask "How is the temperature today compared to yesterday?".


“It’s an unusual thing for Amazon to be out of something, but for the Echo, it’s usually in stock just for a couple weeks before it goes out of stock for a few days — so it feels like they’re having trouble making enough of the devices.” Mr. Wingo noted the Echo, which Amazon sells for $180 only through its own site, was selling for $200 to $300 on eBay.

FYI, for anyone that does want to buy one right now (Amazon says it will be in stock March 20), I bought mine from Home Depot. You can also supposedly get them at Sears, Staples, and a few other stores. No reason to spend $200+.


I just don't like that the Echo is always listening and waiting for you to address it.

Maybe if there was a smartphone app or a clicker I could press to bring it to attention, then I'd be more interested.


You can mute the microphone and use the echo remote control instead.


That is a thing you can do - the Echo comes with a remote, so you can mute the main mic, and speak into the remote if you wish.


I can definitely see voice being the most common way of interacting with computers at home. I think it was pretty smart of Amazon to focus their attention there first.

I rarely use Siri because:

1) If I'm in public it's awkward. I don't want to disturb others, and don't want strangers to hear even my boring interactions with Siri. Plus it still has "the Segway problem".

2) If I'm at home I often leave my phone in another room.

I don't have an Echo yet, but I ordered a Dot (already have plenty of speakers at home) and I'm excited.


Especially for home automation, I still find it so much easier to press a button inside an app than speak words. Just like pressing 1, 2, 3 within an automated 1-800 prompt is heaps easier than trying to enunciate out phrases like "customer support." And so I could never warm up to Siri, Echo, etc. They take way too much effort, even if voice recognition was perfect.

Am I alone?


You're not alone. I simply don't like talking to computers, and don't have time or interest in figuring out how to coax them into doing what I want. Whenever I get one of those customer support lines that demand I "Tell me what you want!" I just jam on the "0" button over and over until a real person shows up.


If I have my phone in hand and it is already open to a number dial pad then maybe it's easier to press 1,2,3. But often, even when I'm already in a call, I have to (1) press a side button to turn the screen on, then (2) look to find the button that brings up the dial pad, before (3) I look to press 1,2,3. I hardly think saying "customer support" (without diverting my eyes or attention) instead is "way too much effort" in comparison. In fact, I just used a similar voice command in an automated call ten minutes ago because it felt like _less_ effort than bringing up the dial pad. With home automation I expect the balance would be even more in favor of a voice interface, because I would almost never have a phone in hand with a home automation app in the foreground and the display on.


It's worth a try. Remarkably appealing, at least for the first week or so. Easy to say "never" without having given it a chance.


"Amazon’s open-platform strategy for the Echo" - what a lie.

As much as I admire Amazon, please don't call things what they aren't. AWS is not an open platform. Echo is not either.


Why does this ridiculous product exist? Why is anybody buying it for ~$200? Why can't it be replaced with a phone app?


You could, with an app like Autovoice.

Often you don't want to whip your phone out to speak to it, though, and phone mics are bad at capturing far-field speech, so you need a far-field always-on mic.

Haven't seen any bluetooth array mics so far - seems like a market opportunity


As long as it doesn't run free software it will be almost impossible for me to trust such a device.



