> could be remotely hijacked, using carefully crafted near-ultrasonic sounds, and forced to make unwanted phone calls and money transfers, disable alarm systems, or unlock doors.
I know Siri is much too dumb to facilitate a money transfer…
Getting Siri to actually do what you want by standing close by and shouting tends not to work - I'm not sure how some ultrasonic whisper is going to get the job done.
At least with Alexa devices, disabling alarm systems and unlocking doors requires a PIN, mostly to defend against the much more low-tech attack of shouting through the victim's letter box.
For six months Siri interpreted my requests to call anyone as "Call Paul." It still might, but at that point I gave up and changed Paul's name in Contacts to stop accidentally bothering the poor guy.
Why would voice recognition software be interpreting ultrasonic (or near-ultrasonic) signals at all?
First, it doesn't make sense they'd be trained on them. So why would models be interpreting these as speech at all?
And second, it doesn't make sense that they'd make it from the microphone to the recognition engine - surely there's a low-pass filter in there to remove all extraneous noise above the vocal range?
I don't get it.
(Edit: could it be some kind of downsampling aliasing artifact that is interpreted as a normal vocal frequency, precisely because they skip the low-pass filter that would prevent it?)
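The aliasing guess above is easy to illustrate. This is just a sketch with hypothetical numbers (the 16 kHz rate is a common choice for speech front ends, not a confirmed detail of any of these devices): a tone above the Nyquist frequency that reaches the ADC unfiltered "folds" back down into the audible band.

```python
def alias_frequency(f_hz: float, fs_hz: float) -> float:
    """Frequency a pure tone at f_hz appears at after sampling at fs_hz
    with no anti-aliasing (low-pass) filter in front of the ADC."""
    # Fold f into the range [0, fs)...
    f = f_hz % fs_hz
    # ...then reflect anything above Nyquist (fs/2) back down.
    return f if f <= fs_hz / 2 else fs_hz - f

# A 21 kHz near-ultrasonic tone, sampled at 16 kHz, lands at 5 kHz --
# well inside the speech band the recognizer is trained on:
print(alias_frequency(21_000, 16_000))  # 5000.0
```

So if the filter really is skipped, a crafted near-ultrasonic signal could be designed so its aliased image sounds like speech to the model.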
Why the heck wouldn't they have the wake-word listener confine itself to human-audible frequency ranges? Seems like that would be a really simple fix with zero loss of real-world functionality....
It might be that although the sound is inaudible to humans, it gets distorted around and inside the device, and that distorted sound is at audible frequencies that the Alexa microphone picks up. Something like how these ultrasonic directional speakers work: https://www.holosonics.com/what-makes-a-sound-source-directi...
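That demodulation effect can be sketched numerically. This is a toy model, not a claim about any specific microphone: two inaudible ultrasonic carriers pass through a mild quadratic nonlinearity (standing in for mic/enclosure distortion), which creates an audible difference tone that wasn't in the original signal. The frequencies and the 0.1 distortion coefficient are made up for illustration.

```python
import cmath, math

def tone_power(signal, fs, f):
    """Power of `signal` at frequency f via a single DFT bin."""
    n = len(signal)
    acc = sum(x * cmath.exp(-2j * math.pi * f * k / fs)
              for k, x in enumerate(signal))
    return abs(acc / n) ** 2

fs = 192_000            # high-rate capture so 40 kHz tones are representable
n = fs // 100           # 10 ms of signal
f1, f2 = 40_000, 41_000 # two inaudible ultrasonic carriers

x = [math.cos(2 * math.pi * f1 * k / fs) +
     math.cos(2 * math.pi * f2 * k / fs) for k in range(n)]
y = [s + 0.1 * s * s for s in x]  # mild quadratic nonlinearity

# The nonlinearity creates a 1 kHz (f2 - f1) difference tone:
print(tone_power(x, fs, f2 - f1) < 1e-6)  # True: absent before distortion
print(tone_power(y, fs, f2 - f1) > 1e-4)  # True: clearly present after
```

The same math (cos a · cos b producing a cos(a−b) term) is how those directional "audio spotlight" speakers make audible sound from ultrasound.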
Yeah, it's weird. There was already a scare about ultrasonics being used for ad tracking in 2016 [1] so it's not like this is an unknown attack vector, and I thought the subsequent patch efforts had already added filters that stopped the phones listening on ultrasonics.
I also remember seeing a presentation on the first gen Echo which went into its noise cancelling tech, making sure that stuff coming out of the speaker wasn't received by the mic, so the success of the speaker-to-mic attack vector also seems totally bizarre.
That's on purpose. They created a specific inaudible frequency that Alexa listens for, which causes it to ignore the wake word. This is how they keep from annoying everyone and also blowing up their own servers if, for example, they want to run an Alexa commercial during the Superbowl.
> Reddit user aspyhackr may have figured out the trick Amazon uses here. Apparently, the Alexa commercials are intentionally muted in the 3,000Hz to 6,000Hz range of the audio spectrum, which apparently tips off the system that the “Alexa” phrase being spoken isn’t in fact a real command and should be ignored.
Seems to be the inverse - if the wake word _lacks_ these frequencies, then Echos ignore it.
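If the Reddit theory is right, the check could be something as simple as comparing energy in the 3-6 kHz band against a floor. This is purely a hypothetical reconstruction of the suspected behavior (the threshold, band sampling, and function names are all invented, not Amazon's actual implementation):

```python
import cmath, math

def band_energy(signal, fs, f_lo, f_hi, step=250):
    """Approximate energy in [f_lo, f_hi] by summing a few DFT bins."""
    n, total, f = len(signal), 0.0, f_lo
    while f <= f_hi:
        acc = sum(x * cmath.exp(-2j * math.pi * f * k / fs)
                  for k, x in enumerate(signal))
        total += abs(acc / n) ** 2
        f += step
    return total

def looks_like_broadcast(signal, fs, threshold=1e-6):
    """Hypothetical version of the suspected check: a wake word whose
    3-6 kHz band has been notched out is treated as TV/ad playback."""
    return band_energy(signal, fs, 3_000, 6_000) < threshold

fs, n = 16_000, 1600
# Stand-in for live speech: broadband mix including a 4 kHz component.
live = [math.cos(2 * math.pi * 500 * k / fs) +
        0.3 * math.cos(2 * math.pi * 4000 * k / fs) for k in range(n)]
# Same audio with the 3-6 kHz content removed, as the ads reportedly are.
ad = [math.cos(2 * math.pi * 500 * k / fs) for k in range(n)]

print(looks_like_broadcast(live, fs))  # False
print(looks_like_broadcast(ad, fs))    # True
```

Notching a band out of broadcast audio is a clever fingerprint because normal speech from a live human always has energy there.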
Yes, I had misremembered it. That does leave the mystery of why they are listening for an inaudible frequency. I wonder if it might have to do with their plans for devices that can connect to other nearby devices when they aren't connected to the user's wifi.
I also experienced Siri talking at what seems like 1% volume recently, out of no change that I can recall making. Just learned how to fix that through these comments, thanks.
This has been driving me mad. It seems certain actions have had all audible feedback disabled, but there's no real sense to which ones.
When I say across a room to set a reminder or add something to my shopping list my Homepods will just silently do so, with no indication but a flash of the screen. I have no idea if it's registered what I was saying or not.
When I ask to turn on the lights in a room, it'll do a bing-bong noise at me to indicate that it's registered despite the fact I can see the lights turning on. It's utter nonsense.
My guess is that the combination of a power button long-press continued by pressing the volume down button is something that might happen accidentally while you have the iPhone in your pocket.
Once Siri is active, the hardware volume buttons control the feedback volume.
> It's also worth noting that the length of malicious commands must be below 77 milliseconds — that's the average reaction time for the four voice assistants across multiple devices.
I don't get it. Why? You can speak to smart assistants for much longer than that, can't you?
This had me confused at first too. Basically, you are playing audio on the device, and that audio sends a command to the same device. If I'm watching a video on my phone and say "Hey Siri", it will interrupt the audio playing from my phone as it listens to me. But it doesn't interrupt the audio instantly; apparently it takes about 77 milliseconds.
Near ultrasound audio doesn't need any special microphone. I just tested with my Samsung Galaxy S3 (i9300 version) and it has no trouble detecting 20kHz. I don't see why newer phones couldn't do the same.
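Right: near-ultrasound sits below the Nyquist limit of standard phone capture rates, so no special hardware is needed. A trivial check (the rates listed are just the common defaults, not claims about any particular model):

```python
def nyquist(fs_hz: float) -> float:
    """Highest frequency representable at sample rate fs_hz."""
    return fs_hz / 2

# Common phone capture rates comfortably cover near-ultrasound (~18-22 kHz):
for fs in (44_100, 48_000):
    print(fs, nyquist(fs), nyquist(fs) >= 20_000)
```

Whether the mic's analog response is flat up at 20 kHz is another question, but clearly it passes enough signal to detect.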
I disabled it after a potentially very awkward event. I was talking to my wife when Siri on my Apple Watch misunderstood me, triggered itself, and sent a text saying "I love you" to our previous maid (whom we had fired, for reasons).
Years ago a classmate in college was showing Siri to one of our teachers. As a joke he said "Siri, I love you!" and Siri replied "I'm sorry, I can't find a location for <teacher's daughter>".