> could be remotely hijacked, using carefully crafted near-ultrasonic sounds, and forced to make unwanted phone calls and money transfers, disable alarm systems, or unlock doors.
I know Siri is much too dumb to facilitate a money transfer…
Getting Siri to actually do what you want by standing close by and shouting tends not to work - I'm not sure how some ultrasonic whisper is going to get the job done.
At least with Alexa devices, disabling alarm systems and unlocking doors requires a PIN, mostly to defend against the much more low-tech attack of shouting through the victim's letter box.
For six months Siri interpreted my requests to call anyone as "Call Paul." It still might, but at that point I gave up and changed Paul's name in Contacts to stop accidentally bothering the poor guy.
Why would voice recognition software be interpreting ultrasonic (or near-ultrasonic) signals at all?
First, it doesn't make sense they'd be trained on them. So why would models be interpreting these as speech at all?
And second, it doesn't make sense that they'd make it from the microphone to the recognition engine - surely there's a low-pass filter in there to remove all extraneous noise above the vocal range?
I don't get it.
(Edit: could it be some kind of downsampling aliasing artifact that is interpreted as a normal vocal frequency, precisely because they skip the low-pass filter that would prevent it?)
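The aliasing guess above is easy to illustrate. This is just a sketch with hypothetical numbers (the 16 kHz rate is a common choice for speech front ends, not a confirmed detail of any of these devices): a tone above the Nyquist frequency that reaches the ADC unfiltered "folds" back down into the audible band.

```python
def alias_frequency(f_hz: float, fs_hz: float) -> float:
    """Frequency a pure tone at f_hz appears at after sampling at fs_hz
    with no anti-aliasing (low-pass) filter in front of the ADC."""
    # Fold f into the range [0, fs)...
    f = f_hz % fs_hz
    # ...then reflect anything above Nyquist (fs/2) back down.
    return f if f <= fs_hz / 2 else fs_hz - f

# A 21 kHz near-ultrasonic tone, sampled at 16 kHz, lands at 5 kHz --
# well inside the speech band the recognizer is trained on:
print(alias_frequency(21_000, 16_000))  # 5000.0
```

So if the filter really is skipped, a crafted near-ultrasonic signal could be designed so its aliased image sounds like speech to the model.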
Why the heck wouldn't they have the wake-word listener confine itself to human-audible frequency ranges? Seems like that would be a really simple fix with zero loss of real-world functionality....
It might be that although the sound is inaudible to humans, it gets distorted around and inside the device, and that distorted sound is at audible frequencies that the Alexa microphone picks up. Something like how these ultrasonic directional speakers work: https://www.holosonics.com/what-makes-a-sound-source-directi...
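That demodulation effect can be sketched numerically. This is a toy model, not a claim about any specific microphone: two inaudible ultrasonic carriers pass through a mild quadratic nonlinearity (standing in for mic/enclosure distortion), which creates an audible difference tone that wasn't in the original signal. The frequencies and the 0.1 distortion coefficient are made up for illustration.

```python
import cmath, math

def tone_power(signal, fs, f):
    """Power of `signal` at frequency f via a single DFT bin."""
    n = len(signal)
    acc = sum(x * cmath.exp(-2j * math.pi * f * k / fs)
              for k, x in enumerate(signal))
    return abs(acc / n) ** 2

fs = 192_000            # high-rate capture so 40 kHz tones are representable
n = fs // 100           # 10 ms of signal
f1, f2 = 40_000, 41_000 # two inaudible ultrasonic carriers

x = [math.cos(2 * math.pi * f1 * k / fs) +
     math.cos(2 * math.pi * f2 * k / fs) for k in range(n)]
y = [s + 0.1 * s * s for s in x]  # mild quadratic nonlinearity

# The nonlinearity creates a 1 kHz (f2 - f1) difference tone:
print(tone_power(x, fs, f2 - f1) < 1e-6)  # True: absent before distortion
print(tone_power(y, fs, f2 - f1) > 1e-4)  # True: clearly present after
```

The same math (cos a · cos b producing a cos(a−b) term) is how those directional "audio spotlight" speakers make audible sound from ultrasound.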
Yeah, it's weird. There was already a scare about ultrasonics being used for ad tracking in 2016 [1] so it's not like this is an unknown attack vector, and I thought the subsequent patch efforts had already added filters that stopped the phones listening on ultrasonics.
I also remember seeing a presentation on the first gen Echo which went into its noise cancelling tech, making sure that stuff coming out of the speaker wasn't received by the mic, so the success of the speaker-to-mic attack vector also seems totally bizarre.
That's on purpose. They created a specific inaudible frequency that Alexa listens for, which causes it to ignore the wake word. This is how they keep from annoying everyone and also blowing up their own servers if, for example, they want to run an Alexa commercial during the Superbowl.
> Reddit user aspyhackr may have figured out the trick Amazon uses here. Apparently, the Alexa commercials are intentionally muted in the 3,000Hz to 6,000Hz range of the audio spectrum, which apparently tips off the system that the “Alexa” phrase being spoken isn’t in fact a real command and should be ignored.
Seems to be the inverse - if the wake word _lacks_ these frequencies, then Echos ignore it.
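If the Reddit theory is right, the check could be something as simple as comparing energy in the 3-6 kHz band against a floor. This is purely a hypothetical reconstruction of the suspected behavior (the threshold, band sampling, and function names are all invented, not Amazon's actual implementation):

```python
import cmath, math

def band_energy(signal, fs, f_lo, f_hi, step=250):
    """Approximate energy in [f_lo, f_hi] by summing a few DFT bins."""
    n, total, f = len(signal), 0.0, f_lo
    while f <= f_hi:
        acc = sum(x * cmath.exp(-2j * math.pi * f * k / fs)
                  for k, x in enumerate(signal))
        total += abs(acc / n) ** 2
        f += step
    return total

def looks_like_broadcast(signal, fs, threshold=1e-6):
    """Hypothetical version of the suspected check: a wake word whose
    3-6 kHz band has been notched out is treated as TV/ad playback."""
    return band_energy(signal, fs, 3_000, 6_000) < threshold

fs, n = 16_000, 1600
# Stand-in for live speech: broadband mix including a 4 kHz component.
live = [math.cos(2 * math.pi * 500 * k / fs) +
        0.3 * math.cos(2 * math.pi * 4000 * k / fs) for k in range(n)]
# Same audio with the 3-6 kHz content removed, as the ads reportedly are.
ad = [math.cos(2 * math.pi * 500 * k / fs) for k in range(n)]

print(looks_like_broadcast(live, fs))  # False
print(looks_like_broadcast(ad, fs))    # True
```

Notching a band out of broadcast audio is a clever fingerprint because normal speech from a live human always has energy there.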
Yes, I had misremembered it. That does leave the mystery of why they are listening for an inaudible frequency. I wonder if it might have to do with their plans for devices that can connect to other nearby devices when they aren't connected to the user's wifi.
I also experienced Siri talking at what seems like 1% volume recently, out of no change that I can recall making. Just learned how to fix that through these comments, thanks.
This has been driving me mad. It seems certain actions have had all audible feedback disabled, but there's no real sense to which ones.
When I say across a room to set a reminder or add something to my shopping list my Homepods will just silently do so, with no indication but a flash of the screen. I have no idea if it's registered what I was saying or not.
When I ask to turn on the lights in a room, it'll do a bing-bong noise at me to indicate that it's registered despite the fact I can see the lights turning on. It's utter nonsense.
My guess is that the combination of a power button long-press continued by pressing the volume down button is something that might happen accidentally while you have the iPhone in your pocket.
Once Siri is active, the hardware volume buttons control the feedback volume.
> It's also worth noting that the length of malicious commands must be below 77 milliseconds — that's the average reaction time for the four voice assistants across multiple devices.
I don't get it. Why? You can speak to smart assistants for much longer than that, can't you?
This had me confused at first too. Basically, you are playing audio on the device, and that audio sends a command to the same device. If I'm watching a video on my phone and say "Hey Siri", it will interrupt the audio playing from my phone as it listens to me. But it doesn't interrupt the audio instantly; apparently it takes about 77 milliseconds.
Near ultrasound audio doesn't need any special microphone. I just tested with my Samsung Galaxy S3 (i9300 version) and it has no trouble detecting 20kHz. I don't see why newer phones couldn't do the same.
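Right: near-ultrasound sits below the Nyquist limit of standard phone capture rates, so no special hardware is needed. A trivial check (the rates listed are just the common defaults, not claims about any particular model):

```python
def nyquist(fs_hz: float) -> float:
    """Highest frequency representable at sample rate fs_hz."""
    return fs_hz / 2

# Common phone capture rates comfortably cover near-ultrasound (~18-22 kHz):
for fs in (44_100, 48_000):
    print(fs, nyquist(fs), nyquist(fs) >= 20_000)
```

Whether the mic's analog response is flat up at 20 kHz is another question, but clearly it passes enough signal to detect.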
I disabled it after a potentially very awkward event. I was talking to my wife when Siri on my Apple Watch misunderstood me, triggered itself, and sent a text saying "I love you" to our previous maid (whom we had fired, for reasons).
Years ago a classmate in college was showing Siri to one of our teachers. As a joke he said "Siri, I love you!" and Siri replied "I'm sorry, I can't find a location for <teacher's daughter>".