Fun tidbit: three-letter intelligence agencies have known about this attack vector since before this paper was published, and they are worried it could be exploited in the wild.
[British government officials] expressed fears that foreign governments, in particular Russia or China, could hack into the Guardian's IT network. But the Guardian explained the security surrounding the documents, which were held in isolation and not stored on any Guardian system.
However, in a subsequent meeting, an intelligence agency expert argued that the material was still vulnerable. He said by way of example that if there was a plastic cup in the room where the work was being carried out foreign agents could train a laser on it to pick up the vibrations of what was being said. Vibrations on windows could similarly be monitored remotely by laser.
I've actually done this in a lab with a laser; it's a fun trick. What the original article talks about, though, is passively measuring sound from objects with a camera, no laser involved. Although both approaches measure the same thing (physical vibrations of objects), the way they do it is quite different.
A friend of mine used that technique while working in an entomology lab studying the mating habits of cockroaches. They were trying to figure out what kinds of vocalizations the cockroaches made while mating, and bouncing a laser off their shells was by far the easiest way to record those very quiet sounds.
I wouldn't be surprised if the non-laser techniques eventually show up in off-the-shelf products too, simply to reduce costs: a simple camera could be cheaper than a laser, and it might have the advantage of being able to record sound from multiple points at once.
To make a camera-based solution practical, it seems like one would need a specialized camera with modest resolution but a very high frame rate. Without a high enough frame rate, you can't capture high frequencies. Even if speech could be discerned from a spectrum that rolls off at 300 Hz, well under the peak energy of a typical human's speech, you would still need many hundreds of frames per second.
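The "many hundreds" figure follows from the Nyquist criterion: sampling at fs frames per second can only represent frequencies below fs/2. A minimal Python sketch of that arithmetic; the 3.4 kHz figure (the upper edge of classic telephone-band speech) is added only for comparison.

```python
def min_frame_rate(max_freq_hz: float) -> float:
    """Nyquist criterion: content up to max_freq_hz needs a sampling rate
    above 2 * max_freq_hz. Returns that lower bound in frames per second."""
    return 2.0 * max_freq_hz

# The 300 Hz roll-off from the comment, and the ~3.4 kHz telephone-band edge.
for f in (300.0, 3400.0):
    print(f"{f:>6.0f} Hz bandwidth -> at least {min_frame_rate(f):.0f} fps")
```

Even the pessimistic 300 Hz case lands at 600 fps, an order of magnitude above ordinary video.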
If you watch the video at the end, you'll see that they actually exploit the fact that CMOS sensors read out their rows sequentially, using the data from each row to extract surprisingly decent high-frequency data from even 60 Hz video!
The article says they were able to use the rolling shutter on standard cameras to extract data at a much higher frequency than the nominal frame rate would allow.
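To make the rolling-shutter idea concrete: each row of a frame is captured slightly later than the row above it, so a single frame contains many distinct sampling instants. A minimal Python sketch of that timing, assuming a hypothetical 1080-row sensor that scans all of its rows in 1/75 s and spends the rest of each 1/60 s frame period in blanking; real readout times vary by camera, and the 100-row object size is also hypothetical.

```python
FPS = 60.0
SENSOR_ROWS = 1080        # hypothetical sensor height
READOUT_TIME = 1 / 75     # hypothetical: seconds to scan all rows; the rest of each frame is blanking
OBJECT_ROWS = 100         # hypothetical: rows covered by the vibrating object

def row_time(row, frame, fps=FPS, rows=SENSOR_ROWS, readout=READOUT_TIME):
    """Capture time in seconds of `row` in `frame` under a rolling shutter.
    With readout=0 every row shares the frame's timestamp (a global shutter, as with film)."""
    return frame / fps + row * (readout / rows)

line_interval = READOUT_TIME / SENSOR_ROWS   # time between consecutive rows
burst = OBJECT_ROWS * line_interval          # time span of the object's rows within one frame
print(f"row-to-row interval: {line_interval * 1e6:.2f} microseconds")
print(f"{OBJECT_ROWS} samples per frame, packed into {burst * 1e3:.2f} ms of each {1e3 / FPS:.2f} ms frame")
```

Under these assumptions the object is sampled 100 times within roughly 1.2 ms of each frame and then not at all until the next frame, so the extra samples arrive in bursts rather than at a uniform high rate.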
The laser thing has been around for a while and is quite well known. The innovation here is that they found a way to do it without any lasers and possibly even with consumer cameras.
Bill Freeman, the PhD supervisor on this project, is a really creative computer vision researcher. Read through his CV; he has a great ability to pick interesting problems slightly outside the mainstream. I wish I had that ability.
(Note that real guitar strings don't vibrate this way; they vibrate along their entire length, with limited higher-order harmonics. The visible waveforms here are an artifact of the camera's rolling-shutter capture process.)
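A toy model of why a rolling shutter makes a string look wavy: if the rows are scanned along the length of the string, each point on the string is recorded at a slightly different moment, so the string's oscillation in time gets painted across space. A Python/NumPy sketch, assuming (hypothetically) that the string lies along the scan direction, spans 1000 rows, is scanned in 1/60 s, and vibrates purely in its fundamental at 110 Hz (the open A string).

```python
import numpy as np

ROWS = 1000        # hypothetical: the string spans 1000 sensor rows end to end
READOUT = 1 / 60   # hypothetical: seconds to scan those rows
F0 = 110.0         # fundamental of an open A string (A2), in Hz

def apparent_shape(readout, t0=0.0):
    """Displacement recorded in each row for a string vibrating only in its
    fundamental mode, when row r is captured at t0 + r * readout / ROWS."""
    u = np.arange(ROWS) / (ROWS - 1)              # position along the string, 0..1
    t = t0 + np.arange(ROWS) * readout / ROWS     # capture time of each row
    return np.sin(np.pi * u) * np.cos(2 * np.pi * F0 * t)

def interior_nodes(y):
    """Count sign changes between adjacent rows, ignoring the fixed endpoints."""
    s = np.sign(y[1:-1])
    return int(np.sum(s[1:] != s[:-1]))

print("rolling shutter:", interior_nodes(apparent_shape(READOUT)), "apparent nodes")
print("global shutter: ", interior_nodes(apparent_shape(0.0)), "apparent nodes")
```

With these numbers the recorded shape crosses zero four times between the endpoints, so it looks like a higher-order standing wave, while a zero readout time (a global shutter) gives the single smooth arch of the fundamental. The number of apparent ripples scales with F0 times the readout time.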
This will help settle debates like the one about the moon landing.
Take this video (https://www.nasa.gov/multimedia/hd/apollo11_hdpage.html, the highest-definition version I could find) of the flag moving and extract the sound (if there is any).
http://people.csail.mit.edu/mrub/vidmag/ is the site he mentions at the end; the TED talk he gave goes into more detail about the workflow.
I'm not a programmer, so I'm not sure how to use the code to implement it.
Any takers?
As the original footage was shot on film, there is no time difference between the top and bottom of a single frame, unlike with a progressively scanned "rolling shutter" sensor. The frame rate is also much lower than that of the 60 fps phone camera used in the last example of the MIT video, which was only just enough to get meaningful data out.
So unfortunately, we probably won't be able to use this to get much out of old, traditionally archived footage: not because of the spatial resolution of each frame, but because of the temporal resolution.
Laser vibrometers are neat. They can also be used to sniff keystrokes remotely by pointing them at a laptop lid. That research was also released before this paper.[1]
I'm assuming it would need extremely high-frame-rate video to extract decipherable voice data, because even at 60 frames per second, which I understand is at the high end for normal video, the maximum frequency you can capture is only 30 Hz, which is near the low end of human hearing.
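The 30 Hz figure is the Nyquist limit, half the frame rate. One subtlety: with plain once-per-frame sampling, higher frequencies don't simply vanish; they fold down (alias) onto frequencies below 30 Hz, where they can't be told apart from genuine slow motion. A minimal Python sketch of that folding for a pure tone:

```python
def alias_frequency(f_hz: float, fs_hz: float) -> float:
    """Frequency at which a pure tone f_hz appears after uniform sampling
    at fs_hz, folded into the band 0..fs_hz/2."""
    r = f_hz % fs_hz
    return min(r, fs_hz - r)

for tone in (25.0, 440.0, 1000.0):
    print(f"{tone:.0f} Hz sampled at 60 fps looks like {alias_frequency(tone, 60.0):.0f} Hz")
```

For example, a 440 Hz tone sampled once per frame at 60 fps shows up as a 20 Hz oscillation, and so does a 1000 Hz tone.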
So it's interesting, and a definite risk if an attacker supplies their own camera, but it definitely does NOT mean you can pull voice data from the vibrating tablecloth in a YouTube video, right?
Not from a YouTube video, because video compression obliterates the small details needed to recover the frequency. But the article notes that a rolling shutter allows even a 60 fps camera to capture higher frequencies. Each scan line of the captured video corresponds to a different sub-frame time slice. If the object occupies a hundred scan lines, then you have 100 slightly time-shifted 60 Hz sample streams that you can combine to reconstruct higher frequencies.
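A toy version of the combining step in Python/NumPy: compute the capture time of each of the object's rows in each frame, sample a synthetic, noise-free 440 Hz tone at those times, then find which candidate frequency best explains the samples by brute-force least-squares fitting of a sine/cosine pair. This is not the method from the paper; the sensor height, the 1/75 s readout time and the test tone are hypothetical, and only the 100-row figure comes from this comment.

```python
import numpy as np

FPS = 60.0
SENSOR_ROWS = 1080   # hypothetical sensor height
READOUT = 1 / 75     # hypothetical full-frame readout time; the remainder is blanking
OBJ_ROWS = 100       # rows covered by the vibrating object

def sample_times(n_frames: int) -> np.ndarray:
    """Capture time of every (frame, row) sample of the object, flattened."""
    frame_starts = np.arange(n_frames)[:, None] / FPS
    row_offsets = np.arange(OBJ_ROWS)[None, :] * (READOUT / SENSOR_ROWS)
    return (frame_starts + row_offsets).ravel()

def best_fit_frequency(t, x, candidates):
    """Return the candidate frequency whose sine/cosine pair fits x(t) with the
    smallest least-squares residual (a brute-force periodogram for uneven samples)."""
    best_f, best_err = None, np.inf
    for f in candidates:
        basis = np.column_stack([np.sin(2 * np.pi * f * t), np.cos(2 * np.pi * f * t)])
        coef, *_ = np.linalg.lstsq(basis, x, rcond=None)
        err = float(np.sum((basis @ coef - x) ** 2))
        if err < best_err:
            best_f, best_err = f, err
    return best_f

t = sample_times(30)                     # half a second of 60 fps video
x = np.sin(2 * np.pi * 440.0 * t + 0.3)  # tone far above the 30 Hz per-frame Nyquist limit
recovered = best_fit_frequency(t, x, np.arange(31, 1001))
print(recovered)
```

Frequencies that differ by an exact multiple of 60 Hz (380, 440, 500 Hz, ...) look identical at the frame starts; under these assumptions they are told apart only by how their phase drifts across the roughly 1.2 ms spanned by the object's rows, which is enough in this noise-free setting to pick out 440 Hz.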
They show a technique that exploits the rolling-shutter effect to recover frequencies up to five times the frame rate. On the other hand, they also mention that artifacts in highly compressed video stop their techniques from working, so any secrets in your YouTube vids are probably safe.
Depends on what's on camera and how smart your software is. The wavelength of audible sound in air ranges from about 1.7 cm (at 20 kHz) to about 17 m (at 20 Hz). If you have a few objects in the scene at known distances on that scale, you can calculate how far out of phase their vibrations are, interleave the samples and piece together a much higher effective sampling rate.
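The wavelength figures follow from wavelength = speed of sound / frequency, and the interleaving idea can be sketched in Python. This assumes a speed of sound of 343 m/s, that each object moves in lockstep with the air pressure at its position, and that each object's extra distance from the source is known exactly; real objects have their own resonances, so treat this as an idealized illustration, with hypothetical helper names and object spacing.

```python
SPEED_OF_SOUND = 343.0  # m/s, dry air at roughly 20 degrees C

def wavelength_m(freq_hz):
    """Wavelength in metres of a sound wave at freq_hz."""
    return SPEED_OF_SOUND / freq_hz

def arrival_delay_s(extra_path_m):
    """Extra travel time to an object extra_path_m farther from the source."""
    return extra_path_m / SPEED_OF_SOUND

def effective_sample_times(frame_times, extra_paths_m):
    """Each object samples the source signal at the frame times shifted back
    by its arrival delay; merge all objects' sample times and sort them."""
    return sorted(t - arrival_delay_s(d) for t in frame_times for d in extra_paths_m)

print(f"20 Hz: {wavelength_m(20.0):.2f} m, 20 kHz: {wavelength_m(20000.0) * 100:.2f} cm")

frame_times = [n / 60 for n in range(10)]                   # ten frames at 60 fps
extra_paths = [k * SPEED_OF_SOUND / 240 for k in range(4)]  # hypothetical: objects ~1.43 m apart
times = effective_sample_times(frame_times, extra_paths)
print(len(times), "samples, spaced", round((times[1] - times[0]) * 1e3, 3), "ms apart")
```

With objects spaced about 1.43 m apart along the direction to the source, the four 60 fps streams interleave into evenly spaced samples 1/240 s apart, roughly what a single 240 fps camera would provide.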
https://www.theguardian.com/world/2013/aug/20/nsa-snowden-fi...