I think the paper's contributions really don't have anything to do with ML; it's about the new side channel with interrupts, which is a cool find. ML just gets more people to read it, which I guess is ok. I mean, you could just use "statistics" here in much the same way.
I remember an advisor once telling me: once you figure out what a paper is really about, rewrite it, and remove the stuff you used to think it was about. The title of this paper should be about the new side channel, not about the ML story, imho.
Thanks for reading! The two stories are of course deeply intertwined: we wouldn’t have found the new side channel without the cautionary tale about machine learning.
But the finding about ML misinterpretation is particularly notable because it calls a lot of existing computer architecture research into question. In the past, attacks like this were very difficult to pull off without an in-depth understanding of the side channel being exploited. But ML models (in this case, an LSTM) generally go a bit beyond “statistics” because they unlock much greater accuracy, making it much easier to develop powerful attacks that exploit side channels that aren’t really understood. And there are a lot of ML-assisted attacks created in this fashion today: the Shusterman et al. paper alone has almost 200 citations, a huge amount for a computer architecture paper.
The point of publishing this kind of research is to better understand our systems so we can build stronger defenses — the cost of getting this wrong and misleading the community is pretty high. And this would technically still be true even if we ultimately found that the cache was responsible for the prior attack. But of course, it helps that we discovered a new side channel along the way — this really drove our point home. I probably could have emphasized this more in my blogpost.
yes - I also feel this does not offer strong new findings about ML, beyond common sense that all ML practitioners should have: that is, do not interpret ML results as cause-and-effect explanations when the data you have captured and modelled does not warrant it.
Maybe in the real world, this common sense gets lost in the deluge of correlations when people are immersed in a sea of data -- but good experiment design and peer review should ideally sift out any unsound conclusions and interpretations -- which, to be fair, this replication study does an excellent job of!
Almost didn't read it because of the length and how it starts. Generally I just want the meat, not the backstory. But your comment convinced me to read it, and it is indeed great!
> Next year, I will start my six-year PhD in computer science back at MIT, and I could not be more thrilled!
Incredible... and it all started because the author had a "lucky idea" to try something random, using a counter instead of the much more advanced cache-eviction technique of the original side-channel attack... which only worked because of concepts they had no idea about at the time :D
I am one of the probably thousands of others who were not so lucky and quickly abandoned the idea of staying in academia, going to work in industry for a mediocre career instead.
I started an Honours degree (kind of like a Masters in Australia) in computer science, where I wanted to write a thesis on artificial intelligence (this was well before the current AI hype, circa 2010). It was based on AI applications I had studied in the regular AI course: how AI was being used by wineries to improve their wine quality and production. I wanted to try applying their techniques to more "general" applications. But the supervisor I got had zero interest in helping, and I had zero support from anyone else, so it was impossible to continue, especially once I had a full-time job offer with quite a good salary. And even if I had continued, I would probably never have gotten anywhere... as the author mentions, it was thanks to their supervisor and to others who helped them along the way that everything just happened for them. Alone, you must be extremely driven and talented to get anywhere, and I don't think I was either.
Being in the right place with the right people is indeed a very important factor in succeeding or not. In my first PhD experience in Japan, my professor and the others just kept criticizing whatever I proposed for 3 years without giving me actionable ideas. The professor in the lab next door loved my research; sadly I found him too late to switch labs. Now I'm at a place with half of the people in the country who can fully understand another project of mine and give a shit (that's a grand total of 2 people), and my project has already benefited from some of their data. Plus the director likes me and includes me in the lab activities even though I'm not officially affiliated with the lab. Now, that's an environment I can succeed in. My takeaway: finding the right environment and people may be difficult, but it's crucial; otherwise even very good work is done for nothing.
Tangent and the smallest of nitpicks about the page: the <HR> element's styling with a line of big dots confused me into thinking it was the position indicator of an image carousel!
This article is awesome, your writing is super approachable and the interactive demos are really cool. I also appreciate the background on how you got into doing this sort of thing.
Very interesting and well-explained. Given that the research has been out for two years, any interested data collectors have surely considered this already! Forget hackers; this is an exploit for enterprises and governments!
Could websites concerned with privacy deploy a package that triggers interrupts randomly? Could a browser extension do it for every site?
Websites doing this would have to be careful about it: they might become the only website triggering a lot of interrupts randomly, which then makes them easy to identify.
Our countermeasure which triggers interrupts randomly is implemented as a browser extension, the source code for which is available here: https://github.com/jackcook/bigger-fish
I'm not sure I would recommend it for daily use, though; I think our tests showed it slowed page load times down by about 10%.
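The basic shape of such a countermeasure fits in a few lines of JavaScript. This is my own illustrative sketch, not the code from the linked repo; the `startNoise` name, the interval, and the URL are all made up:

```javascript
// Sketch of the noise idea: fire requests at randomized intervals so the
// interrupt pattern an attacker measures is partly countermeasure noise
// rather than a clean fingerprint of the victim tab's activity.
function startNoise(meanGapMs = 20) {
  let active = true;
  (function tick() {
    if (!active) return;
    // A request to a local address exercises the network stack (and its
    // interrupts) without sending traffic anywhere external. The failure
    // is expected and ignored; generating the activity is the point.
    fetch("http://127.0.0.1:1/").catch(() => {});
    setTimeout(tick, Math.random() * 2 * meanGapMs); // randomized gap
  })();
  return () => { active = false; }; // call to stop injecting noise
}

const stop = startNoise();
setTimeout(stop, 100); // run the noise for ~100 ms, then stop
```

As the sibling comment notes, a site doing this alone could itself become a fingerprint, which is one reason shipping it as a browser extension (applied everywhere) is the more sensible deployment.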
I'm on Safari/macOS, and many of the counting-related demonstrations did not vary as much as claimed -- some did, with significant computer use, but I'd bet some mitigations have already been implemented in Safari.
This is awesome, interesting and well explained. Thank you! I would love to hear more about it. I wonder which concrete real-life uses this might have (had).
I wonder if adopting io_uring on Linux might allow a browser to preserve privacy a little in this specific case. (Though it is very hard to get right, unfortunately.)
My suspicion is that io_uring itself mitigates syscall overhead but doesn't do anything to change interrupts.
You could probably do things at the OS level to change interrupt behavior in a way that would mitigate this attack significantly, I'll need to read the paper to see if they discuss this.
Correct. Probably the only way to mitigate interrupt stuff today is what they mentioned - you inject noise into the system intentionally with their example being to make network requests to local addresses. Fundamentally though the challenge is that if you start doing that, you probably start degrading performance fairly quickly for your neighbors. It’s really hard to balance mitigations that retain good performance. A more comprehensive solution probably involves a redesign of how we build CPUs and operating systems rather than trying to keep fighting this in software.
Interrupt noise can be eliminated by eliminating the interrupts themselves using user-space drivers like SPDK and DPDK for storage and networking, but (a) that would require a massive change in application architecture, and (b) it wouldn't help non-movable interrupts like softirqs or IPIs for rescheduling and TLB shootdown.
Softirqs aren't really interrupts, and they're totally under kernel control, so it might be possible to spread them out across cores or otherwise reduce their signal.
Eliminating noise from IPIs for rescheduling and TLB shootdown might require crazy architectural changes to the CPU - for instance an architecturally isolated fast timer which is basically a separate CPU, polls a queue of TLB shootdown requests and a wakeup request flag, and can exit without waking the CPU from a halt.
Fuzzing the timer seems like a hack - it doesn't eliminate the information leakage, but just makes it harder to measure. You can eliminate the signal by only reporting the amount of time that passes in user mode, but that results in a clock that can be arbitrarily slower than wall clock time. I suppose you could add a correction factor that's heavily filtered, so the final timer is never off by more than a constant amount, but this would have to be implemented as a new OS timer type with instrumentation in every interrupt handler, and then Javascript would have to be updated to use that new timer.
I come from embedded audio programming, where e.g. the variable loads of UI code can be problematic (=audible) for audio quality if you don't do things right.
Maybe we need to do things the other way around? So instead of trying to mask everything we are doing, we run browsers/tabs in a processing environment where the noise can't be measured because it does not occur during the same time window. In audio that is done by using a high priority fixed timer that interrupts the rest of the processing.
My OS knowledge is too marginal to know whether that would be truly feasible, but I can't help but think: yeah, it is possible to fix this on a more fundamental level.
As far as browsers are concerned the actual solution is banning Javascript from regular Web. JS is basically remote code execution (even more so since JIT became the norm); it is a terrible idea that will continue to create all sorts of problems.