They advise using only lfence, as the compiler vendors do. I'd advise using a full mfence instead when clearing secrets: load/store ordering can be violated in the caches, and since clearing secrets isn't done very often, it needs to be reliable. Thankfully MDS only leaks small amounts of data, and modern keys are much larger. But adding a simple verw to flush the tiny non-cache buffers doesn't hurt either.
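For what it's worth, a minimal user-space sketch of that suggestion (C with GCC-style inline asm; the helper name and the choice of selector are mine, not any vendor's official sequence): zero through volatile writes, fence with mfence, then issue verw, which with the MD_CLEAR microcode update also overwrites the small internal buffers.

    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical helper following the suggestion above: zero the secret,
     * order the stores with a full mfence, then issue verw (which, on parts
     * with the MD_CLEAR microcode update, also clears the small CPU-internal
     * buffers affected by MDS). */
    static void wipe_secret(void *buf, size_t len)
    {
        volatile uint8_t *p = buf;
        for (size_t i = 0; i < len; i++)
            p[i] = 0;                       /* volatile writes: not elided */

        __asm__ __volatile__("mfence" ::: "memory");   /* full fence */

        /* verw needs a valid segment selector operand; reuse %ds here */
        uint16_t ds;
        __asm__ __volatile__("mov %%ds, %0" : "=r"(ds));
        __asm__ __volatile__("verw %0" :: "m"(ds) : "cc");
    }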
> Linux users should apply kernel and CPU microcode updates as soon as they are available from their distribution vendor, and follow any guidance to adjust system settings.
Canonical says that they have those for 14/16/18.04 [1]. But possibly more interesting is the fact that this disclosure has been so well synchronized. How do the relevant players decide what the threshold is for informing other tech companies? How does everyone know what policies the constituent companies use to prevent early disclosure, or unintended disclosure to 'somewhat-less-trusted employees'? Is this all coordinated by US CERT?
As with Spectre/Meltdown, L1TF et al, Intel chooses who to loop in to their disclosure.
All of it is tightly controlled under an embargo. Who they choose to involve is entirely their decision, and is likely based on previous experience with those parties and their likelihood of leaking. Intel doesn't want these kinds of things to leak before official communication is done, or it's pretty much guaranteed to impact their stock price.
This time around has gone much smoother than the previous ones, though L1TF was pretty good too. L1TF was a little rough with the patching side of things because the patches were finalised a little late.
The various distributions and companies knew that the embargo was due to end at 10am pacific, and were probably (like us) refreshing the security advisories page on Intel's site waiting to pull the trigger on all the relevant processes, like publishing blog pages etc.
Wow, ChromeOS decided to disable hyperthreading entirely? That seems like a pretty drastic mitigation. I wonder if that's just a short term solution or if they're planning to leave it that way indefinitely.
Hyper-Threading has been a source of security concerns for a decade now, and vulnerabilities in existing HT implementations have been trickling out over the last few years. Unlike Management Engine or TrustZone, at least we can disable Hyper-Threading (for a 30% performance hit).
Also, HT is not such a great performance win - on a few different 4-core/8-thread machines I had access to, loading all 8 threads to "100% CPU" (whatever that means) usually only delivered 20-30% faster computation than with HT off (4-core/4-thread) - which is in line with your 30% number.
And that's an improvement - some 15 years ago, with similar computational loads, most of my tests ran 10-20% faster with HT off (2 cores / 2 threads) than with HT on (2 cores / 4 threads) - there just wasn't enough cache to support that many threads.
A 20-30% increase is a BIG increase for a hardware feature, though. The cost of hyperthreading in transistors mostly amounts to the larger total register set. The whole point is the rest of the decode/dispatch/execute/retire pipeline is all shared.
How is 20%-30% not a great performance win? If I tell you today there's this One Simple Trick that you can do on your computer to instantly gain access to 20%-30% more performance, would you do it in a heartbeat?
If your workload is already well parallelized, then, yes 20% is quite significant. However, working to parallelize properly over 8 rather than 4 has its own costs.
The thing that bothers me most is that on this processor 800% CPU and 500% CPU are roughly equivalent - both come out to about 5x100% CPU of real throughput - which makes everything very hard to reason about when planning capacity.
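To make that arithmetic concrete, here's a toy capacity model (the 25% uplift is an assumed number taken from the figures upthread, not a measurement):

    #include <stdio.h>

    int main(void)
    {
        const int physical_cores = 4;        /* 4c/8t part            */
        const double smt_uplift  = 0.25;     /* assumed ~25% HT gain  */

        double real_ceiling = physical_cores * (1.0 + smt_uplift);
        double reported_max = physical_cores * 2 * 100.0;

        printf("top can report up to %.0f%% CPU,\n", reported_max);
        printf("but the real throughput ceiling is ~%.1fx one core\n",
               real_ceiling);
        return 0;
    }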
I think you’re misunderstanding what HT is. It’s not true parallelism, it’s just hiding latency by providing some extra superscalar parallelism. You can’t expect it to give you actual linear improvements in performance because it’s just an illusion.
I understand that very well. But none of the standard tools that manage CPU understand it, and most people don't either.
If I had a nickel for every time I had to explain why "You are at 50% CPU now, but you can't actually run twice as many processes on this machine and get the same runtime", I'd be able to buy a large Frappuccino or two at Starbucks.
Perhaps I'm uninformed, though - is there a tool like htop that would give me an idea of how close I am to maxing out a CPU?
No, there isn't. But if you understand it, I don't get why you think 20% isn't a good performance boost, especially considering the rate of return for power and area in silicon.
Because many people believe it is a 100% improvement, plan/budget accordingly, and then look for help.
As far as silicon/power goes, it is nice, but IIRC (I am not involved in purchasing anymore) it used to cost over 50% more in USD for those 20% in performance, back when non-HT parts were common.
You ignored the price issue, which was measurable and real, but also:
It (used to be) my job. Does "because people fall for deceptive marketing, waste money, and then waste my time trying to salvage their reputation" sound better?
I think it isn't viable with hardware whose timing behavior is non-deterministic. This means dedicated caches, or no caches at all; dedicated, guaranteed memory speeds and latencies; dedicated processing units. The untrusted code cannot be affected by other code, otherwise that other code leaks its usage patterns across.
Dang: "Hyper-Threading technology, as used in FreeBSD and other operating systems that are run on Intel Pentium and other processors, allows local users to use a malicious thread to create covert channels, monitor the execution of other threads, and obtain sensitive information such as cryptographic keys, via a timing attack on memory cache misses."
Also, found elsewhere:
"According to Linus Torvalds and others on linux-kernel this is a theoretical attack, paranoid people should disable hyper threading"
Yes. Intel dismissed it at the time, saying that "nobody would ever have untrusted code running on the same hardware on which cryptographic operations are performed".
30% performance hit? I'm sure that heavily depends on the workload... and I'm also sure you lose performance when HT is on, depending on the workload as well.
That would make sense; my understanding is that with a CPU pegged at 100%, hyperthreading won't be super beneficial, as they aren't real cores, just smarter scheduling. You can't really schedule a 100% load any better. For latency-sensitive applications, however, it makes more sense: the CPU isn't pegged, you just want a faster response.
Sure you can. You can do math while another HT is waiting for memory. Sometimes you can even multiplex use of multiple ALUs or one HT can do integer and another can do floating point.
It's actually under high multithreaded load that HT shines, especially if that load is heterogeneous or memory-latency bound.
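A rough, self-contained sketch of that effect (assumed workloads and iteration counts, not a rigorous benchmark): one thread chases pointers and stalls on memory, the other does dependent floating-point math. Pin the process to two sibling hyperthreads (e.g. with taskset; check /sys/devices/system/cpu/cpu0/topology/thread_siblings_list for the sibling pair on your machine), then time the pair together versus each thread alone - the combination should finish in close to the time of the slower one by itself, which is roughly where the 20-30% aggregate gain comes from.

    /* build: cc -O2 -pthread ht_sketch.c */
    #include <pthread.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define NODES (1u << 24)                 /* ~128 MB chain, well past L3 */

    static size_t *chain;

    static void *pointer_chase(void *arg)    /* memory-latency bound */
    {
        (void)arg;
        size_t idx = 0;
        for (long i = 0; i < 50000000L; i++)
            idx = chain[idx];                /* dependent loads, mostly misses */
        return (void *)idx;                  /* keep the result live */
    }

    static void *arithmetic(void *arg)       /* execution-unit bound */
    {
        (void)arg;
        double x = 1.0;
        for (long i = 0; i < 2000000000L; i++)
            x = x * 1.000000001 + 1e-9;      /* dependent FP chain */
        return x > 0 ? NULL : (void *)1;
    }

    int main(void)
    {
        chain = malloc(NODES * sizeof *chain);

        /* Sattolo's algorithm: one big random cycle through all nodes */
        for (size_t i = 0; i < NODES; i++)
            chain[i] = i;
        for (size_t i = NODES - 1; i > 0; i--) {
            size_t j = (size_t)rand() % i;
            size_t t = chain[i]; chain[i] = chain[j]; chain[j] = t;
        }

        pthread_t a, b;
        pthread_create(&a, NULL, pointer_chase, NULL);
        pthread_create(&b, NULL, arithmetic, NULL);
        pthread_join(a, NULL);
        pthread_join(b, NULL);
        puts("done");
        return 0;
    }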
I too was once under the misapprehension that HT was "just smarter scheduling", until I took a university course in microarchitecture that explained how Simultaneous Multithreading actually works in terms of maximising utilisation of various types of execution units. I wonder why "smarter scheduling" became a common understanding.
Disabling hyper-threading is highly unlikely to produce a 30% performance hit. Most highly optimized software disables or avoids hyper-threading because doing so increases performance.
Hyper-threading tends to benefit the performance of applications that have not been optimized, and therefore presumably are also not particularly performance sensitive in any case.
In highly-parallel workloads like rendering (ray tracing) where pipeline stalls due to loads happen quite regularly, it's fairly easy to get 20-35% speedups with HT.
Maybe we should start to seriously question the value of such long embargoes. This is coordinated disclosure; if the vendor refuses reasonable coordination (and it seems Intel does, given such delays, and also because it still silos the security researchers way too much), then fuck them and publish (probably not immediately, but certainly not after a year...)
It seems that broadly the same principles have been found independently by tons of teams. Expecting that well-financed actors have not explored that field, and/or have not arrived at similar results by this point, is completely insane.
Meaning, given the high level of technicality required, it's even doubtful that the embargo protected anybody; it may be that no attacker exists (and, I postulate, ever will) who is simply waiting for third-party disclosure before writing their own exploits in this class. On the other hand, typical security providers monitoring threats in the field might remain unaware of such vulnerabilities for a long time.
Now, arguably the first countermeasures here are similar to those for L1TF, so hopefully sensitive operators will already have disabled HT. Still, it is not very cool to leave them unaware of this additional (and slightly different) risk for such a ridiculously long period.
Also: does Intel even have competent people working on this anymore? They know the fundamental principle: speculative execution on architecturally out-of-reach data, followed by a fault and a subsequent extraction, via covert channels, of modified micro-architectural state that is never rolled back. The broad micro-architecture is widely known, so do they really expect that third-party security researchers won't find all the places where they were sloppy enough to speculatively execute code on completely "garbage" data? Or were they themselves unable to do a proper comprehensive review, despite having access to the full detailed design (and despite a dedicated team having been created for that)? Either way, this is not reassuring.
I'm not really sure what the question is supposed to be. You could discover an Intel vulnerability and give them a 90 day timeline, or, for that matter, do what the Metasploit founders would have done and just immediately code up an exploit and publish it with no warning. All of these are viable options and all have precedent; it's up to the people discovering the flaws to make their own decisions.
It's particularly weird in this case to suggest that the embargo didn't help anyone, since (1) nobody appears to have leaked these flaws and (2) the cloud providers all seem to have fixes queued up.
Intel claims to have discovered some of these flaws internally, and this is a bug class we've known about (for realsies) for a little bit over a year now, in a class of products for which development cycles are themselves denominated in multiple years, so I'd cut them a bit of slack.
It really depends. Think about it this way: would you rather have an undisclosed vulnerability go untreated and undetected (as far as everyone knows) for an extra year, or suddenly disclose it to the rest of the world before all the major interested parties (big companies, chip vendors, etc.) have workarounds and mitigation techniques, so that actual malicious attackers can exploit it before the countermeasures are ready?
In an ideal world, you would disclose everything and let everyone know so they can take measures against it, but in reality there may be less damage in letting the vulnerability stay undetected for a few more months while everyone plans patches and releases the fixes as it gets disclosed.
I do agree, however, that almost a whole year is a very long time.
Considering the June/July initial reporting, the stacking of evidence from related exploits, and the release in May of the following year, it looks more like 9 months plus some slack due to multiple variants being reported. That doesn't sound like "they kept waiting indefinitely" so much as proper due diligence.
Interesting point: "MDS is addressed in hardware starting with select 8th and 9th Generation Intel® Core™ processors, as well as the 2nd Generation Intel® Xeon® Scalable processor family." Looks like my 8700K isn't on the list though.
According to the researchers in the paper[0] this is not true.
>We have verified that we can leak information across arbitrary address spaces and privilege boundaries, even on recent Intel systems with the latest microcode updates and latest Linux kernel with all the Spectre, Meltdown, L1TF default mitigations up (KPTI, PTE inversion, etc.). In particular, the exploits we discuss below exemplify leaks in all the relevant cases of interest: process-to-process, kernel-to-userspace, guest-to-guest, and SGX-enclave-to-userspace leaks. Not to mention that such attacks can be built even from a sandboxed environment such as JavaScript in the browser, where the attacker has limited capabilities compared to a native environment.
I searched the paper and it doesn't seem to falsify what I linked to, but I'll have to dig deeper into the research. "Recent Intel systems" isn't specific enough.
In a Dutch article (https://nos.nl/artikel/2284630-nederlanders-vinden-beveiligi...), one of the researchers says "het aantal mensen bij bedrijven als Intel die zich op dit niveau met beveiliging bezighoudt, is echt op de vingers van twee handen te tellen." = There are 10 or fewer people working on security at this level at companies like Intel.
This sounds very hard to believe to me. After the previous attacks, surely there are bigger teams working on this kind of stuff?
There's other people working on it outside of Intel, too. https://mdsattacks.com/ if you look at the list of people you'll see there's dozens of folk that independently found and reported the same vulnerabilities.
People don't necessarily need to be in one big team. Lots of things that are important can be worked on by more than 10 people.
(Surely Google and Facebook each have more than 10 people working on security).
Is there evidence that so few people are working on security at Intel?
I like how Intel prominently thanks its own employees for finding the bugs and later merely acknowledges the existence of the independent reporters, with zero thanks.
Funny that you don't see their contact email address if you don't have JavaScript running - which is exactly one of the doors for this kind of vulnerability, as they mention on their own site.
End-user security, in a web browser context: do I understand correctly that if my browser were to only ever execute JavaScript in bytecode format (without compilation to native code), it would be safe from these kinds of exploits?
Presuming the bytecode interpreter would be "slow enough" and "jittery enough" and "indirect enough" to hamper any attempts at exploiting subtle timing+memory layout bugs like that?
IIRC, Konqueror (of KDE) had reasonably fast bytecode JS engine. I wish the browser was still undergoing fast development, used to be my daily driver for many years.
AFAIK, there are techniques to detect and denoise minuscule timing differences over millions of samples, and the fundamentals of most of those techniques apply to interpreters as well, so it is not a solid protection.
That said, it would make things harder in practice since you’re introducing an extra indirection level and just making everything slower.
As for interpreters in modern browsers, I’d be surprised if there’s no way to entirely disable the JIT somehow... since most JIT implementations I have seen have an interpreter fallback for debugging and easier portability to new CPU architectures.
[edit: I finally finished reading everything. It seems like these new leaks can be triggered from JS as they still fundamentally reduce to "read time for memory access"]
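For the curious, here is the kind of averaging meant above, shown on an ordinary flush+reload-style cache-timing primitive rather than on MDS itself (thresholds and cycle counts are machine-dependent; this is an illustration of the statistics, not an exploit):

    #include <stdint.h>
    #include <stdio.h>
    #include <x86intrin.h>          /* __rdtscp, _mm_clflush, _mm_mfence */

    static uint8_t target[64];

    /* Time one load of the target line, in TSC cycles. */
    static uint64_t timed_load(volatile uint8_t *p)
    {
        unsigned aux;
        uint64_t t0 = __rdtscp(&aux);
        (void)*p;
        uint64_t t1 = __rdtscp(&aux);
        return t1 - t0;
    }

    int main(void)
    {
        const int N = 1000000;
        uint64_t hit = 0, miss = 0;

        for (int i = 0; i < N; i++) {
            timed_load(target);          /* warm the cache line */
            hit += timed_load(target);   /* sample a cache hit  */

            _mm_clflush(target);         /* evict the line      */
            _mm_mfence();
            miss += timed_load(target);  /* sample a cache miss */
        }

        /* Individual samples are noisy; the averages separate cleanly. */
        printf("avg hit:  %llu cycles\n", (unsigned long long)(hit / N));
        printf("avg miss: %llu cycles\n", (unsigned long long)(miss / N));
        return 0;
    }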
For spectre simply having attacker directed control flow was sufficient - so logically almost any scripting language could be exploited.
Same goes for most of the TLB attacks.
Others required native code because they needed to use specific instructions (that aren’t going to be emitted intentionally by any compiler - jit or otherwise).
It seems that we need to move away from clever, complicated low-level micro-optimizations that rely on mangling instructions, and just use more cores. That should allow for a simpler security model.
There are plenty of scenarios where synchronization overheads between cores dwarf the performance gain, but OoO execution can help.
But maybe instead of having more cores, we should expose the different execution units within a CPU core to the architectural level? That however brings back memories of Itanium, and the general fact that compilers just can't do static scheduling well enough.
I still don't think so. Exposing these microarchitectural concerns to the architectural level limits flexibility. In order for compilers to efficiently schedule multiple execution units, the compiler needs to know the exact latency of all instructions. That may be doable for arithmetic, but varies greatly from one generation of processor to the next. And compilers definitely cannot know the latency of a load: from a few cycles in L1 cache, to a few thousand cycles in DRAM, to millions of cycles if there's a page fault. And these things vary a lot, not just between processor generations but within the same processor generation.
Can I ask for my money back? Intel should return 30% of the cost of all vulnerable CPUs then... because disabling HT effectively reduces the claimed performance specs.
Foreshadow/L1TF is the only prior problem of this nature that is unique to Intel. Meltdown bugs were also found in ARM, IBM POWER, and mainframe designs, and Spectre hits all of those and AMD as well.
I don't think that anybody can know whether this is true, since exploitation leaves little evidence. Even before this is witnessed in the wild for the first time, you can't really know which secrets of yours have already been exfiltrated.
Everything that can't be fixed with a ten minute phone call to my bank is already public knowledge thanks to Experian, so I really don't have anything left to fear.
You have no conversations that you'd prefer not be sold on the darknet? With friends, family, therapists, doctors, lawyers, consultants?
No pictures of your kids that they might not want spilled into a searchable database and used for machine learning to sell them things later in life?
No private or symmetric keys which might be used to impersonate you or eavesdrop on you later?
No in-progress documents which you aren't ready to publish?
No conversations with political allies that you might not want the state to peruse?
No intimate conversations with sexual partners?
If that's true, then I think you have a very different attack surface than most people. I think most people are willing to take a small performance hit not to open up access to much of the data that goes across their CPU, which is not an exaggeration for the combination of attacks which have been published against Intel CPUs over the past 3 years.
If someone wants to leverage speculative-execution vulnerabilities to get that sort of information off of my PC, it's not a problem that can be solved by yet another security patch. Don't reduce my PC's performance for the sake of somebody else's security concern.
At the end of the day the only secure computer is one that's turned off and locked up in a supply closet.
Not on any x86 device, no. Not that I'd be a particularly easy target since I use NoScript with a whitelist and keep my router's firewall very strict. I suppose someone could come at me with a malicious Steam game.
"Arguing that you don't care about the right to privacy because you have nothing to hide is no different than saying you don't care about free speech because you have nothing to say." -E. Snowden
I'd really like to be given a choice, at least. My gaming PC is used exclusively for gaming, so it needs to be performant, but does not need to be secure.
'nopti' only disables the Meltdown mitigation. To disable all the mitigations (including Spectre, Meltdown, L1TF, and now MDS) you can use mitigations=off on newer kernels.
It certainly doesn't feel good. But if the home market remained unpatched with a public POC, they would be attacked. The most likely avenue is by writing malicious web pages to steal bitcoin wallets, etc.
Didn't they already have a proof of concept Spectre or Meltdown exploit via a web page? Malicious ads seem like the best way to spread ransomware, etc.
It is funny how ChromeOS is the most ridiculously secure of the commonly available operating systems. It is not as if you can do much other than surf the internet with it.
It makes me chuckle to think that my not-so-computer-literate friend, whom I gave a Chromebook, is protected from anyone snooping on the YouTube and Hotmail running on this toy machine (designed for 9-year-olds). There really is nothing to hide there. Meanwhile, people doing important work on proper computers are properly vulnerable to this new theoretical Hyper-Threading attack.
I will be interested to find out if there is a drop-off in performance on ChromeOS, e.g. YouTube stuttering while the WhatsApp Web tab updates itself with a new message. If nobody complains, then why did we need Hyper-Threading in the first place?
You can run Android and Linux apps on ChromeOS. And with PWAs and WebAssembly maturing, the difference between native apps and web apps is getting smaller and smaller. Many developers use it for work. A lot of dev work in the enterprise isn't done locally anyway.
Hyper-Threading was an Intel stop-gap reaction to the Athlon 64 X2, which was a REAL dual core, to buy them time while the Pentium D was created and later laughed off the market. We finally got an "OK" dual core from Intel when they decided to hack Pentium 3 cores together and call it the Core Duo, and with the Core 2 Duo they finally caught back up to AMD (by hacking amd64 instructions onto the P3 cores) and were able to start taking market share back. Nothing interesting happens between then and Threadripper, but now we would be back to eating popcorn and watching the rest of the fight... except the fight is over and everyone is off in the other arena watching ARM and WebKit, winner-take-all style, demolishing the incumbent platforms.
Calling the Core Duo a Pentium III core, especially when talking about microarchitectures, is a slight misrepresentation. Of course it was much closer to, and more of a derivative of, the PPro descendants. But while P6 did not vary a lot between the PPro and the P III, before reaching the Core Duo it went through the Pentium M and was then further enhanced. So yes, it looks more like a Pentium III than a Pentium 4, but it was certainly not just a "hack [gluing] Pentium 3 cores together".
Also, NetBurst was not that bad. It was a dead end, yes, but in some markets it could compete with what AMD had.
Plus, implementing SMT is not necessarily that easy compared to SMP, especially when you evolve existing designs.
And anyway, Intel shipped HT well before AMD shipped the Athlon 64 X2...
Commentary by RedHat: https://www.redhat.com/en/blog/understanding-mds-vulnerabili... (https://news.ycombinator.com/item?id=19912108)
Commentary by Ubuntu: https://blog.ubuntu.com/2019/05/14/ubuntu-updates-to-mitigat...