
Hyper-Threading has been a source of security concerns for a decade now, and vulnerabilities in existing HT implementations have been trickling out over the last few years. Unlike Management Engine or TrustZone, at least we can disable Hyper-Threading (for a 30% performance hit).



Also, HT is not such a great performance win - on a few different 4-core/8-thread machines I had access to, loading all 8 threads to "100% CPU" (whatever that means) usually only delivered 20-30% faster computation than with HT off (4-core/4-thread) - which is in line with your 30% number.

And that's an improvement - some 15 years ago, with similar computational loads, most of my tests ran 10-20% faster with HT off (using 2 cores / 2 threads) than with HT on (using 2 cores / 4 threads) - there just wasn't enough cache to support that many threads.
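
If you want to reproduce that kind of comparison, here's a minimal sketch - the pure-Python spin loop and the 4/8 worker counts are just stand-ins for a real numerical workload on a 4C/8T box, and it assumes the OS scheduler spreads 4 busy processes across distinct physical cores (pin them with taskset if you want certainty):

    # Rough HT scaling check: total throughput with one worker per
    # physical core vs one per logical core. A pure-Python spin loop
    # stands in for real numerical work, so only the ratio is interesting.
    import multiprocessing as mp
    import time

    def spin(seconds):
        # Count simple arithmetic iterations in a fixed wall-clock window.
        end = time.perf_counter() + seconds
        n = 0
        while time.perf_counter() < end:
            n += 1
        return n

    def throughput(workers, seconds=5.0):
        with mp.Pool(workers) as pool:
            counts = pool.map(spin, [seconds] * workers)
        return sum(counts) / seconds

    if __name__ == "__main__":
        t4 = throughput(4)  # one worker per physical core on a 4C/8T part
        t8 = throughput(8)  # one worker per logical core (HT siblings loaded)
        print(f"8 workers vs 4 workers: {t8 / t4:.2f}x throughput")

On the machines described above you'd expect the printed ratio to come out around 1.2-1.3x rather than the 2x the logical core count suggests.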


A 20-30% increase is a BIG increase for a hardware feature, though. The cost of hyperthreading in transistors mostly amounts to the larger total register set. The whole point is the rest of the decode/dispatch/execute/retire pipeline is all shared.


How is 20%-30% not a great performance win? If I told you today there's One Simple Trick you can do on your computer to instantly get 20%-30% more performance, wouldn't you do it in a heartbeat?


What do you think is a good performance improvement then?


(and to the two other responses)

If your workload is already well parallelized, then yes, 20% is quite significant. However, properly parallelizing over 8 threads rather than 4 has its own costs.

The thing that bothers me most is that 800% CPU and 500% CPU on this processor deliver roughly the same throughput - about 5x a single core - which makes everything very hard to reason about when planning capacity.
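
For planning, the best I've come up with is a back-of-the-envelope model along these lines (a rough sketch - it assumes psutil is installed, and the 0.25 "SMT yield" is just the 20-30% figure from above, not a measured constant):

    # Crude "effective cores" model: count each extra SMT sibling as a
    # fraction of a core instead of a whole one.
    import psutil

    SMT_YIELD = 0.25  # assumed extra throughput per hyperthread sibling

    physical = psutil.cpu_count(logical=False)
    logical = psutil.cpu_count(logical=True)
    effective = physical + (logical - physical) * SMT_YIELD

    # top/htop-style utilization is averaged over all logical CPUs, so it
    # overstates the remaining headroom on an HT machine.
    busy = psutil.cpu_percent(interval=1.0) / 100 * logical
    print(f"{physical} physical / {logical} logical ~= {effective:.1f} effective cores")
    print(f"current load ~= {busy:.1f} logical CPUs busy")

It's not rigorous, but it at least stops people from assuming that 50% utilization means half the machine is still available.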


I think you’re misunderstanding what HT is. It’s not true parallelism, it’s just hiding latency by providing some extra superscalar parallelism. You can’t expect it to give you actual linear improvements in performance because it’s just an illusion.


I understand that very well. But none of the standard tools that manage CPU understand that, and most people don't either.

If I had a nickel for every time I had to explain why "You are at 50% CPU now, but you can't actually run twice as many processes on this machine and get the same runtime", I'd be able to buy a large Frappuccino or two at Starbucks.

Perhaps I'm uninformed though - is there a tool like htop which would give me an idea of how close I am to maxing out a CPU?


No, there isn't. But if you understand that, I don't get why you think 20% isn't a good performance boost, especially considering the return on power and silicon area.


Because many people believe it is a 100% improvement, plan/budget accordingly, and then look for help.

As far as silicon/power goes it is nice, but IIRC (I am not involved in purchasing anymore) the HT parts used to cost over 50% more in USD for those 20% in performance, back when non-HT parts were common.


What a strange way to measure the benefits of a performance optimization: "how people will perceive it and then ask me for help".


You ignored the price issue, which was measurable and real, but also:

It used to be my job. Does "because people fall for deceptive marketing, waste money, and then waste my time trying to salvage their reputation" sound better?


> loading all 8 threads to "100% CPU" (whatever that means)

What application?


Lots of numerical computations and simulations.


The security concern is remote code execution via JS, and sharing processor time with other people you don't trust, right?

It should be up to the VM-as-a-service and browser vendors to flush the cache properly.


No. The security concern is attackers reading data they shouldn’t. The article explains how.

“Microarchitectural Data Sampling (MDS) is a group of vulnerabilities that allow an attacker to potentially read sensitive data.”

That is way more serious than stealing cycles.


Yeah, but I didn't understand how this is different from Spectre, except with different caches.

Still it's fine with no JS and no shared processor time, right?


Right. If you run no foreign code you are safe.


From a brief read, I think it reads in-flight data, not necessarily cached data, so flushing the cache won't help, unfortunately.


One CPU per process makes a lot more sense, especially now that we have so many specialized CPUs in our machines anyway.


Yeah, I get the feeling that sharing processor time with strangers is not viable without specialized hardware.


I think it isn't viable with non-deterministic (in time) hardware behavior. This means dedicated caches, or no caches at all. Dedicated, guaranteed memory speeds and latencies. Dedicated processing units. The untrusted code cannot be allowed to be affected by other code; otherwise that other code leaks its usage patterns across.


Decade and a half, even. If I remember right, the first CVE for an HT security flaw was summer 2005.


I announced it publicly 14 years ago yesterday.


This one? https://nvd.nist.gov/vuln/detail/CVE-2005-0109

Dang: "Hyper-Threading technology, as used in FreeBSD and other operating systems that are run on Intel Pentium and other processors, allows local users to use a malicious thread to create covert channels, monitor the execution of other threads, and obtain sensitive information such as cryptographic keys, via a timing attack on memory cache misses."

Also, found elsewhere:

"According to Linus Torvalds and others on linux-kernel this is a theoretical attack, paranoid people should disable hyper threading"


Yes. Intel dismissed it at the time, saying that "nobody would ever have untrusted code running on the same hardware on which cryptographic operations are performed".


30% performance hit? I'm sure that heavily depends on the workload... and I'm also sure you lose performance when HT is on, depending on the workload as well.


I've seen this claim made for routers and other low-intensity, low-latency workloads.


That would make sense. My understanding is that with a 100% pegged CPU, hyperthreading won't be super beneficial, as the extra threads aren't real cores, just smarter scheduling. You can't really schedule 100% load better. However, for latency-sensitive applications it makes more sense: you don't have the CPU pegged, you just want a faster response.


> You can't really schedule 100% load better

Sure you can. You can do math while another HT is waiting for memory. Sometimes you can even multiplex use of multiple ALUs or one HT can do integer and another can do floating point.

It's actually under high multithreaded load that HT shines, especially if that load is heterogeneous or memory-latency bound.
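
If you want to see that directly, here's a rough Linux-only sketch that pins a pointer-chasing (memory-latency-bound) worker and an arithmetic worker onto the two SMT siblings of physical core 0 and compares that against running them back to back on one logical CPU. The sysfs path is standard on Linux, but the workloads are illustrative, it assumes HT is enabled, and Python overhead will blunt the effect compared to a C version:

    # Overlap a memory-latency-bound worker and an ALU-bound worker on the
    # two hyperthread siblings of one physical core, vs running them one
    # after the other on a single logical CPU.
    import multiprocessing as mp
    import os
    import random
    import time

    def chase(cpu, steps=2_000_000):
        os.sched_setaffinity(0, {cpu})
        n = 1 << 22
        nxt = list(range(n))
        # Sattolo's algorithm builds one big random cycle, so the chase
        # keeps missing cache instead of settling into a tiny loop.
        for i in range(n - 1, 0, -1):
            j = random.randrange(i)
            nxt[i], nxt[j] = nxt[j], nxt[i]
        k = 0
        for _ in range(steps):
            k = nxt[k]
        return k

    def arith(cpu, steps=20_000_000):
        os.sched_setaffinity(0, {cpu})
        x = 1.0
        for _ in range(steps):
            x = x * 1.0000001 + 0.1
        return x

    def wall_time(jobs):
        start = time.perf_counter()
        procs = [mp.Process(target=fn, args=args) for fn, args in jobs]
        for p in procs:
            p.start()
        for p in procs:
            p.join()
        return time.perf_counter() - start

    if __name__ == "__main__":
        # The two logical CPUs sharing physical core 0, e.g. "0,4" or "0-1".
        path = "/sys/devices/system/cpu/cpu0/topology/thread_siblings_list"
        with open(path) as f:
            a, b = map(int, f.read().strip().replace("-", ",").split(",")[:2])
        serial = wall_time([(chase, (a,))]) + wall_time([(arith, (a,))])
        overlapped = wall_time([(chase, (a,)), (arith, (b,))])
        print(f"back to back: {serial:.2f}s, overlapped on siblings: {overlapped:.2f}s")

When HT is actually helping, you'd expect the overlapped run to finish noticeably faster than the sum of the two solo runs.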


I too was once under the misapprehension that HT was "just smarter scheduling", until I took a university course in microarchitecture that explained how Simultaneous Multithreading actually works in terms of maximising utilisation of various types of execution units. I wonder why "smarter scheduling" became a common understanding.


Wouldn't hyperthreading also be more power-efficient compared to running a second core?


Disabling hyper-threading is highly unlikely to produce a 30% performance hit. Most highly optimized software disables or avoids hyper-threading because doing so increases performance.

Hyper-threading tends to benefit the performance of applications that have not been optimized, and therefore presumably are also not particularly performance sensitive in any case.


In highly-parallel workloads like rendering (ray tracing) where pipeline stalls due to loads happen quite regularly, it's fairly easy to get 20-35% speedups with HT.


In music production and C++ code compilation I get a pretty reliable +25% perf boost with HT on (this was not the case a few gens ago though).



