
Hyper-Threading has been a source of security concerns for a decade now, and vulnerabilities in existing HT implementations have been trickling out over the last few years. Unlike Management Engine or TrustZone, at least we can disable Hyper-Threading (for a 30% performance hit).



Also, HT is not such a great performance win - on a few different 4-core/8-thread machines I had access to, loading all 8 threads to "100% CPU" (whatever that means) usually only delivered 20-30% faster computation than with HT off (4-core/4-thread) - which is in line with your 30% number.

And that's an improvement - some 15 years ago, with similar computational loads, most of my tests ran 10-20% faster with HT off (using 2 cores / 2 threads) than with HT on (using 2 cores / 4 threads) - there just wasn't enough cache to support that many threads.
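
If you want to reproduce that kind of comparison, here's a minimal sketch - the pure-Python spin loop and the 4/8 worker counts are just stand-ins for a real numerical workload on a 4C/8T box, and it assumes the OS scheduler spreads 4 busy processes across distinct physical cores (pin them with taskset if you want certainty):

    # Rough HT scaling check: total throughput with one worker per
    # physical core vs one per logical core. A pure-Python spin loop
    # stands in for real numerical work, so only the ratio is interesting.
    import multiprocessing as mp
    import time

    def spin(seconds):
        # Count simple arithmetic iterations in a fixed wall-clock window.
        end = time.perf_counter() + seconds
        n = 0
        while time.perf_counter() < end:
            n += 1
        return n

    def throughput(workers, seconds=5.0):
        with mp.Pool(workers) as pool:
            counts = pool.map(spin, [seconds] * workers)
        return sum(counts) / seconds

    if __name__ == "__main__":
        t4 = throughput(4)  # one worker per physical core on a 4C/8T part
        t8 = throughput(8)  # one worker per logical core (HT siblings loaded)
        print(f"8 workers vs 4 workers: {t8 / t4:.2f}x throughput")

On the machines described above you'd expect the printed ratio to come out around 1.2-1.3x rather than the 2x the logical core count suggests.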


A 20-30% increase is a BIG increase for a hardware feature, though. The cost of hyperthreading in transistors mostly amounts to the larger total register set. The whole point is the rest of the decode/dispatch/execute/retire pipeline is all shared.


How is 20%-30% not a great performance win? If I told you today there's One Simple Trick you can do on your computer to instantly get 20%-30% more performance, wouldn't you do it in a heartbeat?


What do you think is a good performance improvement then?


(and to the two other responses)

If your workload is already well parallelized, then yes, 20% is quite significant. However, properly parallelizing over 8 threads rather than 4 has its own costs.

The thing that bothers me most is that 800% CPU and 500% CPU on this processor deliver roughly the same throughput - about 5x a single core - which makes everything very hard to reason about when planning capacity.
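
For planning, the best I've come up with is a back-of-the-envelope model along these lines (a rough sketch - it assumes psutil is installed, and the 0.25 "SMT yield" is just the 20-30% figure from above, not a measured constant):

    # Crude "effective cores" model: count each extra SMT sibling as a
    # fraction of a core instead of a whole one.
    import psutil

    SMT_YIELD = 0.25  # assumed extra throughput per hyperthread sibling

    physical = psutil.cpu_count(logical=False)
    logical = psutil.cpu_count(logical=True)
    effective = physical + (logical - physical) * SMT_YIELD

    # top/htop-style utilization is averaged over all logical CPUs, so it
    # overstates the remaining headroom on an HT machine.
    busy = psutil.cpu_percent(interval=1.0) / 100 * logical
    print(f"{physical} physical / {logical} logical ~= {effective:.1f} effective cores")
    print(f"current load ~= {busy:.1f} logical CPUs busy")

It's not rigorous, but it at least stops people from assuming that 50% utilization means half the machine is still available.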


I think you’re misunderstanding what HT is. It’s not true parallelism, it’s just hiding latency by providing some extra superscalar parallelism. You can’t expect it to give you actual linear improvements in performance because it’s just an illusion.


I understand that very well. But none of the standard tools that manage CPU understand that, and most people don't either.

If I had a nickel for every time I had to explain why "You are at 50% CPU now, but you can't actually run twice as many processes on this machine and get the same runtime", I'd be able to buy a large Frappuccino or two at Starbucks.

Perhaps I'm uninformed though - is there a tool like htop which would give me an idea of how close I am to maxing out a CPU?


No, there isn't. But if you understand that, I don't get why you think 20% isn't a good performance boost, especially considering the return on power and silicon area.


Because many people believe it is a 100% improvement, plan/budget accordingly, and then look for help.

As far as silicon/power goes it is nice, but IIRC (I am not involved in purchasing anymore) the HT parts used to cost over 50% more in USD for those 20% in performance, back when non-HT parts were common.


What a strange way to measure the benefits of a performance optimization: "how people will perceive it and then ask me for help".


You ignored the price issue, which was measurable and real, but also:

It used to be my job. Does "because people fall for deceptive marketing, waste money, and then waste my time trying to salvage their reputation" sound better?


> loading all 8 threads to "100% CPU" (whatever that means)

What application?


Lots of numerical computations and simulations.


The security concern is remote code execution via JS, and sharing processor time with other people you don't trust, right?

It should be up to the VM-as-a-service and browser vendors to flush the cache properly.


No. The security concern is attackers reading data they shouldn’t. The article explains how.

“Microarchitectural Data Sampling (MDS) is a group of vulnerabilities that allow an attacker to potentially read sensitive data.”

That is way more serious than stealing cycles.


Yeah, but I didn't understand how this is different from Spectre, except with different caches.

Still it's fine with no JS and no shared processor time, right?


Right. If you run no foreign code you are safe.


From a brief read, I think it reads in-flight data, not necessarily cached data, so flushing the cache won't help, unfortunately.


One CPU per process makes a lot more sense, especially now that we have so many specialized CPUs in our machines anyway.


Yeah, I get the feeling that sharing processor time with strangers is not viable without specialized hardware.


I think it isn't viable with non-deterministic (in time) hardware behavior. This means dedicated caches, or no caches at all. Dedicated, guaranteed memory speeds and latencies. Dedicated processing units. The untrusted code cannot be allowed to be affected by other code; otherwise that other code leaks its usage patterns across.


Decade and a half, even. If I remember right, the first CVE for an HT security flaw was summer 2005.


I announced it publicly 14 years ago yesterday.


This one? https://nvd.nist.gov/vuln/detail/CVE-2005-0109

Dang: "Hyper-Threading technology, as used in FreeBSD and other operating systems that are run on Intel Pentium and other processors, allows local users to use a malicious thread to create covert channels, monitor the execution of other threads, and obtain sensitive information such as cryptographic keys, via a timing attack on memory cache misses."

Also, found elsewhere:

"According to Linus Torvalds and others on linux-kernel this is a theoretical attack, paranoid people should disable hyper threading"


Yes. Intel dismissed it at the time, saying that "nobody would ever have untrusted code running on the same hardware on which cryptographic operations are performed".


30% performance hit? I'm sure that heavily depends on the workload... and I'm also sure you lose performance when HT is on, depending on the workload as well.


I've seen this claim made for routers and other low-intensity, low-latency workloads.


That would make sense. My understanding is that with a 100% pegged CPU, hyperthreading won't be super beneficial, as the extra threads aren't real cores, just smarter scheduling. You can't really schedule 100% load better. However, for latency-sensitive applications it makes more sense: you don't have the CPU pegged, you just want a faster response.


> You can't really schedule 100% load better

Sure you can. You can do math while another HT is waiting for memory. Sometimes you can even multiplex use of multiple ALUs or one HT can do integer and another can do floating point.

It's actually under high multithreaded load that HT shines, especially if that load is heterogeneous or memory-latency bound.
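
If you want to see that directly, here's a rough Linux-only sketch that pins a pointer-chasing (memory-latency-bound) worker and an arithmetic worker onto the two SMT siblings of physical core 0 and compares that against running them back to back on one logical CPU. The sysfs path is standard on Linux, but the workloads are illustrative, it assumes HT is enabled, and Python overhead will blunt the effect compared to a C version:

    # Overlap a memory-latency-bound worker and an ALU-bound worker on the
    # two hyperthread siblings of one physical core, vs running them one
    # after the other on a single logical CPU.
    import multiprocessing as mp
    import os
    import random
    import time

    def chase(cpu, steps=2_000_000):
        os.sched_setaffinity(0, {cpu})
        n = 1 << 22
        nxt = list(range(n))
        # Sattolo's algorithm builds one big random cycle, so the chase
        # keeps missing cache instead of settling into a tiny loop.
        for i in range(n - 1, 0, -1):
            j = random.randrange(i)
            nxt[i], nxt[j] = nxt[j], nxt[i]
        k = 0
        for _ in range(steps):
            k = nxt[k]
        return k

    def arith(cpu, steps=20_000_000):
        os.sched_setaffinity(0, {cpu})
        x = 1.0
        for _ in range(steps):
            x = x * 1.0000001 + 0.1
        return x

    def wall_time(jobs):
        start = time.perf_counter()
        procs = [mp.Process(target=fn, args=args) for fn, args in jobs]
        for p in procs:
            p.start()
        for p in procs:
            p.join()
        return time.perf_counter() - start

    if __name__ == "__main__":
        # The two logical CPUs sharing physical core 0, e.g. "0,4" or "0-1".
        path = "/sys/devices/system/cpu/cpu0/topology/thread_siblings_list"
        with open(path) as f:
            a, b = map(int, f.read().strip().replace("-", ",").split(",")[:2])
        serial = wall_time([(chase, (a,))]) + wall_time([(arith, (a,))])
        overlapped = wall_time([(chase, (a,)), (arith, (b,))])
        print(f"back to back: {serial:.2f}s, overlapped on siblings: {overlapped:.2f}s")

When HT is actually helping, you'd expect the overlapped run to finish noticeably faster than the sum of the two solo runs.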


I too was once under the misapprehension that HT was "just smarter scheduling", until I took a university course in microarchitecture that explained how Simultaneous Multithreading actually works in terms of maximising utilisation of various types of execution units. I wonder why "smarter scheduling" became a common understanding.


Wouldn't hyperthreading also be more power-efficient compared to running a second core?


Disabling hyper-threading is highly unlikely to produce a 30% performance hit. Most highly optimized software disables or avoids hyper-threading because doing so increases performance.

Hyper-threading tends to benefit the performance of applications that have not been optimized, and therefore presumably are also not particularly performance sensitive in any case.


In highly-parallel workloads like rendering (ray tracing) where pipeline stalls due to loads happen quite regularly, it's fairly easy to get 20-35% speedups with HT.


In music production and C++ code compilation I get a pretty reliable +25% perf boost with HT on (this was not the case a few gens ago though).



