I wonder at what point the hardware fixes for these issues stop being worthwhile, and whether we'll see a resurgence of processors without speculative execution or any of these other speed-ups.
Ironically, a high-performance, general-purpose architecture without speculative execution might require a deep reinvestment in SMT. Instead of trying to speculatively make one thread fast to mask IO stalls, run a large pool of threads that can stall frequently but still keep the execution units and memory channels busy.
To avoid reintroducing these Spectre-like bugs, you'd have to conservatively design the per-thread execution to avoid those covert channels: not only synchronously enforcing all the logical ISA guarantees for paging and other exception states, but also using more heavy-handed tagging methods to partition the TLB, caches, etc. between separate protection domains.
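To make the tagging idea concrete, here's a minimal sketch in C (entirely hypothetical, just modelling the lookup logic, not any real microarchitecture) of a cache whose hit check includes a protection-domain ID, so one domain's lines never register as hits for another:

    #include <stdbool.h>
    #include <stdint.h>

    /* Toy 4-way set-associative cache where each line remembers which
     * protection domain filled it. A lookup only hits when both the
     * address tag AND the domain ID match, so one domain can't probe
     * another domain's lines via hit/miss timing. */
    #define WAYS 4
    #define SETS 64

    struct line {
        bool     valid;
        uint64_t tag;        /* address tag */
        uint16_t domain_id;  /* owning protection domain */
    };

    static struct line cache[SETS][WAYS];

    bool cache_hit(uint64_t paddr, uint16_t domain_id)
    {
        uint64_t set = (paddr >> 6) % SETS;  /* 64-byte lines */
        uint64_t tag = paddr >> 12;

        for (int way = 0; way < WAYS; way++) {
            struct line *l = &cache[set][way];
            if (l->valid && l->tag == tag && l->domain_id == domain_id)
                return true;  /* hit only within the same domain */
        }
        return false;         /* other domains' copies stay invisible */
    }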
> Instead of trying to speculatively make one thread fast to mask IO stalls, run a large pool of threads that can stall frequently but still keep the execution units and memory channels busy.
Isn't something like that done for GPUs? They have the advantage of having a massive number of threads to execute. For CPUs, the number of runnable threads tends to be lower.
You can kiss any semblance of reasonable performance goodbye if you eliminate "speculative execution". Pipelining is the most basic tool in the toolbox. Even microcontrollers do it.
To give ballpark numbers: modern Intel processors can retire a few instructions per cycle in tight loops (4 is the theoretical maximum; > 2 is realistic in a lot of high-performance code). A branch misprediction wastes 10-15 cycles.
So getting rid of speculation entirely, and stalling on every branch, would waste time equivalent to dozens of instructions. On typical code that has a branch every few instructions, this could slow down execution by several times.
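As a rough illustration (a sketch, not a benchmark of any particular core): with unpredictable data, the branchy loop below pays the misprediction penalty constantly, while the branchless form turns the condition into ordinary data flow and gives the predictor nothing to get wrong:

    #include <stddef.h>
    #include <stdint.h>

    /* Branchy version: if the values are unpredictable, the branch
     * mispredicts often, and every miss throws away ~10-15 cycles of
     * speculatively issued work. */
    int64_t sum_big_branchy(const int32_t *data, size_t n)
    {
        int64_t sum = 0;
        for (size_t i = 0; i < n; i++) {
            if (data[i] >= 128)          /* hard-to-predict branch */
                sum += data[i];
        }
        return sum;
    }

    /* Branchless version: the condition becomes a data dependency
     * (compilers typically emit a conditional move or a mask), so
     * there is no branch to mispredict. */
    int64_t sum_big_branchless(const int32_t *data, size_t n)
    {
        int64_t sum = 0;
        for (size_t i = 0; i < n; i++) {
            int64_t keep = -(int64_t)(data[i] >= 128);  /* 0 or all ones */
            sum += data[i] & keep;
        }
        return sum;
    }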
The simplified SIMD cores in early GPUs had to fake branching to some extent for their virtual threads: every branch in the shader code would be tested for each virtual thread, and any thread (really just a vector lane) it didn't apply to would be masked out for the instructions of that branch. The GPU would run both sides of the branch, relying on the mask. It was workable, but very slow.
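In software terms, that run-both-sides-under-a-mask behaviour looks roughly like this sketch (a toy model in C, not any particular GPU's ISA):

    #include <stddef.h>

    #define LANES 8   /* one group of "virtual threads" (vector lanes) */

    /* Toy model of SIMD branch divergence: the hardware can't branch per
     * lane, so it evaluates the condition into a mask, runs BOTH sides
     * of the branch for every lane, and uses the mask to pick which
     * result each lane keeps. If lanes disagree, you pay for both paths. */
    void divergent_branch(const float x[LANES], float out[LANES])
    {
        int   take_then[LANES];
        float then_val[LANES], else_val[LANES];

        for (size_t i = 0; i < LANES; i++)
            take_then[i] = (x[i] > 0.0f);   /* per-lane condition */

        for (size_t i = 0; i < LANES; i++)
            then_val[i] = x[i] * 2.0f;      /* "then" path, all lanes */

        for (size_t i = 0; i < LANES; i++)
            else_val[i] = -x[i];            /* "else" path, all lanes */

        for (size_t i = 0; i < LANES; i++)  /* mask selects the result */
            out[i] = take_then[i] ? then_val[i] : else_val[i];
    }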
Pipelining isn't strictly the same thing as speculation, though, is it? If I have,
    add %rax, %rbx
    add %rcx, %rdx
I can pipeline those without needing to speculate on anything. If there is a dependency on a previous instruction, then we might have to speculate, but hopefully there is still some case for pipelining?
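As a sketch of the difference (plain C standing in for the assembly, assuming the compiler keeps the accumulators in separate registers): a dependent chain serializes the adds, while independent chains overlap in the pipeline with no speculation involved:

    #include <stddef.h>

    /* Dependent chain: every add needs the previous result, so the adds
     * cannot overlap in the pipeline; throughput is bounded by latency. */
    long sum_serial(const long *a, size_t n)
    {
        long s = 0;
        for (size_t i = 0; i < n; i++)
            s += a[i];              /* depends on the previous iteration */
        return s;
    }

    /* Two independent chains: like the two adds above, these have no
     * dependency on each other, so both can be in flight at once with
     * no speculation required. */
    long sum_two_chains(const long *a, size_t n)
    {
        long s0 = 0, s1 = 0;
        for (size_t i = 0; i + 1 < n; i += 2) {
            s0 += a[i];             /* independent of the next add... */
            s1 += a[i + 1];         /* ...so they overlap in the pipe */
        }
        if (n & 1)
            s0 += a[n - 1];
        return s0 + s1;
    }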
Have any of these bugs been based purely on speculation, or does it always involve speculating across privilege boundaries? (Although I feel like even the former isn't safe, e.g., if you're in some form of VM attempting to maintain privilege separation.)
It's related. If you want decent performance with pipelining, you're going to want to speculate at least a bit -- assume that FP math doesn't trigger exceptions, assume that you predicted branches correctly, assume that memory accesses don't fault, etc.
Intel does more speculation, but you won't find anything beyond the tiniest embedded CPUs that doesn't do any.
E.g. this new one can only be reproduced on Intel, not on AMD or ARM.
If you want to ban speculative execution for everything, you need to make the case that it's a fundamental issue and not an implementation-specific one.
Right now, that's not the case for many of these vulnerabilities.
As I understand it, the Intel-only vulnerabilities (Foreshadow/L1TF, and this new set, whose details I haven't looked at yet) target specific Intel features, and there's no reason to believe a similar focus on the other companies' products wouldn't also find unique problems.
For example, the first version of Foreshadow went after the SGX enclave. Given how widespread Meltdown and Spectre bugs are, there's absolutely no reason to believe that the other vendors don't have similar unique problems.
As you say, only the first Foreshadow attack went after SGX - it turned out to be a broader flaw that also affected OS page table protections more generally and could be used to attack process-OS and VM-hypervisor isolation. Those variants relied only on Intel's implementation of standard x86 paging, and they don't exist on AMD because they didn't implement it in the flawed way Intel did. That is, Foreshadow/L1TF is Intel-only not because it relies on an Intel-only feature, but because it's an Intel-specific implementation flaw. (Linux had to substantially rework its paging code to work around this.)
AMD don't seem to have commented on ZombieLoad yet, presumably because it's much newer and they didn't have pre-announcement info about it. But they have commented on the other two vulnerabilities announced today and explained that the reason they're not vulnerable is that the corresponding units in their CPUs don't allow speculative data access unless the access checks pass, and their whitepaper seems to suggest the same is true of ZombieLoad: https://www.amd.com/system/files/documents/security-whitepap...
SGX does make for an easier and flashier demo for Foreshadow, though, so it makes sense that the researchers went after that target. They managed to recover the top-level SGX keys that all SGX security and encryption on the system relies on, something that I don't think anyone had ever managed before.
Also, as I've said elsewhere, Intel seems to speculatively leak data that shouldn't be accessible pretty much everywhere in their designs where memory is accessed.
" and there's no reason to believe a similar focus on the other companies' products wouldn't also find unique problems."
Sure there is. Just like in the first round last year, Intel totally threw AMD under the bus to save face and protect its stock price. That is literally the reason to mention AMD: to keep their stock price from crashing.
The industry will probably get dragged, kicking and screaming, into using tagged pointers. CPUs could then use that information to put a safe lid on speculative execution (see the sketch below).
And it will be tough, as no compiler supports it; moreover, C/C++ were architected from the beginning not to bother with runtime information about object types/sizes.
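As a rough sketch of what tagged pointers could look like at the software level (just packing a tag into the unused upper bits of a 64-bit pointer, in the spirit of ARM's top-byte-ignore/MTE; all names here are hypothetical, not an existing compiler feature):

    #include <stdint.h>

    /* Current 64-bit systems leave the top bits of user-space pointers
     * unused, so a small type/ownership tag can live there. Hardware
     * that understood the tag could refuse to forward speculatively
     * loaded data whose tag doesn't match the access. */
    #define TAG_SHIFT 56
    #define TAG_MASK  ((uintptr_t)0xff << TAG_SHIFT)

    static inline void *tag_ptr(void *p, uint8_t tag)
    {
        return (void *)(((uintptr_t)p & ~TAG_MASK)
                        | ((uintptr_t)tag << TAG_SHIFT));
    }

    static inline uint8_t ptr_tag(const void *p)
    {
        return (uint8_t)((uintptr_t)p >> TAG_SHIFT);
    }

    static inline void *strip_tag(void *p)
    {
        return (void *)((uintptr_t)p & ~TAG_MASK);
    }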
Ultimately, if we do transparent per-process memory encryption, then we can let the CPU do all the speculation it wants, but the result will be gibberish. And it's a lot easier to do a simple key switch than a full TLB flush. Of course, it probably doesn't do much about the timing attacks (side channels).
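A toy model of the idea (an XOR keystream standing in for whatever real cipher the hardware would use; purely illustrative, not how any shipping scheme works): data is stored under the victim's key, so anything leaked into another process and decoded with that process's key comes out as garbage:

    #include <stddef.h>
    #include <stdint.h>

    /* Toy stand-in for transparent memory encryption: a per-process key
     * is XORed over the stored bytes. Speculation can leak the stored
     * bytes, but another process only sees them through its own key. */
    static void xcrypt(uint8_t *buf, size_t n, uint64_t key)
    {
        for (size_t i = 0; i < n; i++)
            buf[i] ^= (uint8_t)(key >> ((i % 8) * 8));
    }

    int main(void)
    {
        uint64_t victim_key   = 0x1122334455667788ULL;
        uint64_t attacker_key = 0xa1b2c3d4e5f60718ULL;

        uint8_t stored[16] = "top secret data";     /* victim's data    */
        xcrypt(stored, sizeof stored, victim_key);  /* stored encrypted */

        uint8_t leaked[16];
        for (size_t i = 0; i < sizeof leaked; i++)
            leaked[i] = stored[i];                  /* speculative leak  */
        xcrypt(leaked, sizeof leaked, attacker_key);/* wrong key: junk   */

        return 0;
    }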
My guess is that the performance loss from removing these features would make such CPUs less economical than strictly enforced separation between security domains on a hardware assignment and scheduling level. That is, just forget about having the same server run stuff from different contexts at the same time.