V8: A Year with Spectre (v8.dev)
352 points by tosh on April 23, 2019 | 212 comments



Like many with a background in programming languages and their implementations, the idea that safe languages enforce a proper abstraction boundary, not allowing well-typed programs to read arbitrary memory, has been a guarantee upon which our mental models have been built. It is a depressing conclusion that our models were wrong — this guarantee is not true on today’s hardware. Of course, we still believe that safe languages have great engineering benefits and will continue to be the basis for the future, but… on today’s hardware they leak a little.

We should all be more angry at chip makers for this. Intel isn't willing to admit that they broke things because then they'd have to fix it, but we shouldn't accept that kind of approach.


Chip makers never “broke” anything. Resistance to side-channel attacks was never part of the protection model. Remember, these protection models were designed at a time when, if an attacker was running code in your address space, things had already gone completely sideways. To the extent anyone is at fault, it’s the folks who designed browsers with in-process JS engines without realizing that they were assuming the hardware was providing protections that the hardware didn’t claim to provide.


CPU designers have had to deal with in-process isolation for a very long time. The earliest citation I can find is the Berkeley Packet Filter in 1987 [1], before the 486 and just two years after the 386's memory protection debuted. If you were to go back in time and ask Intel's chip designers whether they intended to support BPF well, I'm sure they would say yes. Software fault isolation in 1993 (seminal paper at [2]) built on those techniques.

Spectre is simply a subtle oversight in the way different pieces of the system interact.

[1]: https://www.hpl.hp.com/techreports/Compaq-DEC/WRL-87-2.pdf

[2]: https://homes.cs.washington.edu/~tom/pubs/sfi.pdf


Note that BPF doesn't attempt to provide much of a security barrier.

> This access control mechanism does not in itself protect against malicious or erroneous processes attempting to divert packets; it only works when processes play by the rules. In the research environment for which the packet filter was developed, this has not been a problem, especially since there are many other ways to eavesdrop on an Ethernet.

And even on modern systems, you need to be root to install a packet filter, which is typically also sufficient privilege to simply open /dev/mem and read kernel memory directly. (Or you did, until people started using BPF for everything, but that came many years after the 386 and 486.)

I don't think BPF is a good example of running untrusted code, at least not the early versions, since it wasn't untrusted.


Right, I would contend that the threat model addressed by BPF is preventing trusted but buggy code from taking down the kernel, not protecting against malicious code.


Regardless of whether it's 1987 or 1993 that you want to date the beginning of SFI to, it's certainly the case that SFI is explicitly designed to protect against malicious code. CPU designers have had a long time to deal with that, and you didn't see them telling people not to do it back then.


seccomp-bpf might be a better example than BPF packet filters, since unprivileged processes are allowed to provide the BPF code.


> Spectre is simply a subtle oversight in the way different pieces of the system interact.

Yes in the sense that I assume chip makers didn't foresee it and don't like its ramifications, but it's also one that's essential to how the chips currently give their users the performance they want.


Great links. SFI is basically the technique behind NaCl (Native Client) and the Go Playground, no? I didn't realize this technique was this old.


Why do comments like this get downvoted so quickly? The process boundary was supposed to be the level of protection. Preventing your own process from accessing itself was never part of the memory model.


Chip manufacturers have known about JavaScript and its implementations for decades.

ARM for instance had gone so far as to make JS specific instructions (FJCVTZS, Floating-point Javascript Convert to Signed fixed-point, rounding toward Zero), and before that they had instructions to help a JIT (ThumbEE).

Not sure why we're buying the chip companies' shtick of "oh, poor us, we never knew people would use our chips like that, we just specifically optimized for it and provided support instructions"


> Not sure why we're buying the chip companies' shtick of "oh, poor us, we never knew people would use our chips like that, we just specifically optimized for it and provided support instructions"

Hardware moves slowly and can't be fixed easily. Browsers should just be doing this (which they are with site isolation) instead of hardware trying to hack in patches to cover up bad software architecture.

And you can't fix a bug you didn't know about anyway, so why would you have expected ARM to fix something nobody knew was real in the first place?


Running untrusted code in the same process isn't "bad software architecture". And we've been doing that for as long as microprocessors have had OoOE. Java got big right when the first consumer OoOE chips came out in the 90s.


The JVM took responsibility for sandbox isolation. It took until Spectre/Meltdown to widely demonstrate that this was a poor decision, because it turned out in-process sandbox isolation is a promise that cannot be kept. And the point of the JVM was to run the same code on all sorts of hardware, so it doesn't get to blame the Intel or SPARC or Motorola CPU it happens to be running on.


> And we've been doing that for as long as microprocessors have had OoOE

Prior to web browsers when was such a thing ever widespread? And if you eliminate web browsers from the picture, how many usages are even left?

> Java got big right when the first consumer OoOE chips came out in the 90s.

Java doesn't do this, so how is that relevant?


> Prior to web browsers when was such a thing ever widespread? And if you eliminate web browsers from the picture, how many usages are even left?

BPF has existed since the mid 90s for one example.

> Java doesn't do this, so how is that relevant?

Java and the client web were next to inseparable concepts at the time. Java ran their VM as a shared library in the browser process for applets.


So chip makers should have redesigned their hardware to match the erroneous assumptions JS vendors were making about how memory protection works?


If they were erroneous, why weren't they corrected for two decades by those very same chip makers? It was various security researchers that first warned about something like this, not an Intel engineer saying "y'all are holding it wrong".


They should have provided mechanisms for multiple memory domains in the same process, so that people can use their chips securely to do the work they expect to do with them, yes.


Got it. Chip makers have an obligation to accommodate (proactively, no less) stupid things developers do, like running untrusted code in the same process. Makes total sense.



You're trying to casually brush away the entire field of software fault isolation as "stupid things developers do". In fact, SFI has been a respectable area of systems research since at least 1987.


I'm not trying to casually brush away the whole field. MMU-based protection has been overly limiting ever since MMUs were developed. That doesn't mean that every design for providing isolation beyond what the MMU can guarantee is well-thought-out. Certainly, I see no basis for blaming chip makers for the fact that VM developers came up with an attempt at same-process isolation that doesn't work.

It's like all the whining people do about GCC doing unexpected things when faced with code that relies on undefined behavior. That's not GCC's fault.


If hardware architects had intended not to support software fault isolation, then they would have said so back when the field was developed. It's not like people with experience in hardware design weren't in the peer review circles for those papers. Steve Lucco, one of the authors of the 1993 paper, went on to work at Transmeta.

This isn't like GCC, where the C standards bodies officially got together and said "don't do this".


> Certainly, I see no basis for blaming chip makers for the fact that VM developers came up with an attempt at same-process isolation that doesn't work.

The issue isn't that there's a bug in their VM implementation, it's that with current hardware general VMs and same process isolation are mutually exclusive.


They knew since the 90s that people were doing this and expecting it to work, or did you miss that whole Java thing? Java became big as Intel was adding OoOE to their cores.

Hardware and software are codesigned, and yes, the onus is on the chip manufacturers to release chips that let you continue to use them securely.


Why don't software developers have the responsibility to write software that uses the existing hardware protection mechanism?


Because chip manufacturers changed their hardware after the fact and kept their changes proprietary.


Not stupid things developers do. Chip makers should accommodate the behaviour of most users of their chips, which does include running javascript.


Meltdown let processes access kernel memory which was supposed to be hardware-protected. That is a violation of the chipmaker's obligation, not the software's.


... yeah.

And that is a different discussion. The article, and the discussion here, is regarding side-channel attacks within a single process. I'm pretty sure everyone agrees that the hardware (or some conspiracy of the hardware and kernel) must provide process isolation.


How do Spectre attacks on kernels and on other processes fit into your model? Mitigating them has required hacks such as Linux's array_index_nospec(), plus newly added microcode features such as IBRS, STIBP, and SSBD, all of which cause significant slowdowns – to such an extent that SSBD at least is off by default in most OSes – yet even enabling all mitigations does not completely prevent Spectre attacks. And there's nothing about the design of modern kernels that's changed in the last several decades to make them more inherently susceptible to Spectre attacks, so the issue was already there when the protection model was designed.

(There is something in Meltdown's case: not flushing the TLB on every context switch is relatively new. But it's directly encouraged by CPU manufacturers via the ASID feature. And Meltdown is a narrower bug anyway, something that's easy to fix in a new hardware design, not like Spectre which is more fundamental to the concept of speculative execution.)
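
For the curious, the array_index_nospec() mentioned above is conceptually just a branchless clamp. Here is a simplified userspace sketch of the idea (not the kernel's exact macro; it assumes 64-bit longs and an arithmetic right shift):

    #include <stdio.h>

    #define BITS_PER_LONG (8 * (int)sizeof(long))

    /* All-ones if index < size, zero otherwise, computed with no branch to mispredict. */
    static unsigned long index_mask_nospec(unsigned long index, unsigned long size)
    {
        return ~(long)(index | (size - 1UL - index)) >> (BITS_PER_LONG - 1);
    }

    int main(void)
    {
        unsigned char table[16] = {0};
        unsigned long untrusted = 42;                               /* attacker-controlled index */

        untrusted &= index_mask_nospec(untrusted, sizeof(table));   /* forced to 0 when out of bounds */
        printf("%d\n", table[untrusted]);                           /* the load cannot reach past the array,
                                                                       even under speculation */
        return 0;
    }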


Everybody who studied even a little bit about processor hardware knew about speculative execution. And it was never just Intel that was using it (they all were). Until Spectre that technology was not considered controversial.

The crazy thing is that nobody saw this until recently.


For anyone who hasn't come across it, here's a really interesting blog post about speculative execution and a cache bug in the Xbox360 (2018 post about stuff that happened in 2005):

https://randomascii.wordpress.com/2018/01/07/finding-a-cpu-d...


> the crazy thing is that nobody saw this until recently.

Correct me if I'm wrong, but speculative execution attacks (or at least the possibility) were known for several years before Spectre.


You're not wrong - side channel attacks have been around for forever


Not all side channel attacks rely on speculation, although I think all known speculation attacks necessarily rely on side channels to exfiltrate information. I'm not an expert but I think that specifically attacking speculation was novel.


Theo de Raadt was complaining about the possibility 12 years ago. https://marc.info/?l=openbsd-misc&m=118296441702631&w=2


I don't see any mention of speculation there. Any further pointers?


This is one of those discoveries where you see it, then slap your forehead and say "Duh, of course!"


"Until event X, doing <thing that X demonstrated was bad> was not considered controversial" is an explanation of behavior, but not really a defense.


Negligence, recklessness, knowledge, or purposefulness are probably the only ways actions can be wrongful. To have acted negligently, you at least ought to have known that it carried an unacceptable risk. So you need a story on how it was at least negligent and why they ought to have known the risk before releasing the product.[0]

Mere causation can’t get you there. E.g. when a car hits a pedestrian, the driver and pedestrian equally “caused” the accident. It is only by way of characterizing their behaviors in one of the ways above that we can identify wrongdoing. Perhaps the driver wasn’t paying attention (negligent or reckless) and ran a red light. Or perhaps the pedestrian was intentionally throwing themselves in front of traffic. Etc.

[0] Products liability law on its surface does eschew the moral-wrongdoing requirement in favor of strict liability for some kinds of product defects. But that has to do with economic incentives, practical ability to prove claims, etc.


Here, chip makers never promised to prevent X. Maybe preventing X is desirable now that people do Y, but you can hardly blame them for not preventing something they didn’t promise to prevent.


They promised to develop general purpose chips which can meet as many desktop computing needs as possible, which now implicitly includes need Y (but they didn't anticipate that at the time).

They could of course just reject the necessity of need Y, but if the majority of their clients actually do have need Y, can it really be said that the chip is successful at being general purpose?


Yeah, you can, because the chip provides capabilities so you can implement such protections in software if you want, or get more speed if you don't.


You say that as if Intel's flaws were comparable to the ones from ARM and AMD. They aren't.


And this post isn't about those flaws, so that's irrelevant.


That's victim blaming.

Chip consumers missing a communication is very different from Intel actively developing this to cut corners for raw performance (which is the only reason they cornered the market) and forcing all other manufacturers to follow suit or die.


Speculative execution was developed by IBM in the 1960s, before Intel made CPUs.


To be fair, untrusted code wasn't part of the security model for mainframes for the longest time.


Wait, wasn't it more a part of their model than for Intel? We're talking about the era of mandatory access control and time sharing.


I mean, the 360/91 that he's talking about didn't even have an MMU.


Wait--weren't MMUs available back even on the 65/67?


Only the 67 for the 360s, and there the software hadn't really caught up. On the 370s it was more prevalent, but they had gone back to in order designs at that point.


I don't think this is really correct. From August 1965, multiprogramming was offered using DAT (Dynamic Address Translation)--what they called MMUs back then.

And this feature was used to support not only many programs running at once, but also TSO (Time Sharing Option) which by definition is mutually untrusted code.


TSO didn't depend on hardware boundaries between clients.


Technically, the phrase victim blaming fits this sentence, but I feel like this context is not its intended or commonly accepted use.


It's more depressing that some people think of programming as programming a language rather than a machine. Languages are just fluff that all reduce to the same thing when crunched through a compiler or interpreter.


Writing your code in assembly wouldn’t have isolated you from any of these issues; in fact it would make deploying the kinds of mitigations mentioned in the article drastically more difficult. And even if you’re actively thinking about micro-architectural concerns, it is evident that these issues are not obvious, since it took a long time for anybody to put their finger on a concrete problem with speculative execution.


Chip makers favored speed over security to get a leg up in the megahertz wars. They may have even been aware of the risks, but at the time thought they were obscure and difficult to exploit. They made sound decisions at the time, which makes Meltdown and Spectre even scarier, because which of the sound compromises made since then will come back to bite us in the ass later?

Sometimes I think that humanity is not ready for computers, and it's time to go full-on Butlerian jihad.


When Spectre first came out, my prediction was basically that the only effective mitigation would be to move the untrusted code into another address space. This post reinforces that they have exhausted all other options:

> Our research reached the conclusion that, in principle, untrusted code can read a process’s entire address space using Spectre and side channels. Software mitigations reduce the effectiveness of many potential gadgets, but are not efficient or comprehensive. The only effective mitigation is to move sensitive data out of the process’s address space.

Probably the most effective hardware mitigation would be to shift the isolation to a page-level granularity, so that you could say that speculation is disabled for memory in specific pages.


> Probably the most effective hardware mitigation would be to shift the isolation to a page-level granularity, so that you could say that speculation is disabled for memory in specific pages.

What's wrong with the hardware mitigation of just using multiple processes? It already exists, it already works, and it already has decades of tooling & infrastructure built around it.


It's very expensive. Site isolation in Chrome explodes memory pressure on the system. Language enforced confidentiality is a lot less resource intensive.


> Site isolation in Chrome explodes memory pressure on the system.

Is that really due to multiple processes though? Isn't the overhead for a new process reasonably low, like <2MB?


> It's very expensive.

Processes? They don't need to be expensive, as proven by Linux. Trivially cheap enough to do one per site origin in a browser, anyway.

> Site isolation in Chrome explodes memory pressure on the system.

How do you figure? Code pages are shared, after all. Only duplicate heap would be an issue, but shared memory exists and can mitigate that if there's read-only data to be shared.

So what memory pressure is "exploded"?

> Language enforced confidentiality is a lot less resource intensive.

Not at all clear-cut or self-supporting. What resource(s) is it less intensive on, and what are you using to support such a claim?

CPU time is a resource, too, after all. All this software-injected mitigations and maskings aren't free.


> Code pages are shared, after all

Yes, but various things that require relocations may not be. That can include code, but definitely includes data like C++ vtables, as a simple example. Just to put a number to this, for Firefox that is several megabytes per process for vtables, after some work aimed at reducing the number.

There are ways to deal with that by using embryo processes and forking (hence after relocations) instead of starting the process directly; you end up with slightly less effective ASLR, since the forking happens after ASLR.

> So what memory pressure is "exploded"?

Caches, say. Again as a concrete example, on Mac the OS font library (CoreText) has a multi-megabyte glyph cache that is per-process. Assuming you do your text painting directly in the web renderer process (which is an assumption that is getting revisited as a result of this problem), you now end up with multiple copies of this glyph cache. And since it's in a system library, you can't easily share it (even if you ignore the complication about it not being readonly data, since it's a cache).

Just to make the numbers clear, the number of distinct origins on a "typical" web page is in the dozens because of all the ads. So a 3MB per-process overhead corresponds to something like an extra 100MB of RAM usage...


The experience of literally every browser vendor does not support your claim that it is 'trivially cheap'.

Problems worth thinking about:

Sharing jitted code across processes (including runtime-shared things like standard library APIs) - lots of effort has gone into optimizing this for V8.

Startup time due to unsharable data. Again lots of effort goes into optimizing this.

Cost of page tables per process. (This is bad on every OS I know of even if it's cheaper on some OSes).

Cost of setting up process state like page tables (perhaps small, but still not free)

Cost of context switches. For browsers with aggressive process isolation this can be a lot.

Cost of fetching content from disk cache into per-process in-memory cache. This used to be very significant in Chrome, they did something recent to optimize it. We're talking 10-40 ms per request from context switches and RPC.

Most importantly the risk of having processes OOM killed is significant and goes up the more processes you have. This is especially bad on Android and iOS but can be an issue on Linux too.

ASLR and other security mitigations also mean you're touching some pages to do relocation at startup, aren't you? You're paying that for dozens of processes now.


Those are all costs of doing multi-process at all. Once you've committed to that (which every browser vendor did long before spectre was a thing), doing it per site-origin doesn't significantly change things.

As for the actual problems, many of those are very solvable. Startup time, for example, can be nearly entirely eliminated on OS's with fork() (and those that don't have a fork need to hurry up and get one) - a trick Android leverages heavily.
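
Roughly the zygote trick, as a C sketch (illustrative only, not how Chrome or Android actually structure it): pay for initialization once, then fork per origin and let copy-on-write share the already-relocated state.

    #include <stdio.h>
    #include <sys/types.h>
    #include <sys/wait.h>
    #include <unistd.h>

    static void expensive_engine_init(void) {
        /* imagine: load libraries, apply relocations, warm up the runtime */
        printf("engine initialized once in zygote (pid %d)\n", getpid());
    }

    static void run_site_sandbox(const char *origin) {
        /* child inherits the initialized state copy-on-write and starts work immediately */
        printf("pid %d sandboxing %s\n", getpid(), origin);
        _exit(0);
    }

    int main(void) {
        const char *origins[] = { "example.com", "ads.example.net", "cdn.example.org" };
        expensive_engine_init();                  /* paid exactly once */
        for (int i = 0; i < 3; i++) {
            if (fork() == 0)                      /* one cheap process per origin */
                run_site_sandbox(origins[i]);
        }
        while (wait(NULL) > 0) {}                 /* reap children */
        return 0;
    }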

And a round-trip IPC is not 10-40ms, it's more like 20-50us ( https://chromium.googlesource.com/chromium/src/+/master/mojo... )

> Most importantly the risk of having processes OOM killed is significant and goes up the more processes you have. This is especially bad on Android

There's no significant risk here on Android. Bound services will have the same OOM adj as the binding process.


Cache round-trips in Chrome were historically 10-40 ms; you could see it in devtools. I have old profiles. You're thinking of the optimal cost of a round-trip, not the actual cost of routing 1MB assets over an IPC pipe.


Using multiple processes communicating via IPC is quite possible, but requires a significant restructuring for applications that are used to doing simple function calls between code operating in different privilege domains.

Chromium already had a lot of this infrastructure built up, and it was still a significant task to split processes up further by origin, on top of the splitting by tab chromium always had.


I think the most effective hardware mitigation would be to bring back rings 1 and 2. Don't disable speculation, but instead give multiple hardware security contexts that get protected the same way that meltdown gets fixed (or wasn't a problem to begin with depending on the chip).


Virtualization hardware can potentially be accessed from userspace [1]. That might be both more fine grained and more performant than a fully separate address space.

Failing that maybe it is time to bring back segments.

[1] https://github.com/ramonza/dune/blob/master/README


I did a similar thing here, using a virtualized ring 0 context to speed up JITed cross-ISA virtualization.

https://github.com/monocasa/remu-playground

Because you have to round-trip through the host kernel to do anything that leaves that context, it's a bit slower and doesn't really get you the overall perf gains you might think. : /

But yeah, segments coming back would be really neat.


Nice.

Intel also supports memory protection keys. I wonder if they would be enough, although they might be affected by Meltdown-like issues.

Does AMD support them?


The MPX extensions? AMD never supported them, and I've heard rumors that Intel might be dropping support since adoption never really took off. In addition to the Meltdown-like effects that probably exist, I'd be afraid of the cache-effect side channels of everything needed to load the base and bounds registers.


Not MPX; I think it is called MPK [1]: basically a process can tag each page with a 4-bit identifier (via a kernel-provided API) and then, by loading key permissions into a special register, restrict its own access to pages tagged with particular keys.

[1] https://lwn.net/Articles/689395/
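
For reference, the user-facing side on Linux looks roughly like this (a hypothetical sketch: it needs a pkeys-capable CPU and kernel plus glibc 2.27+, and error handling is omitted). One caveat: the key-permissions register is per-thread and writable from user mode, so on its own this guards against stray accesses rather than code that deliberately flips it back.

    #define _GNU_SOURCE
    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>

    int main(void)
    {
        size_t len = 4096;
        char *secret = mmap(NULL, len, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        strcpy(secret, "hunter2");

        int key = pkey_alloc(0, 0);                               /* grab one of the 16 keys */
        pkey_mprotect(secret, len, PROT_READ | PROT_WRITE, key);  /* tag the page with it */

        pkey_set(key, PKEY_DISABLE_ACCESS);   /* drop access before running less-trusted code */
        /* ... loads of *secret in this thread now fault ... */
        pkey_set(key, 0);                     /* restore access afterwards */

        printf("%s\n", secret);
        return 0;
    }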


Oh, neat! Thanks! I had somehow missed that there was a tagged memory extension to x86; I'm going to have to check that out.


And now suddenly your browser needs root and needs to run in kernel mode?


...no, the kernel would provide safe access to those.


The attack surface of the kernel is much larger than the attack surface of a browser. Not saying this is a bad idea, but it definitely opens up much more room for much more serious attacks.


The browser already mitigates this via the normal system call sandboxing on the various OSes.

And something like this already existed, 32bit chrome would play games with the LDT on OSes that allowed that to build a better sandbox. It's just that the LDT more or less disappeared on x86_64.


Rings 1 and 2, while rarely used on x86, have meanings that are not appropriate for what is needed here.

What I want is a new ring-4. Ring-4 is like ring-3, except that cache attacks are mitigated (disable speculative execution is the obvious option, but there might be something else that I don't know of). Ring-3 code can put parts of itself in ring-4 mode as required.


If you just want to disable speculative execution, you can already do that. Just emit an lfence in front of every load. Of course, you don't want to do that because it'd cut into your execution speed by 10x to 100x. What you want is the ability to keep speculative execution, but not allow the JITed code to speculate based on information in the same process that it shouldn't have access to. To do that requires describing the different security zones to the processor somehow.
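
A minimal x86 sketch of that barrier-per-load idea (the function and table names here are made up; _mm_lfence is the compiler intrinsic for lfence):

    #include <stddef.h>
    #include <stdint.h>
    #include <emmintrin.h>   /* _mm_lfence */

    static uint8_t table[256];
    static size_t  table_len = sizeof(table);

    uint8_t guarded_read(size_t untrusted_index)
    {
        if (untrusted_index < table_len) {
            _mm_lfence();                   /* speculation barrier: the load below cannot
                                               issue until the bounds check has resolved */
            return table[untrusted_index];
        }
        return 0;
    }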

Why I say that bringing back rings one and two is a good idea is that it gives another context to user mode _and_ kernel mode, so that the kernel can protect itself from BPF style programs at some point.


That is why I said ring-4 mitigates the attacks.

Rings 1 and 2 already exist on x86; we can't just reassign what they do, which is why I introduced ring-4. Ring-1 and Ring-2 are too powerful for user programs; the web browser should not use them.


Sure you can, it just needs to be opt in. And since x86_64 neutered rings 1 and 2 to the point that they don't really make sense to use anyway, it shouldn't be a big deal for downstream software either.


If you're going to modify the processor to add a feature to prevent speculative reads from the same address space when you control the JIT, why not add an instruction that enters a mode that passes the address of all reads and writes through an implicit mask until it has been switched off?

This would outright prevent code generated by the JIT from doing out-of-bounds speculative loads directly; it would just load a rewritten address from within the restricted region. Instead you'd have to trick an API call (which would be executed without the restriction) into doing it for you.


Why not create a chip that accelerates compilation? Or like those old Java cores on phones, accelerates the web browser's parsing, compilation and code execution needs?

We already ship H.264 decoders on everything. And vendors are sort of settling on "Chrome/WebKit rules all." Why not just standardize the model in the hardware?

On the other hand, image and video codecs are actually chock full of security issues nobody bothers to exploit.


Video processing is generally a parallel, bandwidth-heavy task with predictable memory access patterns, whereas compilation is NOT. Thus video processing lends itself well to hardware implementations. Additionally, everything H.265/VP8 or later was designed with people from HW companies on the committees, and thus the algorithms were adjusted specifically to benefit HW implementations. Compilation is extremely configurable, Chrome wants to push out updates to its JS parser every month, and there are like a million reasons this isn't done in hardware.


Hardware isn't magical go faster juice. Doing compilation 'in hardware' isn't any faster than doing it in software.


Is it actually possible to enforce this degree of isolation in all cases? It seems like you need to separate everything influenced by user input from anything containing secrets.


Note, they talk about retrofitting mitigations into the web browser running javascript, not in general. It's hard to find a reason why software mitigations wouldn't work for a language and a compiler designed from scratch for it.


Mitigating Spectre only makes sense if you’re doing it in a VM (e.g. operating system, web browser, processor), a system that loads and executes untrusted code. Not in a language or a compiler. If your language/compiler doesn’t allow the coder to exploit Spectre, they’ll just use another language/compiler.


The problem is much bigger than that. Today pretty much all mainstream languages rely on untrusted module ecosystems, untrusted GitHub projects, etc. If we were to create language-level boundaries to address those problems, we would need to deal with Spectre as well.

Either way, my point was that it's absolutely possible to fully mitigate Spectre in software. A particular group of people just found it too hard to do for the things they work on. Doesn't mean the same applies to anything else.


The linked paper contains a quote and a reference to Microsoft Singularity that really is an interesting thought experiment. Imagine we were all running next-gen systems designed like Singularity by now, which relies on type safety and managed code instead of memory protection to provide application-level security. Spectre would have been a lot more devastating then, if we couldn't rely on hardware-assisted virtual memory protection separating processes from each other.

https://en.wikipedia.org/wiki/Singularity_(operating_system)


Mark Aiken wrote a paper on Singularity with and without memory protection:

https://www.microsoft.com/en-us/research/publication/deconst...

By design, Singularity didn't support dynamic code loading, so untrusted code would run in another software-isolated process (SIP), separated by a channel boundary (IPC). With Spectre, you'd need to rethink what happens with IPC to and from the untrusted processes. The core of the system wouldn't need this though.

Singularity also looked to proof-carrying code as a way of building reliable systems. Unfortunately, it'd be hard to prove there isn't a Spectre-style attack lurking in a piece of code.


Even if the untrusted code is in another SIP, if it was susceptible to Spectre you'd be able to break down that boundary.


In Singularity, the system gets to decide what happens at SIP boundaries (and the kernel ABI boundary).

HIP is the full address space change, but any mitigation steps can be introduced in the IPC hand-off between SIPs (or at the kernel ABI) compiled into the untrusted process. This code is compiler-controlled. The system is in ring-0, so IPC code gets full access to the available instructions (so such mitigation is possible there).

Of course doing this makes channel communication in both directions slow for untrusted processes, but that is the cost of doing business with Spectre. And if someone wrote a browser that didn't use channels for talking to the JS engine, then all bets would be off.


With Spectre, a malicious process doesn't need to have code execution cross a SIP boundary in order to break the confidentiality of other colocated SIPs. As a malicious SIP, I can just read out the rest of the hardware-visible context.


How does the hardware visible context get suitably updated if the confidential data in the other SIP isn't touched by execution (speculative or otherwise)? Doesn't something need to be pulling that state into the visible context?


As far as the hardware is concerned, the confidential information in the other SIP is already visible.


This was a stupid comment, long day. Totally looking at this the wrong way.


Singularity heavily relies on capabilities. What if accessing the current wall clock time was a zealously guarded capability? I wonder what percentage of apps could function with no access to (real time) timers at all, or an extremely granular one?

I can't find a link at the moment, but I recall a paper showing even a very granular clock will suffice for Spectre exploits, albeit with lower bandwidth. Also, something else would need to be done about multithreading, as an application could always just spin up another thread counting as fast as it can to make a poor man's timer.


> I can't find a link at the moment, but I recall a paper showing even a very granular clock will suffice for Spectre exploits, albeit with lower bandwidth.

The linked article mentions this and the linked paper gives some references: https://gruss.cc/files/fantastictimers.pdf


I may be wrong, but isn't there an almost unlimited number of ways that you could determine wall-clock time?

Any kind of networking access can get it (with enough samples, you can get some crazy precision over even the most inconsistent networks), and really any kind of I/O could be abused when combined with other exploits.

And if your permissions system doesn't allow I/O, is there really a lot that your program can do?


You don't even need networking. If you can create a temp file, you can probably get the creation or modified time on the file.


Hell, you can make a passable timer for these purposes as long as you have two threads. No need to leave your process.


Singularity also let you use hardware protection in addition to SIPs. Basically it let you map a set of SIPs to a hardware domain.


One thing that would reduce the attack surface massively is if we stopped pretending it is OK for every website under the sun to run any code they want for as long as they want just because I am reading their webpage.

Yep, you could say I'm talking about "crippling" web pages, but I'd rather phrase it as "pruning" or "removing the worst abuses".

Those who read carefully might have noticed I didn't say "kill Javascript with fire" or anything like that, and I think we could get far just by having browsers limit (by default) JavaScript run time to 3 seconds: start at 20 seconds this summer and aim for 3 seconds next summer, because who seriously thinks web pages should need to run scripts for minutes after they have loaded?

I realize there are a couple of problems with the simplified approach above, so let me try to defuse the ones I see right away:

- This will break a number of sites: Yep. But if we really wanted it and we got all browser vendors behind it, it wouldn't take long before all mainstream sites would be optimizing their JS like crazy to make sure it loaded within 20, then 19 seconds, etc. That said: Good luck getting all browser vendors on board with this. (And, in my defense: I didn't say it would be easy, or even possible, only that I think it would be a good idea to do.)

- Some pages need Javascript because they load content using JS: We can reset the counter to something reasonable for certain types of user input.

- Some web pages need Javascript for <reasons>: let them have a popup like the ones they get for location sharing etc.


Do you want to go back to doing everything on the server again like the early 2000s?

Because that's what you'll end up with.


You'll also end up with (something like) Flash making a comeback.

Nobody's going to accept a non-interactive web, or even one with crippled interactivity[1]. They only tolerated it in the 90s due to hardware and bandwidth limitations, not to mention that the web was new technology back then. That ship has long since sailed.

Even back then, technologies such as Flash arose because people - initially website owners/creators and then, of course, users - wanted more interactivity than HTML 3.2 and 4.x, and slow/limited JavaScript, could deliver.

[1] With very few exceptions, the GP being one.


> Nobody's going to accept a non-interactive web, or even one with crippled interactivity

I'm not talking about that: I'm talking about limiting web pages from using unreasonable amounts of client-side resources.

We're not talking about a lack of interactivity. We're talking about webpages

- loading the page, then leaving the CPU alone

- not running crypto miners (at least not without asking)

- not having all the time in the world to run timing attacks


> You'll also end up with (something like) Flash making a comeback

Something like Flash did make a comeback, and it's just called JavaScript now. Pretty much exactly the same attack surface, but at least now it's a hopelessly complicated "standard"!


> Pretty much exactly the same attack surface, but at least now it's a hopelessly complicated "standard"!

I know, right? I almost miss Flash: it was certainly a lot simpler to work with. I'm not even sure about the "almost" in that sentence.


> Pretty much exactly the same attack surface

Flash exploits were a lot more commonplace and frankly embarrassing than what you see in JS engines these days, so I have trouble seeing it as the same attack surface.


I actually wonder if it would be as bad as it was, given everything we've learned since then.

Suppose we devise a protocol that would allow browsers to directly do what we use JS for today? In other words, it would send a request to the server for every interaction with the page, but the response would be a DOM diff in some well-specified standard format, which the browser would apply to "refresh" the page.

At that point it feels like the main difference would be in latency for small page updates. How common are those, and how bad is it on a typical Internet connection? I'd expect there to be enough to break stuff like custom scrollbars and other such widgets, but many people would say good riddance to those.


Phoenix’s LiveView actually does exactly this via WebSockets, to allow a purely server-side framework to provide live updates. A big drawback is that this doesn’t work as well over mobile internet connections, which is a bitter pill to swallow.


Very interesting, thank you! I suspect that the issue with mobile will become less and less relevant as average connection speeds grow, just like they did on the desktops.


I call strawman, although I guess it is not intentional:

I'm quite clear in the comment you replied to that I don't want to remove all possibilities for JavaScript.

I want to reduce attack surface significantly and as a nice bonus create a better browsing experience for everyone.

Even I realize Javascript has its place for now at least:

- autocomplete

- client side validation

- complex browser side apps, including games

What I want to curb is websites that keep exercising my CPU for no good reason (for me as a user) because they:

- are poorly written

- are busily loading and reloading ads and trackers

- etc

I don't see how you go from removing the worst abuses to "everything on the server again like the early 2000s".


There is no reason that some of the interactive aspects that "AJAX" and the like enabled in the mid-00s couldn't have been done by extending the HTML/CSS standard, instead.


If we're going to do that, why don't we just go back to the 90s where everything is static and the points don't matter?


The absolute horror!


This would make life truly hellish for Web developers: your app randomly gets terminated on client machines that are temporarily sluggish. Lots of them would just give up on the Web and write apps for other platforms.


What kind of web pages do you think of that routinely need to run lots of JavaScript in the background without user interaction?

Or, maybe a simpler question: What web applications do you think of that routinely need to run lots of JavaScript in the background without user interaction?

Right now I can only come up with games and simulations.


Off the top of my head, any media player or chat service


The original paper is extraordinary: https://arxiv.org/pdf/1902.05178.pdf


From their abstract:

As a result of our work, we now believe that speculative vulnerabilities on today’s hardware defeat all language-enforced confidentiality with no known comprehensive software mitigations, as we have discovered that untrusted code can construct a universal read gadget to read all memory in the same address space through side-channels. In the face of this reality, we have shifted the security model of the Chrome web browser and V8 to process isolation.

Processes are pretty heavyweight as a way to perform this sort of isolation. I can't help thinking that something like sthreads and tagged memory from Andrea Bittau's wedge system would be great OS primitives to have right now.

http://www0.cs.ucl.ac.uk/staff/M.Handley/papers/wedge.pdf

RIP Andrea.


> Processes are pretty heavyweight as a way to perform this sort of isolation.

I'm really thinking that we don't really need all the features offered by processes. My current understanding is that we just need a different address space. Why can't we, for example, switch from one set of pages to another when we switch from running trusted browser code to running JIT-ed untrusted code? (Leaving of course a small piece of trampoline code mapped, like KPTI.) On a simple level, this could just be calling mprotect() at certain key locations that result in flipping a few bits in the kernel-maintained page tables. With some good design, perhaps the address for untrusted code and data can be so far away from trusted code and data that maybe just one bit flip is needed in a PML4E.
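
Very roughly what I have in mind, as a toy sketch (made-up names, no error handling, and a real engine would need trampolines and would care about the syscall cost):

    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>

    typedef void (*jit_entry_t)(void);

    /* Hide trusted data while JITed guest code runs, restore it afterwards. */
    static void run_untrusted(void *sensitive, size_t len, jit_entry_t entry)
    {
        mprotect(sensitive, len, PROT_NONE);               /* pages become non-present; speculative
                                                              loads in guest code get nothing */
        entry();
        mprotect(sensitive, len, PROT_READ | PROT_WRITE);  /* restore on return */
    }

    static void fake_guest(void) { puts("guest ran"); }

    int main(void)
    {
        size_t len = 4096;
        void *secrets = mmap(NULL, len, PROT_READ | PROT_WRITE,
                             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        memcpy(secrets, "cookie", 7);
        run_untrusted(secrets, len, fake_guest);
        return 0;
    }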


The wedge work did pretty much this - it allowed multiple page tables to be used within one process, and memory tags indicated which regions should be accessible to each "sthread". In this case, an sthread would run the JITed untrusted code. There are a lot of details to get right though, such as callgates between untrusted and trusted code so it's possible to call OS or process support functions. Anyway, although wedge was not intended to guard against spectre, it should do so nicely. Andrea had all this working in Linux.


> Processes are pretty heavyweight as a way to perform this sort of isolation.

How do you figure? Processes on some OS's, such as Linux, are pretty cheap. So what are you considering heavy? And how do you imagine a "process switching in userspace" type thing to not just have the same weight as real processes? What's the expensive thing you're trying to eliminate?


Isn't one possible solution to this problem just not to allow untrusted programs access to clocks and timer features?


You can do these things as a mitigation, but it doesn't fully solve the problem because untrusted programs can try to get timing info from other sources. For example, incrementing a counter inside an infinite while loop can give you a good estimate of time, measured in CPU clock cycles.
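
A toy version of that counting-thread clock, in C with pthreads standing in for the Web Worker + SharedArrayBuffer variant you'd use in a browser (illustrative only):

    /* build with -pthread */
    #include <pthread.h>
    #include <stdatomic.h>
    #include <stdio.h>

    static atomic_ulong ticks;

    static void *count_forever(void *arg)
    {
        (void)arg;
        for (;;)
            atomic_fetch_add_explicit(&ticks, 1, memory_order_relaxed);
        return NULL;
    }

    int main(void)
    {
        pthread_t t;
        pthread_create(&t, NULL, count_forever, NULL);
        while (atomic_load(&ticks) == 0) {}          /* wait for the counter to spin up */

        unsigned long start = atomic_load(&ticks);   /* "time" an operation by sampling   */
        volatile int x = 0;                          /* the shared counter before and after */
        for (int i = 0; i < 1000; i++) x += i;
        unsigned long end = atomic_load(&ticks);

        printf("elapsed: %lu ticks\n", end - start);
        return 0;
    }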


Ok yeah makes sense.

What about a pure functional language, in which a program computes a result that is a function of the input only?

In this case the only timing information usable for side-channel info leakage would be in the input to the program.

I guess the problems in this case become twofold:

* How can we determine we are not leaking any such info in the input to the program?

and

* Is a pure functional language sufficient to do the stuff we want, or is it too limited, e.g. for use as a browser scripting language?


Purity is not enough to prevent timing attacks. A famous example is string equality. The running time of a typical implementation is proportional to the length of the longest common prefix of the two strings. To prevent this information leak, the implementation must be written to always iterate to the end.
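
Concretely, the usual fix is to touch every byte no matter where the first mismatch is, so the running time doesn't depend on the contents (a sketch; length differences still leak and have to be handled separately):

    #include <stddef.h>

    /* Returns 1 if the buffers are equal, 0 otherwise, in time independent of the contents. */
    int constant_time_equals(const unsigned char *a, const unsigned char *b, size_t len)
    {
        unsigned char diff = 0;
        for (size_t i = 0; i < len; i++)
            diff |= a[i] ^ b[i];   /* accumulate differences; never exit early */
        return diff == 0;
    }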

But spectre is even worse than that as it attacks at the hardware level. If the CPU does any sort of speculative optimization, through things like caches or branch prediction buffers, then it likely can be the target of a spectre attack. You can try to add spectre mitigations to your language's compiler but, as the article discusses, this approach is an uphill battle.


Purity is enough, but purity is not useful. If JavaScript were entirely pure it couldn't do any IO. You couldn't do anything with the string compare result, because knowing the result requires IO, and that IO is not allowed.

Pure algorithms are a useful thing in programming. However, pure algorithms are not useful on their own; you always need something impure to get the output out.


I don't think this definition of purity matches what Ono-Sendai was talking about.


I think it is the same. However he is right when he wonders if this is enough for a browser scripting language. A browser scripting language requires some impure things to be useful.


That might not be possible in a single threaded environment like JavaScript.


An attacker could probably use a remote server and measure the timing differences between HTTP packets.

It doesn't matter if your timing source is high-noise with lots of jitter; the attacker can repeat the measurements over and over again and filter everything out.

Even if the attacker can only pull out a few bytes per second, that might be enough to leak something critical like an encryption key or an ASLR offset.


If that single-threaded process can talk to another process or system, can't it gather or exfiltrate timing information through that channel?


It was until they added background workers.

There is a known simple mitigation: don't JIT random JavaScript; go back to interpretation. Of course that means there is no use for V8, which is why it's not in this paper.


V8 has a JIT-free mode now, and it's still pretty damn fast in real world situations. It looks like in synthetic benchmarks they saw up to 80% decrease in performance, but in real-world applications they saw as little as a 6% decrease.

https://v8.dev/blog/jitless

But even an interpreted language can still be vulnerable to Spectre attacks.


Interpreters aren't enough either. If the interpreter contains pieces of code that are vulnerable to spectre then it can still be exploited.


The original article talks at length about the various timer mitigations the v8 team created, and why they still think they are not sufficient.


You should read “Fantastic Timers and Where to Find Them: High-Resolution Microarchitectural Attacks in JavaScript” (https://gruss.cc/files/fantastictimers.pdf)


Yeah some pretty interesting stuff there. The counting thread achieves 2ns resolution :)


There are going to be use cases that would break, for example with the Web Audio APIs, because it would make it impossible to precisely time or trigger events.


The article discusses why this is insufficient.


> Thus Chrome’s security model no longer assumes language-enforced confidentiality within a renderer process.

So V8 never was and never will be a silver bullet for running JS/WASM without isolation. I wonder what edge CDNs such as Cloudflare [0] and Fastly [1] are doing to isolate their functions.

[0]https://blog.cloudflare.com/cloud-computing-without-containe...

[1]https://www.fastly.com/blog/announcing-lucet-fastly-native-w...


Day 2 at Cloudflare here.

I haven't fully ingested this paper, and I'm still getting caught up on all the details of things, but https://www.infoq.com/presentations/cloudflare-v8 talks a bit about this. I gotta run right now so that's all I can say at the moment.


Running them on in-order, speculation-free hardware would be quite sufficient, and would come with additional benefits in power efficiency. The thing about Spectre or Meltdown is that there's really no case for the sorts of out-of-order CPUs that raise these issues, unless you're truly bound by compute or memory bandwidth on some single-threaded task - which will never be the case for typical JS/WASM workloads on the "edge".


> Running them on in-order, speculation-free hardware would be quite sufficient.

In-order isn't a magic bullet against Spectre; they still do speculative execution after predicting branches, and they can still be vulnerable. ARM has listed at least one of their in-order cores as vulnerable.

To be free of all speculation you have to go back to the 486, which didn't even have branch prediction.

Besides, if you are making custom CPUs there are other options to avoid Spectre that don't require eliminating all speculative execution.


> which will never be the case for typical JS/WASM workloads on the "edge".

Yeah, but it's the atypical workloads that get you, and in every place I've worked there's always been at least the odd atypical workload regardless of system, product, platform, technology, target market (including internal and external).


So, small in-order CPUs will be the future of commodity distributed computing after all?

I doubt they will resort to that though. They can do other tricks, since they control the infrastructure, like turning on process isolation automatically for suspiciously behaving code.


The SPARC Niagara thought that and failed in the marketplace. Single-thread performance matters, and hyperthreading is not a panacea.


> Like many with a background in programming languages and their implementations, the idea that safe languages enforce a proper abstraction boundary, not allowing well-typed programs to read arbitrary memory, has been a guarantee upon which our mental models have been built.

As an aside, the notion that it's very risky to rely only on the type system to enforce security boundaries is not new. In 2003, Govindavajhala and Appel [1] showed that you can break the security of the JVM through memory errors. Basically, an untrusted attacker fills memory with a specially formatted data structure such that in case of a random bit flip, with high probability, you get an integer field and a pointer field aliasing each other, allowing you to do pointer arithmetic. By contrast, it's extremely unlikely that a random bit flip allows (say) an unprivileged Unix process to get root access.

[1] https://www.cs.princeton.edu/~appel/papers/memerr.pdf


If they are going with isolation instead, can we have the high-res timers and shared buffers back?


aren't they already back?


Using Chrome 73 I was getting pretty wildly variable results back from performance.now() a week ago. It was near useless when trying to profile some game code. Apparently the timing results you get back in the devtools perf tools are reliable, but that's not useful for in-game performance dashboards or real-time profiling.


Spectre is a great example of what effect public perception has on software development. After its disclosure the industry absolutely scrambled to patch it because mainstream news outlets declared the sky was falling. A year in, we have hundreds of thousands of slightly older computers with a 20% performance drop, and even modern machines with a 5-10% performance drop, to address a purely hypothetical threat.

Compare this to the various Intel Management Engine exploits from the past couple of years, which received no media attention and thus no industry attention.


The article states:

>Extract the hidden state to recover the inaccessible data. For this, the attacker needs a clock of sufficient precision. (Surprisingly low-resolution clocks can be sufficient, especially with techniques such as edge thresholding.)

Might anyone have some good resources or links they could share on this "edge thresholding" technique?


Small question: how can they know that it's a mispredicted branch that is being executed (so that they can set the poison appropriately)? If they have this knowledge, couldn't they simply drop the execution since it's a misprediction?


> we productionized and deployed site isolation for as many platforms as possible by May 2018.

Is there a way to determine whether site isolation is available/enabled on my chrome platform?


Couldn't Intel add a per-thread option to enable/disable speculative execution? This option could be disabled once a thread is about to execute just-in-time compiled code. JavaScript would then run slower - but without the specter of Spectre.


Does anyone know if SELinux can mitigate some of these vulnerabilities?


It does not.


Off-topic: Kudos to the author for having a simple and fast-loading page without any JS or complicated CSS. Loads blazing fast. No tracking either.


Thanks :) Glad to see people appreciate simplicity (and the performance that comes with it).


It is sad that the solution is to cripple the software. I hope that these workarounds will be relegated to compatibility issues in the future as the underlying hardware is eventually fixed.


No software was crippled here. Process isolation is arguably the way it should have always been in the first place. Hell, rebrand it as "hardware accelerated sandboxing" and now it's an obviously superior approach, right? ;)

But either way nothing was actually "crippled", and if anything the reverse is true. They are un-crippling aspects of JavaScript (like re-introducing SharedArrayBuffer).


Not sure why I got downvoted for this. Anyone care to enlighten me?


Any solution requires "crippling the software" - process isolation increases overhead through kernel context switches.


I suppose this is where I realise I don't actually understand the underlying issue. If it is a software issue, why is there so much blame directed to the hardware vendors?


It's a hardware issue - i.e. it is caused by compromises the hardware made to speed up execution.

Because we're stuck with the hardware we have, we can make software fixes - however, they are going to require either slowing down or removing certain features until we get a hardware fix for the issue.


Thank you. I read in another comment that this isn't an issue with process isolation of memory (which I originally thought); it's an issue with the isolation of memory within the process - would you say that is a good summarisation? I.e. the hardware is working as designed, but the software is expecting guarantees that the hardware never promised?


Spectre only allows access to the process' own address space, yes. But Meltdown works across processes, so hardware manufacturers (and specifically Intel) have definitely dropped the ball there, as well.


But Meltdown is known to be fixable without significant performance costs (proof being the existing high-performance CPUs that are immune to it).

Spectre not so much.


The problem is the same. The fact that we can hide most of the kernel memory is handy, but it shouldn't be up to the software to do that - user processes shouldn't be able to read memory whose access page-faults. Definitely a dropped ball by the manufacturer.


Not the same. As you say, a process should not be able to access pages that are protected by hardware permission bits. That's exactly the issue with Meltdown.

Spectre is different: it allows reading pages that a process would already have hardware permission to read[1], but where actual permission is enforced at the software level (i.e. software as opposed to hardware bounds checking).

[1] It is possible to construct Spectre v1 attacks against other processes in some cases, but they are much harder and lower bandwidth, and I do not think they have yet been shown to be practical.


> Not the same. As you say, a process should not be able to access pages that are protected by hardware permission bits. That's exactly the issue with Meltdown.

Okay, but the hardware issue is pretty much exactly the same... the bug in hardware that enabled Spectre is also what enabled Meltdown.

The only reason we managed to fix this is by moving most of the kernel task memory out of the user page table - a software fix for something that the hardware should have been doing.


No, it is really a different bug. One is speculation around branches; the other is speculation past the hardware protection check (which is not implemented internally as a branch). In fact, while all OoO CPUs are vulnerable to Spectre, not all of them are vulnerable to Meltdown because, for whatever reason, their designers decided not to speculate past hardware checks (possibly because the check could be implemented efficiently in the critical path given their L1 cache design).

I'm not a hardware designer, but both instances of speculation likely use the same snapshot-and-rollback logic; that's about it, though.
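
To make the contrast concrete, here is a rough sketch of the Meltdown-style transient access - illustrative only, since a real attack also needs fault suppression and the cache-timing step, and kernel_addr is a hypothetical supervisor-only address:

    #include <stdint.h>

    volatile uint8_t probe[256 * 4096];

    void meltdown_transient(const volatile uint8_t *kernel_addr) {
        uint8_t secret = *kernel_addr;    /* architecturally faults, but only at retirement */
        /* On affected CPUs the dependent load below still executes transiently,
         * leaving a cache footprint that encodes 'secret'. */
        (void)probe[secret * 4096];
    }

Note there is no branch to mispredict here; the speculation is past the page-permission check itself.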


I think that's a matter of debate - I personally don't think hardware should allow any sort of memory leakage in an otherwise safe language. It also renders any attempt to move critical software out of the kernel and into user-space very dangerous.

With Meltdown, the issue was definitely a failure of the hardware vendor's obligation: kernel memory should definitely not be exposed to the process, and yet it was.

I would say such a justification is somewhat of a cop-out by the hardware vendors.


People sometimes talk about waiting for a Spectre "fix" at the hardware or software level, but it seems that the only fix is to lose speculative execution entirely (and with it, its performance gains). On my Haswell CPU, the performance hit was about 5% in various benchmarks, although newer architectures suffer less. Given that Spectre attacks still appear to be very difficult to perform in practice, I wonder if we will start to see speculative execution at the hardware level brought back as an option for high-performance, closed-environment settings.


There are architecture-level alternatives to speculative execution for high-performance compute - namely VLIW and the "tweaks" on it that efforts like The Mill are working on right now. The main problem with them is that they make the processor ISA inherently dependent on the actual details of the hardware configuration (you don't have a single ISA, as with RISC, that can seamlessly scale from tiny microcontroller hardware to supercomputer- or datacenter-scale parts). But once you're willing to bite that bullet, and perhaps the vast majority of the software you use gets distributed in either source-code or an arch-independent "VM" format, it can become a very attractive choice. Yes, we used to think that compilers would never be good enough to exploit the parallelism implied by VLIW, but it might be time to revisit that now that both languages and compilation platforms have improved.


The only existing VLIW that is able to run general-purpose code in a reasonably efficient way is Nvidia Denver, which is also vulnerable to Spectre.

Edit: that I know of, at least.


I don't think you are actually disabling speculative execution on your Haswell.


Perhaps not at the hardware level, I'm not sure. I downloaded InSpectre (a Windows application), which informed me that I had been protected against Spectre and Meltdown attacks through Windows updates. I also used its option to disable the protection, which is how I figured out how much of a performance hit I was taking.


The hardware mitigation at this point is disabling some of the advanced indirect branch prediction in some circumstances (I don't remember the exact set of circumstances). It is nowhere close to disabling all speculative execution.


That's probably Meltdown and Spectre v2 (i.e. cross-process poisoning of the indirect branch predictor). There is no hardware/firmware fix yet (and possibly ever) for Spectre v1, only software mitigations.


The hit from disabling speculative execution is not a few percentage points, but a couple of orders of magnitude.


I think you could do it by tracking cache lines pulled in during speculation and dropping them if the speculation fails. Perhaps behind a new barrier instruction that's lighter weight than lfence (since you're just enabling a recording mode for cache snooping).

Ultimately I don't think many speculative instructions actually pull in cache lines, so we could either clean up when they do, or stop speculating when a line would need to be loaded.


The right mental model is that, in modern out-of-order CPUs, there is no difference between executing "in speculation" and "not in speculation". There is just a huge pile of u-ops (~200 on big Intel chips) in the reorder buffer, and some of them may be unexecuted branches. The fetch unit plows right on through branch instructions using branch prediction and jams code "from the future" into the reorder buffer.

The CPU keeps on executing whatever instructions are ready (i.e. have data dependencies met), irrespective of the in-flight branches. When a mispredicted branch is retired (~0.1-1% of the time), the CPU can throw away the entire reorder buffer and start over. A CPU can also abort-on-execute, throwing away only work related to the given mispredicted branch. So probably 99% of all cycles are spent with at least one unexecuted branch somewhere in the reorder buffer--the CPU is essentially always speculating.


I don't see how this would really work. As soon as one core pulls in a cache line, all the other cores know about it, so somehow you would have to roll back all the coherency messaging as well.


You can in a many-worlds interpretation of quantum mechanics. On a branch speculation, split the universe: follow one branch in one universe and the other in another. If the CPU realizes the speculation is wrong, destroy that universe (this is left as an exercise for the reader). As a bonus, the CPU will have a 100% speculation success rate. Unfortunately, I think quantum mechanics itself might allow for side channels between universes.


You also need to somehow restore the cache lines that got removed, else there's still information disclosure.


Fine point. So really, speculative instructions can't pull in cache lines.


I wonder whether you could mitigate by speculatively executing all/both branches, or somehow making it constant time...


The issue is more that you are causing a calculated cache line to be fetched, and then the timing difference between an L1 hit and an L3 or main-memory fetch reveals what the calculated value was. The only way to make those appear the same speed would be to slow L1 down to main-memory speed on speculative fetches.

It isn't like a simple branch with two possible values: the cache line could be any one of 256 (if leaking a byte at a time).
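
Concretely, the receiving end of that covert channel is usually a flush-and-reload probe over the 256 candidate lines, roughly like this sketch (x86 intrinsics; the timing threshold is a made-up number that has to be calibrated per machine):

    #include <stdint.h>
    #include <x86intrin.h>                /* _mm_clflush, __rdtscp */

    #define STRIDE 4096                   /* one candidate per page, to dodge the prefetcher */
    static volatile uint8_t probe[256 * STRIDE];

    /* A fast reload means the line was already cached, i.e. the speculative
     * access touched it. The threshold here is illustrative, not universal. */
    static int was_cached(volatile uint8_t *p) {
        unsigned aux;
        uint64_t t0 = __rdtscp(&aux);
        (void)*p;
        uint64_t t1 = __rdtscp(&aux);
        return (t1 - t0) < 100;
    }

    static int recover_byte(void) {
        for (int guess = 0; guess < 256; guess++)
            if (was_cached(&probe[guess * STRIDE]))
                return guess;
        return -1;
    }

    /* Each round: flush all 256 candidates with _mm_clflush(&probe[i * STRIDE]),
     * trigger the speculative access, then call recover_byte(). */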


I see you are getting downvoted too for a very congenial, decent post. While HN does have an issue with differing opinions, for some reason this one (and a few others) is being socially censored quite vigorously.

Shrug. Not sure why.


Because it's dead wrong, laughably so, in asserting that disabling speculative execution is a ~5% performance hit (in reality it would mean a couple of orders of magnitude slowdown on desktop workloads).

As noted elsewhere, OP completely misunderstood the fact that his CPU had received some of Intel's Spectre mitigation firmware patches and came to the silly conclusion that this meant his CPU was no longer doing speculative execution of any sort.

HN is still a (vaguely) computer-savvy forum. Being extremely wrong about the fundamentals of how a modern CPU works is going to get you downvoted.


"the only fix is to lose Speculative execution entirely"

As mentioned elsewhere, this is incorrect and does not describe what CPU designers have done. That probably explains the downvotes.


And still, to this date, not a single Spectre attack has been shown to work under real-world conditions.

This has been the most overblown security topic I've ever seen. All the POCs required a great deal of help from the code being attacked: e.g., known memory locations, nothing else running on the system.

There still has not been a single real-world exploit developed for Spectre: e.g., somebody attacking a running server. Yet all this time and effort has gone into a theoretical toy. It is so incredibly difficult to pull off that it should be one of the lowest priorities. I guess it gets ink because it pushes the security issues past all the software foul-ups and lets people hate on Intel.

Developers gotta develop I guess.

Edit: down to -3 already for expressing an opinion, and nobody really responded except with a variation of the precautionary principle.

I wish more people understood that Spectre isn't a bogeyman we need to give up 5 to 15% of our CPU for (and Meltdown was more of a bug; once patched, it is no worse than Spectre).

Maybe I should put an encryption key in a sandboxed system and pay $1000 to anybody who can use Spectre to recover it? Even then, it probably wouldn't shut everybody up.


> And still to this date not single spectre attack shown to work under real world conditions.

I find statements like yours difficult. Can we know that something did not happen just because nobody talked about it?

A vulnerability as deep and far-reaching as Spectre is extremely interesting to nation-state-level hacking groups as a means of warfare; we should not assume that we know about everything these actors are up to.


But if you treat well-funded actors as essentially omnipotent entities, you might as well lie down and give up. The reality is that although we don't know everything, there cannot be all that much activity either, or we would see many more installations of well-crafted exploits and rootkits than currently get reported. And I would expect blackhats to go for a reliable 0-day with small code size rather than rely on a complex attack that first needs to read memory locations in a roundabout way to get anywhere.


I'm not saying omnipotent, just much less transparent than the above comment suggests. I'm not sure how many exploits leak out of military cyber offense teams in China, Israel or the US.


All true. But now we're all paying the price in terms of performance "just in case".


I turn Spectre/Meltdown protection off (download InSpectre if you run Windows). My Haswell machine gets a noticeable boost (e.g. about a 5% increase in the Intel XTU benchmark), but newer architectures are less affected by the patches, so you may not find the effort/risk worth it.


I am amused at the threat model that might be concerned, on one hand, with extremely complex and difficult speculation attacks, and on the other hand permits downloading random Windows apps and running them at the highest possible privilege level. Clearly the public is of two minds.


The owner of the device is not a threat actor, and trying to pass them off as one in your example is silly.


No, but the organization providing the code could be...


I agree -- it's silly to worry about this when there's so much more low-hanging fruit, like the endless JavaScript APIs that can access more and more hardware. For example, did you know that you can turn someone's webcam into their favicon? [1] I'll bet there are some exploits lurking around there. And that's even before we get into WASM/WASI. We're piling up complexity fast enough that I doubt Spectre will ever be worth exploiting at any scale in the real world. Maybe by a state-level actor against an individual, but that's not my threat model[2].

[1] https://twitter.com/davywtf/status/1119783380734836737

[2] https://www.usenix.org/system/files/1401_08-12_mickens.pdf


I'm not sure that's any kind of exploit. It pops up "[website] wants to access your camera" just like any other website.


I wasn't saying it was an exploit, but just an example of how, as we give JavaScript more access to our hardware and browsers, there will be unanticipated interactions between features. Those surprises have often led to exploits in the past. All JavaScript should be considered untrusted (or malicious) code, and giving it new capabilities seems like an endless source of bugs and security problems. Those will probably be easier to exploit than timing attacks.



