Analyzing Core i9-9900K performance with Spectre and Meltdown mitigations (anandtech.com)
206 points by pplonski86 on Jan 2, 2019 | 79 comments



Conclusion at the end is fairly brutal:

“The long and short of matters then is that based on the testing we've done thus far, it doesn't look like Coffee Lake Refresh recovers any of the performance the original Coffee Lake loses from the Meltdown and Spectre fixes. Coffee Lake was always less impacted than older architectures, but whatever performance hit it took remains in the Refresh CPU design.”


Forgive me, as I'm nowhere near knowledgeable about CPUs, so my terminology will be way off.

For any CPU designed with the expectation of using the old method of memory access prediction without any protections... can we expect it will ever show a significant performance recovery?

I guess I always assumed the answer was no.


(Someone please correct me if I'm wrong) Without adding additional hardware, likely not significant.

The way you avoid some of the impacted scenarios (at modest performance impact) is with additional hardware or microarchitecture changes.

Basically, the task is 'Ensure processor state, as observed by another process, never changes because of speculative execution branches.'

Which is a high bar to meet, especially if you want to simultaneously optimize your execution unit utilization.
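To make that concrete (my own illustration, not from the article): the classic Spectre v1 "bounds check bypass" gadget looks roughly like the C below. The names are hypothetical; the point is that even though the out-of-bounds read is architecturally rolled back, it leaves a secret-dependent line of array2 in the cache, which other code can later detect by timing.

    #include <stddef.h>
    #include <stdint.h>

    /* Illustrative Spectre v1 gadget (hypothetical names, not a full PoC).
     * If the branch predictor has been trained with in-bounds values of x,
     * the CPU may speculatively run the body for an out-of-bounds x, read a
     * secret byte, and use it to index array2, pulling one cache line into
     * the cache before the misprediction is rolled back. */
    uint8_t array1[16];
    size_t  array1_size = 16;
    uint8_t array2[256 * 4096];          /* one page per possible byte value */

    void victim_function(size_t x)
    {
        if (x < array1_size) {                          /* bounds check     */
            uint8_t secret = array1[x];                 /* speculative read */
            volatile uint8_t t = array2[secret * 4096]; /* cache footprint  */
            (void)t;
        }
    }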


That is pretty obvious if you consider this is just another tweak of the venerable Skylake architecture. By now we have said goodbye to the reasonable thermals we enjoyed for a while because, well, if you deliver six of the same cores they'll consume 50% more at the same clock, so on one hand you slow down the chip when all cores run, and on the other you just allow it to consume more.


And Skylake itself wasn't a new design either, but rather a tweak of Haswell (prominent changes were in the uncore), which in turn was largely identical to SB/IB.


It's been benchmarked that from Sandy Bridge to Kaby Lake IPC only grew about 20% (https://www.hardocp.com/article/2017/01/13/kaby_lake_7700k_v...), but power efficiency has increased brutally: this was a 35W CPU in 2011 (https://browser.geekbench.com/processors/381) and this is a Y-series (4.5-7W) CPU from 2017 (https://browser.geekbench.com/processors/1822).


Skylake was a reasonably big core architecture upgrade (a "tock" in Intel terminology). Several new instructions (XSAVE, AVX512, etc.) were introduced. Perhaps you meant Broadwell?


Consumer Skylake and derivatives didn't get AVX512; that's reserved for the server parts that arrived two years later.


How did they confirm that the mitigations weren't still being used? If you're still using separate page tables for the kernel, well of course it's going to remain slow. The point of the fixed silicon is so you don't need the mitigation, not that the mitigation gets faster.


There's a screenshot linked to in the comments [1] showing how you can query whether the mitigations are enabled, and showing the results for a 9900k.

1: https://www.anandtech.com/comments/13659/analyzing-core-i9-9... - screenshot comes from anandtech.com
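(That screenshot is from a Windows utility. For the Linux side, kernels since 4.15 report the same status under /sys/devices/system/cpu/vulnerabilities/; here is a minimal sketch of reading it, assuming those sysfs files are present.)

    #include <stdio.h>

    /* Print the kernel's reported status for each CPU vulnerability.
     * Assumes a Linux kernel (4.15+) exposing these sysfs files; each file
     * holds one line such as "Mitigation: PTI" or "Not affected". */
    int main(void)
    {
        const char *names[] = { "meltdown", "spectre_v1", "spectre_v2",
                                "spec_store_bypass", "l1tf" };
        char path[128], line[256];

        for (size_t i = 0; i < sizeof names / sizeof names[0]; i++) {
            snprintf(path, sizeof path,
                     "/sys/devices/system/cpu/vulnerabilities/%s", names[i]);
            FILE *f = fopen(path, "r");
            if (!f) {
                printf("%-18s: (not reported by this kernel)\n", names[i]);
                continue;
            }
            if (fgets(line, sizeof line, f))
                printf("%-18s: %s", names[i], line);  /* keeps its newline */
            fclose(f);
        }
        return 0;
    }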


The "fixed silicon" at least so far mostly seems to be hardware implementations of the existing software mitigation techniques with very similar performance characteristics.


Can we test this? Can we install an unpatched OS and observe the same performance penalty?

In the case of meltdown, it seems unlikely the CPU is maintaining its own shadow page table. How would it do that?


You could compile a Linux kernel with the Spectre defenses disabled. I assume mitigations included in userspace software could also be patched out in the same way for testing.


I agree, I remember reading this article and thinking something was odd...how could you implement page table isolation in hardware? The Spectre improvement could plausibly be more similar to software mitigation...


And that's a shame, since in principle, the hardware can do better than software. I've heard of approaches ranging from better cache partitioning to transactional commit to cache on instruction retirement. I can only imagine that Intel is working on these systemic fixes in the next big microarchitecture revision while continuing to apply cruder hacks on older cores that can't, say, alter low-level instruction retirement much.


Performance is all nice and such. Did anyone validate that the new processors actually mitigate Spectre and Meltdown?


They mitigate Spectre specifically but not speculative execution bugs in general, which it seems will be with us for the foreseeable future.


At least we'll regain some of the performance lost to Spectre mitigations once MS ships their retpoline-patched kernel.

https://mspoweruser.com/windows-10-19h1-will-reduce-the-impa...


Don't most machines with this kind of CPU in them run something other than Windows?


If you're talking about the i9-9900, it's a very high clock frequency part with "only" 8 cores and part of the consumer line (no ECC support). I'd actually think most people who have it run Windows and use it for things like gaming.


Most maybe, but there's still quite a large fleet of Windows servers out there.


I've seen JS performance on synthetic benchmarks drop somewhat over the past months. Perhaps JS JITs like V8 rely to some degree on specific branch prediction properties.


I wonder how much Intel knew of this "bug" and went ahead and shipped with it because of the speed increases.


Knowledge about this type of attack dates back to the 90s (possibly earlier), but it isn't entirely clear whether the engineers who developed the "protection check after [speculative] load" were aware of that. I would argue though that "check after load" should have smelled bad to people intimately familiar with CPU design.

It should be noted though that at the time neither sharing processors with strangers / across trust boundaries nor executing arbitrary crap in a VM were common activities. Memory protection and such were mostly viewed as a technique to increase reliability, not to provide actual security.


I want to say I'm not giving them a pass, but I think you can go overboard, "know" that "hey, in theory someone could", and security yourself into never doing anything.

Accordingly, how much they knew about the nature of it, and their ability to predict it, is really the question, and IMO it's kind of a hard one to know or judge (unless there are some memos out there).

Granted, in an age of little to no consideration given to security in so many places... I wouldn't be surprised by anything.


Working closely with Intel and others on these issues, I have seen zero evidence that anyone realized the security implications and shipped anyway. Zero evidence.


This bug dates to a design from around 1993, when security, multicore, and SMT didn't really exist.


I suppose the question or insinuation would be whether it was discovered by Intel (or someone else) in the meantime?


The CIA and NSA called, they said "Yeah".


They should make a "miss me yet?" meme for Itanium.


What does this mean for developers buying laptops or workstations in the next year? Is AMD or Raptor looking like a better choice, or is Intel still looking good even after the hit?

I'm reading that workstations, for example, might not need to worry for the most part, unless a package gets compromised or some browser exploit makes it through.


I built a workstation with a Threadripper earlier this year, and I couldn't be happier so far. The single-core performance advantage of Intel parts isn't that big, and you get a ton of cores and PCIe lanes in exchange.


How would you write a program to use these exploits? Unless I'm mistaken, I'm under the impression you have to talk directly to the processor through the kernel in order to do any of these exploits. Would you have to write the code in assembler or use a special library to do the predictive branching?


Your code always talks directly to the CPU. Your program 'just' runs in an environment where it is not allowed to do a few things the kernel is allowed to.

Probing the CPU for the timing of memory accesses you don't have access to, or getting it to leave traces somewhere you do have access to, doesn't require the kernel. That's the problem.


Yes, you'd write some assembly or C. Or you could start from existing demos: https://samsclass.info/123/proj14/spectre.htm
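(Not from that demo, but for a sense of what the measurement side looks like: underneath these PoCs is just a cache-timing probe, i.e. flush a line, let the victim speculatively touch it or not, then time a reload. A rough sketch for x86 with GCC/Clang; the cached/uncached threshold varies by machine.)

    #include <stdint.h>
    #include <stdio.h>
    #include <x86intrin.h>   /* _mm_clflush, _mm_mfence, __rdtscp */

    /* Time one load of *p in TSC cycles. A short time means the line was
     * already cached (e.g. touched speculatively); a long time means it
     * came from DRAM. This is the "reload" half of Flush+Reload. */
    static uint64_t time_load(volatile uint8_t *p)
    {
        unsigned aux;
        _mm_mfence();
        uint64_t start = __rdtscp(&aux);
        (void)*p;                               /* the probed access */
        uint64_t end = __rdtscp(&aux);
        _mm_mfence();
        return end - start;
    }

    int main(void)
    {
        static uint8_t probe[4096];

        probe[0] = 1;                           /* warm the cache line */
        printf("cached:   %llu cycles\n",
               (unsigned long long)time_load(&probe[0]));

        _mm_clflush(&probe[0]);                 /* evict it ("flush") */
        _mm_mfence();
        printf("uncached: %llu cycles\n",
               (unsigned long long)time_load(&probe[0]));
        return 0;
    }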


You don't need to write assembly or C; it is possible to perform an exploit using any high-resolution clock, like the one JavaScript in most browsers provided until recently.


I would love to see a benchmark for VM performance before and after.


TLDR: hardware patch is as slow as the software one.

Except now it's a bit worse since I cannot disable the patch to recuperate lost performance.


> I cannot disable the patch to recuperate lost performance.

Knowing the actual, demonstrated risks... why would you do this?

I'm not trying to devalue your position. I'm trying to understand your risk calculation.

---

Edit: Good catch, humans. The thought of running code in an unexposed, isolated, largely trusted environment didn't cross my mind; I was more focused on the environments I'm used to (where everything is connected and nothing is trusted). That said, I'd argue that a database backend to any typical webapp definitely qualifies as exposed.


I'm going with the herd immunity for my personal stuff.

The calculation is basically:

their_waste = likelihood enough people have the mitigations enabled (not tech enough to disable them) so that "bad people" will not waste time developing exploits for the tiny number of unprotected people like me (herd protecting me)

my_risk = likelihood of "bad people" actually finding me and being able to run their code

their_reward = likelihood of them actually finding something meaningful and valuable in the memory they can manage to dump

oops = (my_risk * their_reward) / their_waste

I am assuming my_risk and their_reward to be low and their_waste to be high, so oops will be acceptably low (hopefully :p)

Wish me luck!


Server logs are full of scripts and people trying to penetrate services with old, known bugs that have been widely neutralized for years.... it doesn't cost much to try one exploit.


So far there haven't been any mass exploits for these CPU bugs targeting personal systems like mine in the wild.

The most risk comes from web browsers, as you can execute (constrained) attack code in those. That's why browser vendors were quick to disable SharedArrayBuffer and developed further mitigations that hinder exploiting CPU bugs like these.

I may reconsider when browser-based exploits become a real thing that is in widespread use.

People probing my ports and exposed services running on my machines is far less of an issue since I don't run any service designed to run attacker supplied (but sandboxed) code like a browser does. If somebody managed to run code anyway (RCE) then I probably would have other problems than just worrying about somebody running spectre exploit code ;)

As it stands right now, the prime targets are shared execution environments running untrusted sandboxed code, aka cloud providers needing to worry that customer A's VM doesn't dump the memory of customer B's VM running on the same hardware.


Not everyone is running any untrusted code. If you're running (for example) a physics simulation, the mitigation doesn't gain you much.


Fortunately, the performance of these kinds of CPU-bound workloads is almost completely unaffected by the mitigations, so you might as well enable them anyway.


One of them is "Disable hyperthreading," which absolutely has a severe penalty.


For some workloads it is better to have hyperthreading disabled.


For some workloads.


Actually, I've seen benchmarks to the contrary. As usual with benchmarks, there's no useful data to understand them, specifically no performance counter data. I've yet to see a good analysis and haven't been able to do it myself. Many large-scale computations actually aren't CPU-bound, except insofar as they spin in MPI; some do plenty of filesystem I/O, but at least with something like PVFS2, that can be handled just in user space on the compute node.


The sort of HPC clusters with which I'm familiar run plenty of what I'd call untrusted code, and are multi-access with arbitrary student users and not-infrequently-compromised credentials. That said, there seems to be a fairly small attack surface the way I'd set up compute nodes, even if they're not single-job/node; especially if maximum job times are a day or two. I probably wouldn't turn on the mitigations on compute nodes.


There are non-internet facing workloads that would benefit from the additional performance.


I completely agree when it comes to running, say, a VPS.

If you're running (or using) a service where thousands of businesses rely on the ability to run their code and their data on your machines without any of your other customers being able to access it, yeah, security is priority #1.

On my personal workstation, though, what are they going to get? My credit card number? That's my bank's problem. I'm not particularly worried about targeted attacks; if my competitor or customers got everything on my hard drive, little would change for any of us. Force me to restore from backup? Losing my email password would be bad, but that's partially what 2-factor is for.

I have a tiny chance of getting a few hours of inconvenience if someone completely owns my PC. That's not worth all my work happening a little bit slower all the time.


> On my personal workstation, though, what are they going to get?

I think it depends on the context. I felt similarly until I discovered just how many machines attackers would pivot through in real-world attacks featuring strong adversaries. Preventing these attacks on every machine is a strength-in-depth measure.


> what are they going to get?

Your data is likely less important in that context compared to your device as a fractional resource or a pivot. (which I believe is largely zerkten's point)


Consider a server application that doesn't run arbitrary untrusted code and doesn't have meaningfully separate privilege levels.

You can't leverage any speculation exploit without code execution, and there's nothing left to exploit on the box once you have a shell.


The database backend does NOT qualify... If your DB machine is exposed to the outside world, using this exploit is overkill and there are plenty of other easier vectors of attack.


Simple. I'm pretty sure I won't be the first to fall victim to an exploit that is purely academic at this point. Why on earth should I take a large performance penalty on my own PC to mitigate an attack that I'm pretty sure (a) will never be a problem for me and (b) will almost certainly arrive with plenty of warning if it does?

This whole business is massively, massively overhyped from the point of view of individual workstation users. Not every system needs to be locked down like NORAD. Doing so is a failure of basic threat modeling.


If it is on my personal machine, I can't accept an 8-15% performance hit (I don't know the real impact; that's just the first range I found).


My database layer doesn’t run any untrusted code. I’d like the perf back there. Then audit the shit out of any users running there.


I'm talking out of my behind here, but I was always under the impression that some stuff we use casually is built with the assumption that it's not under heavy attack, too.

E.g., I thought consumer-grade video cards were pretty darn insecure. I don't know where I picked up that idea, but if that's true, then the idea of having the option to run "insecurely" for certain things makes sense.

Not sure if we can trust an average user with this, but if the video card thing is true, we already do.


I thought these exploits already required code execution on your machine. (Maybe I'm wrong about that.) If untrusted code is already running on your machine, your system is already compromised. So I don't see the big deal about these exploits, except in the context of hosted VMs.


It's not that they don't require code execution; it's that there is more code execution happening in things that are supposed to be sandboxed than most people generally anticipate. It's not just VMs.

For example, how many of the map editors for various games are Turing-complete? If you download a custom map from a random peer, you may be executing "sandboxed" code. Can it pull off a timing attack?

And the elephant in the room is presumably JavaScript.


You'd still need a communication channel to the outside world that is available to the attack code/map or else it cannot exfiltrate the data it dumped.


In a multiplayer game where each of the peers is constantly sending the others data, that seems like a surmountable problem.


The map engine "executing" the map has no access to the network layer of the game; or at least it shouldn't.


What games don't have maps with manipulable objects that would need to have their state synced over the network? A barrel existing/having been exploded is one bit, the precise position of an object is quite a few more, etc.


> In this paper, we present NetSpectre, a generic remote Spectre variant 1 attack. For this purpose, we demonstrate the first access-driven remote Evict+Reload cache attack over network, leaking 15 bits per hour.

https://misc0110.net/web/files/netspectre.pdf


> leaking 15 bits per hour.

Exactly. 15 bits per hour, in an artificial environment with minimised noise, after untold amounts of preparatory work were already performed to analyse the software running on the target machine.

Here, have 15 bytes from a random process running on my machine (I just randomly attached a debugger, scrolled through memory arbitrarily, and copied them):

d1 e1 81 f9 fe ff 00 00 76 05 b9 fe ff 00 00 66 89

What are they? I don't know. Maybe you're really lucky and it's a key to something, or a password hash... but what? The above would've taken 8 hours to read using that attack. Now you should see the level of unconcern I have about this. Someone who is being targeted would care more, but I don't believe I, or indeed the majority of users, am important enough to be in such a position.

In much the same way I'm not going to install bars over every window of my house.


15 bits per hour means that in 136 hours you could potentially exfiltrate a 2048-bit private key. Actually, make it 120 hours, since the rest at that point is brute-forceable.

I imagine Gmail's HTTPS certificate, or a Microsoft code signing key, or Linus' GPG key, or being able to impersonate some government agency or messaging server, are well worth 5 days of this.


Yes, of course; as I said above, this is something only high-value targets need to worry about, and even then I think it's not that high up on the list of risks. 5 days is just to read the data, and there's a considerable amount of preparatory work involved in setting up this attack: figuring out what to read and where it is, which is just as hard, if not harder, than figuring out how to read it through Spectre.

The authors of that paper have the massive advantage of knowing exactly the software running on the target system and its environment, something which an attacker in the real world is unlikely to have, unless the attacker already has such familiarity with the system that it seems far easier to exfiltrate data via some other means than trying to find and set up this very slow side channel. Everything has to be set up just right for this to work. Otherwise you might still manage to read something, but it's completely useless.

(High-value private keys in companies are likely to be in HSMs anyway, in which case they're completely inaccessible to attacks like these.)


Not really. How do you know where that 2048-bit private key is? You might have to read through the entire address space. On a relatively small server with 16 gigabytes of RAM, it'll only take you about a million years to exfiltrate the entire thing...

Let's say you luck out and only need to read the first 100 megabytes of memory... you're still talking thousands of years.
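(Back-of-the-envelope arithmetic behind those figures, at the paper's 15 bits/hour:)

    \[
    \frac{2048\ \text{bits}}{15\ \text{bits/hr}} \approx 137\ \text{hr},\qquad
    \frac{100\ \text{MB}\times 8\ \text{bits/byte}}{15\ \text{bits/hr}} \approx 5.3\times10^{7}\ \text{hr} \approx 6000\ \text{yr},\qquad
    \frac{16\ \text{GiB}\times 8\ \text{bits/byte}}{15\ \text{bits/hr}} \approx 9.2\times10^{9}\ \text{hr} \approx 10^{6}\ \text{yr}
    \]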


> d1 e1 81 f9 fe ff 00 00 76 05 b9 fe ff 00 00 66 89

That's amazing, I have the same combination on my luggage!


Are you a missileer who is confusing your luggage code with your ICBM missile code?


What a horrible test. They tested without hyperthreading turned on. Spectre and Meltdown are risks BECAUSE of hyperthreading. It makes zero sense to test the performance impact of the fixes with the major component of the problem turned off.


Neither Spectre nor Meltdown are related to hyperthreading.


I will look it up later, but I thought hyperthreading increases it.

Perhaps I'm mixing something up, but I thought Intel removed SMT from the newer generations (is removing it) because of it.


There have been other recent vulnerabilities, like the ax/ah thing described in [1] or TLBleed that have relied on SMT, but not Meltdown or the original Spectre variants.

[1] http://gallium.inria.fr/blog/intel-skylake-bug/


You're right that Spectre and Meltdown are not related to SMT, but that doesn't invalidate the wider point about hyper-threading and side-channel issues I think.

Parent may have simply meant TLBleed/L1TF (Foreshadow) instead of Meltdown/Spectre.


I'm not trying to invalidate any wider point about hyperthreading, and the original article wasn't trying to make one.

It was specifically about Spectre and Meltdown mitigations which are unrelated to hyperthreading, so testing with or without hyperthreading is fine. Bringing up hyperthreading here is like bringing up how a diet high in salt is unhealthy, someone pointing out that salt has nothing to do with the original article, and then a final comment "Yeah, but that doesn't invalidate the wider point that we should consume less salt!".


Running the chips in less than ideal performance settings does seem fishy. I wonder if the perf gap is wider or narrower with HT on.



