Intel Confronts Potential ‘PR Nightmare’ With Reported Chip Flaw (bloomberg.com)
1026 points by el_duderino on Jan 3, 2018 | 543 comments



This is a clusterf/big deal. Beyond the security implications, it means that all companies paying for computing resources will have to pay roughly 30% more overnight on cloud expenses for the same amount of CPU, assuming that they can just scale up their infrastructure.

I know that bugs happen and that there was nothing intentional about this one, but at times like this it's hard to hold back the temptation to call for a class-action lawsuit against Intel....


It's a good thing CPU is fairly compressible. Unless you meter it very carefully, you'll see the performance hit but it won't impact you that much. Very few of my physical boxes are over 70% CPU utilization on a daily average.

It's, however, really bad if you sell CPU cycles for a living. You just lost between 5 and 30% of your capacity. If you have a large building, you just lost part of your parking lot to the Intel Kernel Page Problem building.


Problem is, most companies that need a lot of power only care about one thing - peak performance. And they tune it carefully in order to not overspend while guaranteeing minimal downtime. This means that they'll have to pretty much scale their infrastructure up by exactly 30%. That's a LOT for these big clients.

Honestly, I'd just make sure the server firewalls are super tight and not take the upcoming patches. At least for now.


Very very few highly tuned "peak performance" workloads are dominated by syscall overhead like the test that produced that 30% number was. It's best to hold off on the hyperbole.


I have a power-hungry workload that scales horizontally and is currently already dominated by the cost of system calls. This will effectively, directly cause me to buy 30% more compute on a huge infrastructure (~2,000 physical machines; quite beefy dual-socket machines with a lot of memory).

I know I’m not alone.

Then again, think of microservices, Kubernetes for instance; network requests are system calls.


If your workload has no code that's untrusted, you can safely skip this patch or disable it at boot. If not, at 2,000+ physical machines, it may be worth moving some of that into kernel modules that collapse a couple of syscalls into a single higher-level one.


The VM host will still have the patch applied, won't it?


Yes, but if it's your metal, you don't need to.


Good idea. But it's a Windows executable and depends quite heavily on Windows specifics.

(In my case anyway)

30% overhead might be incentive to revisit the assumption that we can't rewrite it for Linux.


You still can move some functionality to a device driver or something else that runs in the NT kernel space.


But then you have to release the module as GPL, no?


Only if you want to do something that would otherwise be a violation of copyright - e.g. distribute the module to other people (assuming it's sufficiently entangled with Linux to be a derivative work thereof). The GPL only licenses you to do things that you otherwise couldn't do; it doesn't restrict you from doing things that were never a violation of copyright (e.g. privately modifying your own things).


No, kernel modules can be closed source; GPU drivers are a common example of a closed-source kernel module.


Will you be buying Intel-based machines? Or will you be running a hybrid-architecture cluster now?

I don’t know very much about computing on that scale, but I wonder if all the people selling off Intel stock are thinking this story through.


AMD server CPUs currently outperform Intel on some multi-threaded benchmarks. This usually isn't a problem for people buying for peak performance, because you can always buy more CPUs to increase parallel throughput, but it's harder to make single threads faster.

It's possible that the patches applied to fix this bug will cause some single-threaded benchmarks to change from Intel being the fastest to AMD being the fastest.


So for what it is worth my company has all Intel kit. We run servers that run docker. In each docker container we do build / test for our product. That is all we use them for. 1RU with 2 blades, each blade is dual socket, 72 total cores, 512GB RAM. We will not apply this patch as none of this is public facing and we do not want the hit to build / test throughput. The one big thing that this has done is we were looking at AMD for new servers and that has now become a higher priority on the to do list. Given our environment we care about the number of containers we can run, period.


It is overwhelmingly likely that we'll buy more Intel. Performance per watt has always been superior, and AMD has to prove itself over time before we'd buy it.

Not trying to kill expectations. This decision isn't mine alone. You know the old saying "nobody got fired for buying Cisco"; that applies to Intel too.


No, but lots of workloads are built with latency in mind. For APIs that talk to each other in long serial chains, don't be surprised if request responses take significantly longer in many, many workflows.


I wouldn’t isolate your concern to firewalls and bad actors that break in over SSH. If they manage to find a vulnerability in your app that allows remote code execution this could help them make that problem much worse. Also VM/container escapes are a big problem if you use a cloud provider.


> I'd just make sure the server firewalls are super tight and not take in the future patches. At least for now.

Good security is about layers. No one layer can be assumed to be watertight, but with enough layers you hopefully get to a good place.


If they really care about peak performance, I don't believe the PTI patch will affect them. If you can change your system in a way that the power-hungry part does not work on untrusted data, you can boot with "nopti" and ignore it. Systems which both need lots of maxed-out CPUs and traffic directly from the wild internet are pretty rare. They're unlikely to run on virtualised systems either.


> Systems which both need lots of maxed-out CPUs and traffic directly from the wild internet are pretty rare.

That's a good description of basically every cloud environment out there, from AWS on down.

In other words they are extremely common.


There are many ways to tune such workloads and I suspect our software will get better as a result.

We'll start to get conscious about the number of syscalls we use on each operation, start using large buffers, start buffering stuff user-side...


The CPUs in cloud environments are not maxed out in general. There will be some areas like batch processing and compute-specific VMs. For other cases, there's quite a bit of overcommitting of resources. And that's before you start doing scheduling that mixes workloads on a physical host for better utilisation. Source: worked on a public cloud environment.


I agree with you on most VMs. But once you schedule mixed workloads, you want each host to be balanced so that all of its capacity is utilized evenly. Which means that if CPU use increases across the fleet, you will want new hardware with more CPU.


Either that, or you'll have to put up with some processes taking longer.


About 10 years ago I was mentored by a guy who was an utter wizard at queuing theory, and who bugfixed a whole bunch of nasty issues in cellular telecoms hardware through his understanding of how queuing theory impacted code execution.

TL;DR - queue behaviour gets nonlinear as you approach the theoretical max load. If you are running your processors at a high load, even a small change in code throughput makes a huge difference to real-world behaviour.
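
For a rough illustration of that nonlinearity, take the textbook M/M/1 queue (a deliberately simplified model, not the actual telecom system): with arrival rate λ, service rate μ and utilisation ρ = λ/μ, the mean time a job spends in the system is

    W = \frac{1}{\mu - \lambda} = \frac{1}{\mu\,(1 - \rho)}

    % effect of a 5% service slowdown (\mu' = 0.95\,\mu) at \rho = 0.90:
    \frac{W'}{W} = \frac{\mu - \lambda}{\mu' - \lambda} = \frac{0.10\,\mu}{0.05\,\mu} = 2

So at 90% load, a 5% drop in per-request throughput roughly doubles mean latency; the closer you run to the theoretical max, the worse the blow-up.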


I believe I saw a Strange Loop talk about this specific issue in Clojure. The speaker was talking about channels, not queues, though.


Do you have a link?


Hmm. Since everyone that sells (Intel) CPU cycles for a living suffers the same loss of supply this boils down to pricing; the same demand chasing fewer cycles will drive up prices and the market will adapt.

30% is a big hit. I'm wondering if that isn't a bit exaggerated, or perhaps the consequence of a poorly optimized workaround that will rapidly improve. I recall seeing figures on the order of 3% only a few days ago.


30% is a worst case for a workload optimized to hit the performance bug as hard as possible.

How big it will be for your workload is a function of what your workload is. Benchmark if it is important to you.


50%, rather, is the worst-case scenario. 30% is a bad-case scenario, and 5% the best-case scenario. Which is still a lot for large cloud providers like Amazon, Google, and Microsoft.


Any public information on how someone like Google or Facebook handles this? Do they have enough spare capacity to patch, or will they need to build further capacity first? I could imagine 10% of Google's capacity (internal services, not Google Cloud) is at least a large datacentre.


I know someone who works for amazon and he said that they didn't need to do anything or buy any more servers.


>It's, however, really bad if you sell CPU cycles for a living.

Who really sells CPU cycles? Cloud providers sell instances priced per core. So the real hit is by the customers since they have to shell out for more instances for the same amount of computing power.

The hit I see is by providers of 'serverless' computing, since they charge per request and have their margins reduced.


> The hit I see is by providers of 'serverless' computing, since they charge per request and have their margins reduced.

AWS, Azure, and GCP all bill serverless with a combination of per-request fees and compute (GB-seconds), so I'd expect the entire hit to be passed on to the user since this will cause increased compute time for each request. N requests that used to average 300ms each will now be N requests that average, say, 400ms, so the per-request billing remains the same and the compute billing will increase by approximately 30%.
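
Rough numbers, using the figures above and a hypothetical 1 GB memory allocation (the exact rates don't matter, only the ratio):

    \text{compute billed per request} = \text{duration} \times \text{memory}
    \text{before: } 0.3\ \text{s} \times 1\ \text{GB} = 0.3\ \text{GB-s}
    \text{after: }\ 0.4\ \text{s} \times 1\ \text{GB} = 0.4\ \text{GB-s}
    0.4 / 0.3 \approx 1.33

The per-request fee stays the same, so the compute portion of the bill goes up by roughly a third in that example.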


I don't understand what exactly you're saying. All of those services have serverless services, but they also have server based instances which abstract compute to amount of cores and RAM rather than CPU cycles. And most use is out of the services which aren't serverless.


Then it's because you don't know how modern cloud works.


I now see that my misunderstanding was about who exactly the users were and who the provider was. I took "provider" to mean only GCE, AWS, etc., while the commenter, I believe, also included users of those services (who are in turn providers of serverless services).


A lot of companies do, including many ycombinator startups. Think of anything that's analytics, data science, data warehouse or advertising related. The costs to run their service just took a hit.


Their competitors are also affected.

A 30% decrease is also equivalent to setting Moore's law back 7 months. A 5% loss is only setting it back 1 month. I know that's a bit of a naive calculation. But the point is computing power has long operated in an exponential domain. So big differences in absolute numbers aren't necessarily a big deal.


According to this patch comment, AMD x86 chips are not affected: https://lkml.org/lkml/2017/12/27/2


Sure but who is using AMD chips in place of Intel server chips? If company A competes in the widget market against company B and they both built their server infrastructure on Intel then neither company gained an advantage due to a performance degradation in Intel hardware.


> Sure but who is using AMD chips in place of Intel server chips?

Well... Everyone who bought AMD. Some people managed to see beyond the hype and go for the option that made sense.


The overwhelming majority of the cloud runs on Intel. Saying AMD is slightly better off doesn't really help if my systems are built on Intel. This is the case for most people.

What hype are you referring to? Are you suggesting the people who bought AMD knew this was a problem for Intel?


Azure got some AMD EPYC.


>Sure but who is using AMD chips in place of Intel server chips?

Maybe a lot more now?


CPU speed hasn't followed Moore's law since 2003ish. (Number of transistors is still following Moore's law, but that doesn't necessarily directly help you when your program is suddenly 3-30% slower.)


A CPU from 2017 is going to run your programs a hell of a lot faster than one from 2003. Even if they technically have the same clock speed. Look at benchmarks for instance: https://www.cpubenchmark.net/high_end_cpus.html


The claim wasn't "CPUs in 2017 are not faster than CPUs in 2003" or even "CPUs in 2017 are not much faster than CPUs in 2003"; the claim was that they haven't followed Moore's law since 2003, so applying it to CPU speed nowadays is inaccurate. Of course CPUs are faster now than they were 14 years ago, just not as fast as the case where Moore's law still applied to CPU speed.


Moore’s Law doesn’t describe CPU clock speed increases.


Dennard scaling however did deal with clock speed (indirectly via power). It has failed since about 2005.


I don't think you can reliably phrase this in terms of Moore's law. Moore's law mostly concerns raw FLOPs. It's less useful for predicting hardware performance for operations that are governed by limitations like I/O and memory latency. And this slowdown, if I understand it correctly, is largely driven by memory latency.


One of the rationales for cloud computing is that it saves money by cranking up utilisation. Providers observe how much users "really" use and then provision that much.

True, sometimes you will leave boxes at low utilisation for various reasons, e.g. to deal with traffic spikes. But those reasons have not gone away. So now instead of having a predictable increase in CPU cost, you have an unpredictable increase in performance snafus.

The only good news is that the real performance hit will be less than 30% on many workloads. Especially once the providers start juggling and optimising.


>"It's a good thing CPU is fairly compressible."

What do you mean by "compressible"?


Presumably, for a certain important class of application, CPU is not used "densely", i.e. continually. Instead it's used intermittently, like a gas rather than a solid... Hence compressibility. Such applications are far from being CPU-bound, in other words.


So a cloud provider would be an example. Compressible similar to a sparse file I guess as well. Thanks this makes sense.


I think it was meant that a normal application does not utilise the CPU all the time, which can be seen by looking at the task manager CPU usage % = X. Any extra processing needed to fix this bug will have to come out of the remaining 100-X%. This is OK as long as you have enough spare %, and can afford the extra power usage for that processing.


That makes sense, thanks. This is a big deal.


Virtualization is one popular way to drive up CPU utilization. The more diverse the workloads running on a given server, the more even the CPU usage tends to get. This way, if you have 100 workloads that peak at 100% but average 1%, your CPU usage will tend to be smooth at 100%; any overallocation will smooth out over time (a job that would take 1 second may take up to 10).


No vendor can afford to do it at a loss for long. One way or another the customer will end up paying.


There's also latency though :/ it seems that programs that make a lot of syscalls will be affected more than programs that are doing in-process calculations


We'll start being more syscall-conscious when we write our programs. We'll batch more on the user-mode side and try to use fewer syscalls to do the job.

Kernel ABIs will eventually reflect that and grow higher-level calls that replace groups of currently cheap syscalls (which will become expensive after the fix).
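
As a sketch of what that user-side batching looks like today with a standard POSIX call (hypothetical function and buffers; writev() has existed forever, this just illustrates "one kernel entry instead of three"):

    #include <string.h>
    #include <sys/uio.h>
    #include <unistd.h>

    /* Instead of three write() calls -- three user/kernel transitions,
       each paying the page-table switch after the patch -- gather the
       pieces and enter the kernel once with writev(). */
    ssize_t send_response(int fd, const char *hdr, const char *body, const char *trailer)
    {
        struct iovec iov[3] = {
            { (void *)hdr,     strlen(hdr)     },
            { (void *)body,    strlen(body)    },
            { (void *)trailer, strlen(trailer) },
        };
        return writev(fd, iov, 3);   /* one syscall for all three buffers */
    }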

And Intel will profit handsomely from next generation CPUs that'll get an instant up-to-30% performance boost for fixing this bug.


What about all the kernel interrupts due to network and storage traffic?


Maybe the scheduler could dedicate a core to interrupts and software that has small quanta and page tables? Can't really think of a code solution that doesn't sound stupid when I type it.


Another consideration is power usage in data centers. Server power usage is annoyingly complex, and once you get above 70% utilization power usage may go up considerably.


As a billion other people have already said, that all depends on their workloads. This isn't a 30% clock-speed reduction.


As I understand the problem, this isn't about a clock-speed reduction; now it is the software's responsibility to check if the page is a kernel page/user page, so the impact is significant. So, every time either pages are touched/accessed this check needs to be triggered, which causes it to be much slower.


> So, every time either pages are touched/accessed this check needs to be triggered, which causes it to be much slower.

Not to be mean, but that's not what is being changed.

You're right on the bug - userlevel code can now read any memory regardless of privilege level. However the fix isn't to manually check the privileges on each access - that would be extremely slow and wouldn't actually fix the problem.

The fix is to unmap the kernel entirely when userspace code is running. Because the kernel will no longer be in the page-table, the userspace code can no longer read it. The side effect of this is that the page-table now needs to be switched every time you enter the kernel, which also flushes the TLB and means that there will be a lot more TLB misses when executing code, which slows things down a lot.

So, to be clear, it is not accessing pages that is being slowed down, it is the switch from the kernelspace to the userspace.
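
One way to see that per-transition cost directly is a crude microbenchmark of a trivial syscall (a sketch assuming Linux; the absolute numbers vary by CPU and kernel, the interesting part is comparing a patched boot against one with "nopti"):

    #define _GNU_SOURCE
    #include <stdio.h>
    #include <time.h>
    #include <unistd.h>
    #include <sys/syscall.h>

    /* Times a do-nothing syscall in a tight loop, so nearly all of the
       measured cost is the user->kernel->user transition itself. */
    int main(void)
    {
        const long iterations = 10 * 1000 * 1000;
        struct timespec start, end;

        clock_gettime(CLOCK_MONOTONIC, &start);
        for (long i = 0; i < iterations; i++)
            syscall(SYS_getpid);   /* bypasses any libc caching, always enters the kernel */
        clock_gettime(CLOCK_MONOTONIC, &end);

        double ns = (end.tv_sec - start.tv_sec) * 1e9 + (end.tv_nsec - start.tv_nsec);
        printf("%.1f ns per syscall\n", ns / iterations);
        return 0;
    }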


But doesn't the CPU enter kernelspace every time a syscall takes place? So based on what you've described, every time a syscall returns control back to userspace, the TLB will be flushed, which means slower page access times in general.


The distinction I was trying to make was that the above commenter thought the kernel is now checking page permissions instead of the CPU doing it, i.e. doing privilege checks in software. That's not what's happening; the kernel is just unmapping itself when user code runs, so the kernel can't be seen at all. Then the privilege checks (which are now broken) don't matter, because there is no kernel memory to read.

All your points are right though. Page access times will in general be slower because of all the extra TLB flushes, leading to more TLB misses when accessing memory.


Right, but how often that happens is workload dependent. Basically, how often is your code making syscalls.


But don't all FS accesses (e.g., write to socket, read from DB) require a syscall? In that case, basically all web applications would be affected.

Or am I completely off the mark?


No, you are correct. Really, every application will be affected, they all make some syscalls. How much will vary, though.


At least one syscall happens at some point, but performance-tuned systems already use "bulk" syscalls where a single syscall can send megabytes of data, check thousands of sockets, or map a whole file into your address space to access as if it were memory.
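
For example, mapping a file once replaces what would otherwise be a long series of read() calls; everything after the setup is ordinary memory access (a sketch with minimal error handling):

    #include <fcntl.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>

    /* open + fstat + mmap, then the file is scanned as plain memory:
       no per-chunk read() syscalls while walking the data. */
    long count_newlines(const char *path)
    {
        int fd = open(path, O_RDONLY);
        if (fd < 0) return -1;

        struct stat st;
        if (fstat(fd, &st) < 0 || st.st_size == 0) { close(fd); return -1; }

        char *data = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
        close(fd);                        /* the mapping stays valid after close */
        if (data == MAP_FAILED) return -1;

        long lines = 0;
        for (off_t i = 0; i < st.st_size; i++)
            if (data[i] == '\n') lines++;

        munmap(data, st.st_size);
        return lines;
    }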


> how often is your code making syscalls.

And how often the kernel services interrupts.


You don't understand the problem.


> claiming for a class lawsuit against Intel

If people who received written assurance from Intel that their hardware is 100% bug free can form a legal class, sure. I highly doubt there is even a single one such customer.


Anyone can sue anyone else at any time. If you think Intel isn't going to be sued for this, you're wrong.


It depends on how they handle user compensation. Going by the FDIV precedent, they should typically replace all defective products for free, and they will be in the clear.

What I meant was that the presence of the bug itself is not a valid cause, for example you can't claim that due to the error you lost 1 trillion dollars via a software hack - even if it's true. If Intel can prove they acted ethically when disclosing the bug and that they replaced / compensated users up to the value of the CPU, they are in the clear.


I read that this bug goes back several years; "replacing all defective products for free" could be a massive expense, and assuming it includes current chips, there's also some lag time and engineering effort to get to the point where they could start doing so.


How do they replace defective products? Or more specifically how do you get your laptop CPU replaced if Intel offers a replacement for free?


Presumably, by visiting an agreed service center in your western, sue-happy country, or sending the computer to the nearest one at your expense in the rest of the world. If the CPU is not replaceable or no longer in service, you would get a voucher for the lost value of the CPU/computer that is now 30% slower. Something like $10-20 for anything older than 3 years, so most people won't bother. If Apple can do it, surely Intel will manage, but it will cost them on the order of billions, a non-negligible fraction of their yearly profit.


Good call. Will be fascinating to see how this plays out!


They need some legal standing or the case can be dismissed out of hand. It may very well be a question of who has the better legal team.


> They need some legal standing or the case can be dismissed out of hand.

Yes and no. Yes, Intel would get a chance to claim that the case should be dismissed out of hand. To do that, they have to prove that, even assuming all the claimed facts are true, the people suing still don't have a valid case. That's a high bar. It can be reached - there's a reason that preliminary summary judgment is a thing in court cases - but it takes a really flawed case to be dismissed in this way.

How flawed? SCO v. IBM was not completely dismissed on preliminary summary judgment, and that was the most flawed case I've ever seen.

> It may very well be a question of who has the better legal team.

Well, Intel can afford to hire the best. A huge class-action suit can sometimes attract the best to the other side as well, though. (There's not just one "best", so there's enough for both sides of the same court case.)

IANAL, but it looks to me like there's at least the potential for a valid court case. CPUs are (approximately) priced according to their ability to handle workloads; if they can't provide the advertised performance, they didn't deserve the price they sold for.


What is bug free? CPUs work just fine. There is no bug.

The question is did anyone receive performance assurance from Intel? Probably not.

Some cloud providers or compute grids just lost a lot. Maybe they will find an angle to claim compensation.


From what I've read, this slowdown only affects syscalls, which, since they aren't usually a huge percentage of processing in the first place, should not have such an effect. You're more likely looking at a few percent at most, which is not going to be enough to make AMD outperform Intel. Let's stop the fear mongering and wait for actual metrics.


> From what I've read, this slowdown only affects syscalls

Incorrect. It also affects interrupts and (page) faults.

Any usermode to kernel and back transition.


So this is evil for virtualization hosting, which is the major enterprise application for Intel chips.

Hosting on bare metal will become more attractive. Too bad you can't long OVH and Hetzner.


>Too bad you can't long OVH and Hetzner.

What does that even mean?

Also Hetzner just introduced some AMD Epyc server.


"Long" as a verb means to purchase their stock.

As opposed to "shorting" a stock, which means making a bet that it will go down in value.


Ah that makes sense. Thanks!


For some reason I can’t reply to ‘chrisper’ but I think ‘api’ is referring to going long in the stock market.

https://www.investopedia.com/terms/l/long.asp


> For some reason I can’t reply to ‘chrisper’

HN doesn't let you do this to new comments to avoid back-and-forth commenting that is typical in flamewars.


Coming up on 9 years here and I'm still finding out new things about how this site works. I've been wondering recently why some comments aren't replyable.


I think there is a time delay. Wait a few minutes or hours and you can reply. That cooling off period has helped me really think through my replies.


They become replyable after some amount of time. The amount of time varies based on how active the thread is and/or how deeply nested the comment is.

You can reply anyway, but you have to click on the timestamp ("X minutes ago") to do it.


Usually you can click on the <posting time> and go to a page that displays only that comment, which has a reply box even when there's no reply option on the main page.


That 100Hz timer tunable just got a lot more attractive...


Does that mean that you can get an instance on AWS and slowdown the underlying server for all others by forcing a lot of syscalls? Or how is performance distributed between tenants?


Prelim benchmarks show a significant impact (~20%) on PostgreSQL benchmarks.


20% when running SELECT 1; over a loopback network interface, not in real-world workloads.

The other benchmark that has generated some consternation is running 'du' on a nonstop loop.

Both of these situations are pathological cases and don't reflect real-world performance. My guess is a 5-10% performance hit on general workloads. Still significant, but nowhere near as bad as some of the numbers that are getting thrown around.

And, databases are the worst case scenario, most real-world applications are showing 1% performance impact or less.

https://www.computerbase.de/2018-01/intel-cpu-pti-sicherheit...

https://www.hardwareluxx.de/index.php/news/hardware/prozesso...


It's not really a worst case scenario when you consider where the majority of Intel's revenue comes from: selling their high margin server chips for use in data centers, a significant portion of which are running some kind of database.


Why should we trust your guesses over numbers being thrown around?

Your last link is all gaming benchmarks, which as the article mentions are not affected much.


Another quick Postgres estimate [1] with lower impact, and a reply from Linus Torvalds that these values are in the range of what they are expecting from the patch: "... Something around 5% performance impact of the isolation is what people are looking at. ..." [2]

[1] http://lkml.iu.edu/hypermail/linux/kernel/1801.0/01274.html [2] http://lkml.iu.edu/hypermail/linux/kernel/1801.0/01299.html


> syscalls, which, since they aren't usually a huge percentage of processing in the first place... Let's stop the fear mongering

Agreed!

(We should probably also stop overgeneralizing about the nature of computational workloads.)


Software development workflows have some of the worst syscall profiles out there. This is going to hit most of us where we live.


> I know that bugs happen

This isn’t an excuse for Intel consistently having terrible verification practices and shipping horrendous hardware bugs. From 2015: https://danluu.com/cpu-bugs/ There have been more since then.

I’ve talked to multiple people who work in intel’s testing division and think “verification” means “unit tests”. The complexity of their CPUs has far surpassed what they know how to manage.


This is typically what happens when you go for a long time without real competition. You get way too comfortable and bad habits start to pile up.


Isn't the reason why this problem even exists the exact opposite? Intel was losing in the mobile market and changed internal testing to iterate faster by cutting corners.

Found a quote:

"We need to move faster. Validation at Intel is taking much longer than it does for our competition. We need to do whatever we can to reduce those times… we can’t live forever in the shadow of the early 90’s FDIV bug, we need to move on. Our competition is moving much faster than we are".


Man, you should see the errata for some ARM-based SOCs. It's amazing that they work at all.

Vendor, in conversation: "We're pretty sure we can make the next version do cache coherency correctly."

Me (paraphrased): "Don't let the door hit you in the ass on the way out."

Management chain chooses them anyway, I spend the next year chasing down cache-related bugs. Fun.


ARM is such a shitstorm. At least the PC with UEFI is a standard. With every ARM device, you have to have a specialized kernel rom just for that device. There have been efforts made on things like PostmarketOS, but still in general, ARM isn't an architecture. It's random pins soldered to an SoC to make a single use pile of shit.


Why is it an issue to need a different kernel image for each device? I don't see a problem as long as there is a simple mechanism to specify your device to generate the right image. It's already like that with coreboot/libreboot/librecore, and it worked just fine for me.


Imagine that you are the person leading the team that's making an embedded system on an ARM SOC. It's not Linux, so you have your own boot code, drivers and so forth. It's not just a matter of "welp, get another kernel image." You're doing everything from the bare metal on up.

(I should remark that there are good reasons for this effort. Such as: It boots in under 500ms, it's crazy efficient, doesn't use much RAM, and your company won't let you use anything with a GPL license for reasons that the lawyers are adamant about).

So now you get to find all the places where the vendor documentation, sample code and so forth is wrong, or missing entirely, or telling the truth but about a different SOC. You find the race conditions, the timing problems, the magic tuning parameters that make things like the memory controller and the USB system actually work, the places where the cache system doesn't play well with various DMA controllers, the DMA engines that run wild and stomp memory at random, the I2C interfaces that randomly freeze or corrupt data . . . I could go on.

It's fun, but nothing you learn is very transferrable (with the possible exception of mistrust of people at big silicon houses who slap together SOCs).


The responsibility to document the quirks and necessary workarounds lies with the manufacturer of the hardware. If the manufacturer doesn't provide the necessary documentation, then that's exactly that: insufficient documentation to use the device.

There are hardware manufacturers that are better than others at being open and providing documentation. My minimal level of required support and documentation right now is mainline linux support.

Can you document your work publicly, or is there something I can read about it? I'm very interested in alternative kernels beside Linux.


> The responsibility to document the quirks and necessary workarounds lie with the manufacturer of the hardware.

When you buy an SOC, the /contract/ you have with the chip company determines the extent and depth of their responsibility. On the other hand, they do want to sell chips to you, hopefully lots of them, so it's not like they're going to make life difficult.

Some vendors are great at support. They ship you errata without you needing to ask, they are good at fielding questions, they have good quality sample code.

Other vendors will put even large customers on a tier-1 support by default, where your engineers have to deal with crappy filtering and answer inane questions over a period of days before getting any technical engagement. Issues can drag on for months. Sometimes you need to get VPs involved, on both sides, before you can get answers.

The real fun is when you use a vendor that is actively hiding chip bugs and won't admit to issues, even when you have excellent data that exposes them. For bonus points, there are vendors that will rev chips (fixing bugs) without revving chip version identifiers: Half of the chips you have will work, half won't, and you can't tell which are which without putting them into a test setup and running code.


ARM is a problem for all kernels, not just Linux, in how vendors map on-chip peripherals, etc. All the problems that UEFI solves are not solved on ARM.


Yep. I've seen scary errata and had paranoid cache flushes in my code as a precaution.

My favorite ARM experience was where memcpy() was broken in an RTOS for "some cases". "some cases" turned out to be when the size of the copy wasn't a multiple of the cache line size. Scary stuff.


Obvious hypothesis: first complacency leads to incompetence, then starting to cut corners has catastrophic consequences. The two problems are wonderfully complementary.

As other comments suggest, there might be a third stage, completely forgetting how to design and validate chips properly.


Or the system was designed poorly to begin with and now you're stuck with the design for backwards compatibility reasons.


I'd expect engineers that are aware of such serious bugs to spit on the grave of backwards compatibility. After all, the worst case impact would be smaller than the current emergency patches: rewriting small parts of operating systems with a variant for new fixed processors.


I think that could also have been the "official reason".

The same reason could have been used to give the NSA some legroom for instance, but tell everyone that's why they won't do so much verification in the future.


This implies that ARM vendors do less validation. I guess ARM is just so much simpler that good enough validation can be done faster. So essentially this is payback time for Intel for keeping compatibility with older code and simpler to program architecture (stricter cache coherence etc.). It is like one can only have 2 of cheap, reliable, easy-to-program.


I'm sure ARM vendors have their own problems... it is just that they tend to be used in application-specific products, so the bugs are worked around. Having come from a firmware background, I've worked on tons of ugly workarounds for serious bugs in validated hardware.

Furthermore, I just read an article (can't find the link) that certain ARM Cortex cores have the same issues as Intel.


> This implies that ARM vendors do less validation. I guess ARM is just so much simpler that good enough validation can be done faster.

More likely "good enough" is much lower because ARM users aren't finding the bugs. The workloads that find these bugs in Intel systems are: heavy compilation, heavy numeric computation, privilege escalation attackers on multi-user systems. Those use cases barely exist on ARM: who's running a compile farm on ARM, or doing scientific computation on an ARM cluster, or offering a public cloud running on ARM?


Where’s that quote from? ISTR reading it (or something very similar) as reported speech in a HN comment.

Overall it’s a depressing story of predictable market failure as well as internal misbehavior at Intel, if true. Few buyers want to pay or wait for correctness until a sufficiently bad bug is sufficiently fresh in human memory. And if you do want to, it’s not as if you’re blessed with many convenient alternatives.


The quote is from the link above (referencing an anonymous reddit comment).


That is a very interesting perspective, and as far as I know it is correct, though perhaps Intel's situation in the mobile market was exacerbated by complacency?


There are people looking to deploy ARM servers now. However I wish there had been more server competition. Many companies write their backend services in Python, JVM (Java/Scala/Groovy), Ruby, etc. Stuff that would run fine on Power, ARM or other architectures. There are very few specialized libraries that really require x86_64 (like ffmpeg and video-transcoding)


ffmpeg works great on ARM. I don't know if the PPC port is all that optimized lately.


But why do AMD chips not have similar issues? To me it looks like Intel tried to micro optimize something and screwed up.


According to LKML: https://lkml.org/lkml/2017/12/27/2

> The AMD microarchitecture does not allow memory references, including speculative references, that access higher privileged data when running in a lesser privileged mode when that access would result in a page fault.

Out-of-order processors generally trigger exceptions when instructions are retired. Because instructions are retired in-order, that allows exceptions and interrupts to be reported in program order, which is what the programmer expects to happen. Furthermore, because memory access is a critical path, the TLB/privilege check is generally started in parallel with the cache/memory access. In such an architecture, it seems like the straightforward thing to do is to let the improper access to kernel memory execute, and then raise the page fault only when the instruction retires.


Maybe the answer lies in Intel’s feted IPC advantage over AMD? Or is it the case that AMD has simply been relatively lucky so far?


Sounds like Facebook and Youtube, too.


It depends on whether it's an attack against HVM hypervisors or not.

If it, as it seems, is just an attack on OS kernels and PV hypervisors, you can simply turn off the mitigation, since nowadays kernel security is mostly useless (and Linux is likely full of exploitable bugs anyway, so memory protection doesn't really do that much other than protect against accidental crashes, which isn't changed by this).

Even if it's an attack against hypervisors any large deployment can simply use reserved machines and it won't have a significant cost.


More than the lawsuit, it attacks one of the core aspects of Intel's brand: performance. Intel chips are supposed to be faster. Now they are suddenly 30% slower because Intel carelessly prioritized performance features over security ones.


> all companies paying for computing resources will have to pay roughly 30% more overnight on cloud expenses

Well, if I rent a VPS with x performance, I still expect x performance after this flaw is patched. The company providing the virtual machine will perhaps have to pay 30% more to provide me with the same product I've been getting.

Since most VPS offerings arbitrage shared resources, this will not increase costs of providing VPSes by the full performance penalty.


But all you are ever getting with VPS offerings is a description of the number of CPUs, the amount of RAM, and suchlike. I haven't seen VPS offerings that say "x" chips yield "y" performance. Granted, it is sort of implied that the hardware meets certain expectations, but there isn't any guarantee. I was just now reading the TOS for AWS to check, and as far as I can tell they aren't guaranteeing any kind of specific performance.


Well if you use m5.xlarge instances from AWS you were getting 4 vCPUs for your money. I don't expect you'll now get 5...


No, but the underlying hardware that previously hosted two m5.xlarge instances may instead host one m5.xlarge and one m5.medium, so that performance is not degraded.


Yeah but AWS doesn't guarantee that "2x m5.xlarge" will meet any kind of performance requirements, particularly your own application's, do they?

So you may suddenly find that your own performance requirements, that were previously satisfied by "2x m5.xlarge" are no longer being met by that configuration, and I doubt AWS will just provide you with more resources at no additional charge.


> Well, if I rent a VPS with x performance, I still expect x performance after this flaw is patched.

Are there any providers that state you will get x performance? Most that I've seen say you will get m processors, n memory, and p storage but don't make any guarantees about how well those things will perform.


Last I checked, Amazon AWS uses a virtual processor metric, not an actual hardware metric. This is most noticeable in their lowest-power instances, which don't get a full modern CPU core.


If the virtual metric is tied to real performance then it could mean a drop in performance while maintaining the same power rating... It will be interesting to see if vendors directly address this.


Cloud services may not need to worry about the issue, depending on whether the OS the customer chooses to use is the patched or non-patched version.


For the cloud providers it's the security of the hypervisor that's at stake.


Why would the OS of the customer matter? The patch would be applied to the kernel of the hypervisor / host OS.


Forgive me my ignorance, but I fail to see how this is such a big deal. Even 50% performance hit/cost increase would be... bearable, computations are rather cheap today. ML and other intensive calculations aren't done on CPU anyway. It's not like technical progress of our civilization is slowed down by 30% or something...

On the other hand, shrinking Intel's market share due to bad PR and thus adding some competition into the industry could actually foster that progress.


If you run things efficiently you're eking every ounce of performance out of this hardware. A 30% performance hit means a 30% cost increase.

The bigger issue is for things that don't scale easily. That SQL server that was at 90% capacity is suddenly unable to handle the load. Sure, that could've happened organically, but now it happens (perhaps literally) overnight for everyone all at once.

Expect a bunch of outages in the next few weeks as companies scramble to fix this.


"A 30% performance hit means a 30% cost increase."

Just wanna point out that a 30% performance hit means a 43% cost increase.


Yes. This is so often forgotten when talking about stock prices (which is why those 2x or 3x daily derivatives are so dangerous).

For those confused: the math here is that a 30% decrease puts you at 70%. Going from 70%, a 30% increase only gets you back to 91% (0.70 * 1.3). Since 1/0.7 ≈ 1.43, you need a 43% increase to recover.


Should the individual companies hurry to patch it? There is no news of an exploit as such.


There is now!


Intel CEO added - "But when you take a look at the difficulty it is to actually go and execute this exploit — you have to get access to the systems, and then access to the memory and operating system — we're fairly confident, given the checks we've done, that we haven't been able to identify an exploit yet."

It seems you need root or physical access to the system as a prerequisite for the attack.


You don't need root, and you don't need physical access. For Meltdown, you only need the ability to run your own code on the target machine.

Where that gets tricky is when everyone's using cloud hosting solutions where the physical machines are abstracted away, and a given physical server may be running multiple virtual servers for different customers.

Think of it like this:

* Somewhere in a data center at a cloud provider is a physical server, wired up in a rack.

* That server runs virtualization software, allowing it to host Virtual Server 1, Virtual Server 2, and Virtual Server 3.

* Virtual Server 1 belongs to Customer A. Virtual Servers 2 and 3 belong to Customer B.

* Normally, Virtual Server 1 can't access any memory allocated to Virtual Servers 2 and 3.

* BUT: Customer A can now use Meltdown to read the entire memory of the physical server. Which includes all the memory space of Virtual Servers 2 and 3, exposing Customer B's data to Customer A.

That's the threat here.


Have you worked in a company where you've hit CPU performance limits? At my last job, we'd have some services run in 25 containers in parallel and we'd have to optimize as much as we could for performance bottlenecks. We'd literally get thousands of assets per minute some mornings, and had a ton of microservices to properly index, tag, thumbnail and transcode them.

Our ElasticSearch nodes all had 32GB of ram and we had 10 of them and they were all being pushed to the max.

Something like this would be a massive hit, requiring a lot more work into identifying new bottlenecks and scaling up appropriately.


I think you're vastly underestimating the potential impact to cloud providers. Azure/AWS/GCP all definitely have extra capacity, but they have forecasting down to a science. Requiring even 10% more capacity is quite a large undertaking alone.


Even the non-provider side of Google will see some impact, and even a 5% datacenter increase won't happen overnight.


Best summary I've found for the somewhat technical but not hardware-or-low-level-hacker reader is arstechnica. https://arstechnica.com/gadgets/2018/01/whats-behind-the-int...


My head is still spinning; writing an OS is a BIG DEAL!!!!


Can someone help me understand why this is such a big deal? This doesn’t seem to be a flaw in the sense of the Pentium FDIV bug where the processor returned incorrect data. It doesn’t even seem to be a bug at all, but a side channel attack that would be almost expected in a processor with speculative execution unless special measures were taken to prevent it. And it doesn’t seem like it can be used for privilege escalation, only reading secret data out of kernel memory. It seems pretty drastic to impose a double-digit percentage performance hit on every Intel processor to mitigate this.


There is this thing called "return oriented programming". You write your program as a series of addresses that are smashed onto the stack through some other type of vulnerability. When the current function returns, it returns to an address of your choosing. That address points to the tail end of some known existing function, such as in the C library and other libraries. When the tail end of that function returns, it executes your next "instruction" which is merely the next return address on the stack.

The first "instruction" of your program is the last address on the stack, in the list of addresses you pushed to the stack.

You are executing code, but you did not inject any executable code, you did not need to modify any existing code pages (which are probably read only), you did not need to attempt to execute code out of a data page (which is probably marked non executable).
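
A conceptual sketch of the shape this takes (the gadgets and layout below are hypothetical placeholders, not a working exploit; it only shows where the attacker-controlled return addresses end up):

    #include <string.h>

    /* The classic bug ROP rides on: an unchecked copy into a fixed-size
       stack buffer lets attacker data run past buf into the saved
       return address. */
    void vulnerable(const char *untrusted, size_t len)
    {
        char buf[64];
        memcpy(buf, untrusted, len);   /* len > 64 overwrites the stack frame */
    }

    /* Layout of the attacker-supplied input once it lands on the stack:
     *
     *   [ 64 bytes of filler ............ ]  fills buf
     *   [ saved frame pointer filler     ]
     *   [ address of gadget 1            ]  e.g. "pop rdi; ret" -- runs when vulnerable() returns
     *   [ value to pop into rdi          ]
     *   [ address of gadget 2            ]  tail of an existing function ending in ret
     *   [ ...                            ]
     *
     * No code is injected; each "ret" just chains into the next piece of
     * code that already exists in the process.
     */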

Address Space Layout Randomization is a way to prevent the "return oriented programming" attack. When a process is launched, the address space is randomly laid out so that the attacker cannot know which address in memory the std C lib printf function will be located at -- in this process.

Now let's think about the kernel. If you could know all of the addresses of important kernel routines, you could potentially execute a "return oriented programming" attack against the kernel with kernel privileges. Without modifying or injecting any kernel level code. These hardware vulnerabilities allow user space code to deduce information about kernel space addresses.

Now that's a lot of hoops to jump through in order to execute an attack. But there are people prepared to expend this and even more effort in order to do so. Well funded and well staffed adversaries who would stop at nothing in order to access more and better pr0n collections.


Thanks for the explanation. But I don't understand this part:

> If you could know all of the addresses of important kernel routines, you could potentially execute a "return oriented programming" attack against the kernel with kernel privileges. Without modifying or injecting any kernel level code.

The user <-> kernel transition is mediated (on x86-64) with the SYSCALL instruction, which jumps to a location specified by a non-user writable MSR. How does return-oriented programming work in that case?


Basically, let's say there's a syscall that takes a user buffer and size and copies it into kernel stack for processing. (This is common.) If you overflow that buffer, you can overwrite the return address in the kernel stack, which you can then launch into ROP.


> If you overflow that buffer, you can overwrite the return address in the kernel stack, which you can then launch into ROP.

The crucial point here being that there must already be an existing overflow vulnerability in the kernel. Knowing all the addresses is no use if you can't force execution to go to them.


The hypothesis I've seen, and why people seem to be rushing to patch it without explaining, is that you might be able to not only leak addresses, but actual data, from any ring, into unprivileged code, at which point, your security model is burned to the ground.

AIUI, the present circumstances are:

- there exists a public PoC from some researchers of side-channel leaking kernel address information into userland via JavaScript which may be unrelated

- there exists a Xen security embargo that expires Thursday that might be unrelated

- AWS and Azure have scheduled reboots of many things for maintenance in the next week, which seems unlikely to be unrelated to the Xen embargo

- a feature that appears to be geared toward preventing a side-channel technique of unknown power has been rushed into Linux for Intel-only (both x86_64 and ARM from Intel)

- a similar class of prevention technique has been landed in Windows since November for both Intel and AMD x86_64 chips (no idea about ARM)

- the rush surrounding this, and people being amazingly willing to land fixes that imply a 5-30% performance impact, strongly suggest that unlike almost every major CPU bug in the last decade, you can't fix or even work around this with a microcode update for the affected CPUs, which is _huge_. The AMD TLB bug, the AMD tight loop bug that DFBSD found, even the Intel SGX flaws that made them repeatedly disable SGX on some platforms - all of them could be worked around with BIOS or microcode updates. This, apparently, cannot. (Either that or they're rushing out fixes because there's live exploit code somewhere and they haven't had time to write a microcode fix yet, but O(months) seems like they probably concluded they outright can't, rather than haven't yet.)


Addendum for anyone still reading:

- Intel issued a press release saying they planned to announce this next week after more vendors had patched their shit, which lends me more cause to believe that the Xen bug might be the same one [1]

- Intel claims in the same PR that "many types of computing devices — with many different vendors’ processors" are affected, so I'll be curious to see whether non-Intel platforms fall into the umbrella soon

- macOS implemented partial mitigations in 10.13.2 and apparently has some novel ones coming up in 10.13.3 [2]

- someone reasonably respected claims to have a private PoC of this bug leaking kernel memory [3]

- ARM64 has KPTI patches that aren't in Linus's tree yet [4] [6] ([6] is just a link showing the patches from 4 aren't in Linus's tree as of this writing)

- all the other free operating systems appear to have been left out of the embargoed party (until recently, in FBSD's case), so who knows when they'll have mitigations ready [5]

- So far, Microsoft appears to have only patched Windows 10, so it's unknown whether they intend to backport fixes to 7 or possibly attempt to use this as another crowbar to get people off of XP 2.0

- Update: Microsoft is pushing an OOB update later today that will auto-apply to Win10 but not be forced to auto-apply on 7 and 8 until Tuesday, so that's nice [7]

[1] - https://newsroom.intel.com/news/intel-responds-to-security-r...

[2] - https://twitter.com/aionescu/status/948609809540046849

[3] - https://twitter.com/brainsmoke/status/948561799875502080

[4] - https://patchwork.kernel.org/patch/10095827/

[5] - https://lists.freebsd.org/pipermail/freebsd-security/2018-Ja...

[6] - https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/lin...

[7] - https://www.theverge.com/2018/1/3/16846784/microsoft-process...


https://security.googleblog.com/2018/01/todays-cpu-vulnerabi... https://googleprojectzero.blogspot.com/2018/01/reading-privi...

Seems that Google/Project Zero felt the need to go ahead and break embargo. Worth adding to the above list of news sources.


No, that's not accurate.

If you read the article you quoted:

> We are posting before an originally coordinated disclosure date of January 9, 2018 because of existing public reports and growing speculation in the press and security research community about the issue, which raises the risk of exploitation. The full Project Zero report is forthcoming (update: this has been published; see above).

Just from public Googling, I believe it may have been the Register who tried to get in on the scoop and broke the embargo:

https://www.theregister.co.uk/2018/01/04/intels_spin_the_reg...


No-one necessarily broke the embargo. A blogger noticed unusual activity around a certain linux patchset and put two and two together, and the register mostly sourced from his article ( http://pythonsweetness.tumblr.com/post/169166980422/the-myst... )


Also from the P0 blog post:
Variant 1: bounds check bypass (CVE-2017-5753)
Variant 2: branch target injection (CVE-2017-5715)
Variant 3: rogue data cache load (CVE-2017-5754)

My checking doesn't show any of those three explicitly listed in Apple's security updates up through 10.13.2/2017-002 Sierra.

https://support.apple.com/en-us/HT201222


Thanks for summarizing. Does anyone have time to link to more on the "side-channel leaking kernel address information into userland via JavaScript" ?


This isn't exactly that, but here[1] is a talk linked in the post from the other day which shows a PoC breaking ASLR in Linux from JavaScript running in the browser, via a timing attack on the MMU. There's a demo a half hour in.

EDIT: This post[2] discusses the specific speculative execution cache attack and claims there is a JavaScript PoC (but doesn't cite a source for that claim)

[1] https://www.youtube.com/watch?v=ewe3-mUku94

[2] https://plus.google.com/+KristianK%C3%B6hntopp/posts/Ep26AoA...


[1] was what I was referencing, thank you.

Also, RUH-ROH. https://twitter.com/brainsmoke/status/948561799875502080


More importantly, it also switches stacks so user-mode code cannot modify the return addresses on the kernel's stack.


Everything you've said is right, but I'll expand a little more because ROP is fun.

ASLR, PIC (position-independent code: chunks of the binary move around between executions), and RELRO (changing the order and permissions of an ELF binary's headers; a common ROP pattern is to set up a fake stack frame and call a libc function in the ELF's global offset table) are all mitigations against ROP, but none solve the underlying problem.

The reason ROP exists is that x86-64 uses a von Neumann architecture, which means that the stack necessarily mixes code (return addresses) and data. The only true solution is an architecture that keeps these separate, such as Harvard-architecture chips.

As for bypassing the aforementioned mitigations...

ASLR: Only guarantees that the base address changes. Relative offsets are the same. So to be able to call any libc function in a ROP chain, all you need is a copy of the binary (to find the offsets) and to leak any libc function address at runtime. There are a million ways for this data to be leaked, and they are often overlooked in QA. Once you have any libc address, you can use your regular offsets to calculate new addresses.
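
The offset arithmetic itself is trivial; something like this (the offsets and the leaked address are made-up values for illustration; in practice you'd pull the real offsets out of the target's libc with nm or readelf):

    #include <stdint.h>
    #include <stdio.h>

    /* Symbol offsets inside the target's libc, recovered offline from the
       binary. Hypothetical numbers. */
    #define OFF_PUTS   0x0809c0UL
    #define OFF_SYSTEM 0x04f440UL

    int main(void)
    {
        /* puts() address leaked at runtime through some info-leak bug
           (hypothetical value). ASLR slides the whole library, not its
           internal layout. */
        uintptr_t leaked_puts = 0x7f3a12b429c0UL;

        uintptr_t libc_base   = leaked_puts - OFF_PUTS;
        uintptr_t system_addr = libc_base + OFF_SYSTEM;

        printf("libc base: 0x%jx\n", (uintmax_t)libc_base);
        printf("system():  0x%jx\n", (uintmax_t)system_addr);
        return 0;
    }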

PIC: I haven't dealt with it myself yet, but you can use the above technique to get addresses in any relocated chunk of code, though I think you'll need to leak two addresses to account for ASLR and PIC.

RELRO: This makes the function lookup table in the binary read only, which doesn't stop you from calling any function already called in the binary. Without RELRO, you can call anything in libc.so I think, but with RELRO you can only call functions that have been explicitly invoked. This is still super useful because the libc syscall wrappers like read() and write() are extremely powerful anyway. Full RELRO (as opposed to partial RELRO) makes the procedure linkage table read only as well, which makes things harder still.

If this is the kind of thing that interests you, I heartily recommend ropemporium.com, which has a number of ROP challenge binaries of varying difficulty to solve. If you're not sure where to start, I also wrote a write-up for one of the simpler challenges [1] that is extremely detailed, and should be more than enough to get you started (even if you have no experience reversing or exploiting binaries).

Disclaimer: I'm just some dipshit who thinks this stuff is fun; if I've made a mistake in the above please let me know. I also haven't done any ROP since I wrote the linked article, so I'm probably forgetting stuff.

[1] https://medium.com/@iseethieves/intro-to-rop-rop-emporium-sp...


> If you could know all of the addresses of important kernel routines

Are those kernel logical addresses?


My BTC wallet feels more vulnerable than ever


If you can read kernel (and hypervisor) memory then it seems like a very small step from that to a local root vulnerability - e.g. forge some kind of security token by copying it. There's an embargoed Xen vulnerability that may be related to or combine with this one to mean that anyone running in a VM can break out and access other VMs on the same physical host. That would be a huge issue for cloud providers.


> If you can read kernel (and hypervisor) memory then it seems like a very small step from that to a local root vulnerability - e.g. forge some kind of security token by copying it.

This seems very wrong. I'm not aware of any privilege isolation in Windows relying on the secrecy of any value. Security tokens have opaque handles for which "guessing" makes no sense. Are you aware of anything?


I can think of a few ways to get privilege escalation if you already have RCE as an unprivileged user:

1. Read the root SSH private key from the OpenSSH daemon's kernel pages maintaining the crypto context and SSH into the system

2. Read a sudo auth key generated for someone using sudo and then use that to run code as a root user

3. Read users' passwords whenever a session manager asks them to reauth

4. If running in AWS/GCP inside a container/VM meant to run untrusted code, read the cloud provider private keys and get control of the account

5. RCE to ROP powered privilege escalation exploit seems reasonable...

6. Rowhammer a known kernel address (since you can now read kernel memory) to flip some bits to give you root

Also remember running JS is basically RCE if you can read outside the browser sandbox, ads just became much more dangerous...


Thanks! I see. So it seems like the program basically has to capture sensitive data while it is in I/O transfer (and hence in kernel memory) just at the right time, right? Which is annoying and might need a bit of luck, but still possible.

Incidentally, this seems to indicate that zero-copy I/O is actually a security improvement as well, not just a performance improvement?


4, 5 and 6 don't need to time the attack.

I am not really sure whether zero copy would or wouldn't help with this problem.

If this bug only allows reading kernel pages, zero copy may actually help if the unprivileged user can't read your pages. But from the small amount of available description it looks like it can read any page; kernel pages are just more interesting because that's a ring lower, which is why all the focus is on that.

I am fairly certain there is more protection against being able to read memory owned by a process on a lower ring level, so zero copy may be a bad idea for security-critical data.

And based on the disclosure that Google published, it looks like any memory can be read.


If “reading secret data out of kernel memory” translates into “read the page cache from a stranger’s VM that happens to be on the same cloud server” then this could be worse than Heartbleed.


Or maybe random JavaScript in the browser can stumble upon your SSH private key in the kernel's file cache... and so on.


Excellent point, I didn't think about the implications for stuff like JavaScript.


The privilege escalation is being fixed in software. The problem is that the mitigation involves patching the kernel, and that patch results in around a 30% slowdown for some applications like databases or anything that does a lot of IO (disk and network). That's the big deal. Imagine you are running at close to full capacity: after the security-fix reboot, your service might tip over. It could mean a direct impact to cost and so on.


Oh good, I put my SaaS (running mostly on Linode) up yesterday, then this happens. Can't wait for Linode to apply this patch to their infrastructure :(

I'm cursed when it comes to timing. It's like when I bought that house in 2007, held onto it waiting for the market to recover, then tried to sell it only to find out my tenants had been using it to operate a rabbit-breeding business for years and completely trashed the place (thank you, useless property manager), forcing me to sell it at a loss anyway (6 months ago).

Also, I hate rabbits now. And I veered off topic, sorry.


You might try luck to sell this as comedy/drama movie script. :)


> Also, I hate rabbits now. And I veered off topic, sorry.

Well I guess you're not the right person to talk to about a great ninja-rockstar position at our new RaaS startup.

/one has to joke sometimes to avoid crying over taking a 30% hit in costs... over a stupid CPU bug


I would love to see some SQL Server benchmarks on this patch


The SQL Server license disallows publishing benchmark results (much like Oracle's does)


Wait, really? That's kind of messed up.


Remarkable that no throwaway HN accounts considered that a challenge.


Likely very similar to the postgres benchmarks. Fundamentally, an RDBMS needs to sync each transaction commit to the log file on disk, and that sync is always a syscall. If your DB is doing thousands of tx/sec to low-latency flash and you rely on that low latency, you're going to get hit.
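
As a rough illustration of that per-commit cost (my own sketch, not taken from any of the benchmarks being discussed), this kind of loop appends a page and fsyncs it, i.e. one forced kernel entry per "commit"; the scratch file name is arbitrary:

  #include <fcntl.h>
  #include <stdio.h>
  #include <string.h>
  #include <time.h>
  #include <unistd.h>

  int main(void)
  {
      /* Scratch file on whatever device backs the current directory. */
      int fd = open("wal-test.bin", O_WRONLY | O_CREAT | O_TRUNC, 0600);
      if (fd < 0)
          return 1;

      char page[8192];
      memset(page, 'x', sizeof(page));

      const int commits = 1000;
      struct timespec t0, t1;
      clock_gettime(CLOCK_MONOTONIC, &t0);
      for (int i = 0; i < commits; i++) {
          if (write(fd, page, sizeof(page)) < 0)   /* append a "WAL record"        */
              return 1;
          fsync(fd);                               /* the per-commit durability sync */
      }
      clock_gettime(CLOCK_MONOTONIC, &t1);

      double ms = (t1.tv_sec - t0.tv_sec) * 1e3 + (t1.tv_nsec - t0.tv_nsec) / 1e6;
      printf("%.3f ms per commit (write+fsync)\n", ms / commits);
      close(fd);
      return 0;
  }

Running it before and after enabling the kernel mitigation should give a feel for how much of the commit latency is syscall entry/exit versus the device itself.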


Note that the postgres benchmark numbers passed around (most based on my benchmarks) are read-only. For write-based workloads the overhead is likely going to be much smaller in nearly all cases; there's just more independent work going on. The overhead in the read-only load profiles comes near entirely from the synchronous client<->server communication, so if you can avoid that (using pipelining or other batching techniques) the overhead is going to be smaller.


Reading secret data out of kernel memory is very bad on cloud environments. Keep in mind that the kernel deals with a lot of cryptography.


Sounds like reading HTTPS cert/key details from other-peoples-VM's on cloud providers wouldn't be too much of a stretch. Especially with the memory dumping demo. Combine that with something that looks for the HTTPS private key flag string and it's sounding pretty feasible. :/


Is there anything this bug can give you that you can't get with

    sudo cat /dev/mem
?

I'm having a hard time understanding why this is worse than any other local root escalation bug except for the consequences of the necessary patch.

EDIT: I see that /dev/mem is no longer a window on all of physical RAM in a default secure configuration. Is it true that there's no way for root to read kernel memory in a typical Linux instance? If so, the severity of this issue makes more sense to me.


You don't need to be root.

> I'm having a hard time understanding why this is worse than any other local root escalation bug except for the consequences of the necessary patch.

It's not, as far as I'm aware. The fact that the patch has perf consequences is why it's such a big deal.


I think the idea is that it is worse because the bug is in the hardware. The OS patches are just a workaround to make the hardware bug unexploitable, and they can lead to a significant performance penalty.


We don't know what the actual bug is yet, or how easy it would be to exploit it. People are speculating that either:

a) It would allow any non-root process to read full memory, including the kernel and other processes, or

b) It would allow one cloud VM to read full memory of other cloud VMs on the same physical machine, or

c) With enough cleverness, it would allow even sandboxed Javascript on a web page to read full memory of the computer that it is running on.


`/dev/mem` is not available in a container, so I cannot use `/dev/mem` to read other tenants' memory on my VPS.


>And it doesn’t seem like it can be used for privilege escalation

Based on all the hoopla around the linux kernel patches the thinking is : yes it can. Or VM escape. Or both.


It's a bug, even if it's a side-channel attack only. Notice that AMD chips aren't vulnerable to this attack.


I have no idea how or if this is a big deal but:

>>attack that would be almost expected in a processor with speculative execution unless special measures were taken to prevent it.

if you're going to put in features with expected attacks, you should definitely be putting in features to prevent them; and if it is an expected attack, the defence shouldn't be "special measures", it should just be an inherent part of introducing the feature.


When speculative execution (and caches) were invented and put into widespread use, no one thought about timing attacks, nor was the practice of running untrusted code on one's own machine common.


UNIX has been multi-user for a very long time and the intended use case is that those users not be able to compromise each other or get root.


nor was the practice of running untrusted code on one's own machine common

Doesn't multi-user timesharing and virtualization predate every modern CPU and OS though?


Yes, but it went out of style for a while.

At first, computers were very expensive, and so were shared between many users. Mainframes, UNIX, dumb terminals, etc.

Then computers became cheap. Users could each have their own computer, and simply communicate with a server. Each business could have their own servers co-located in a datacenter.

Then virtualization got really good, and suddenly cloud servers became viable. You didn't have to pay for a whole server all the time, and if demand rapidly increased you didn't need to buy lots of new hardware. And if demand decreased you didn't get stuck with tons of useless hardware.

The second stage (dedicated servers) was the case when speculative execution was implemented. We're currently in the third stage, but Intel haven't changed their designs.


Old multi-user time sharing generally had agents that were 'somewhat' trusted. Most systems that held 'secret' data didn't allow time sharing with less-privileged users' data. Also, outside of the timeshared server, users attempting to exploit this wouldn't likely have had the processing capability to deduce the contents of said cache.


It doesn’t even seem to be a bug at all, but a side channel attack that would be almost expected in a processor with speculative execution unless special measures were taken to prevent it.

Indeed, this reminds me of cache-timing attacks, which probably can be done on every CPU with any cache at all --- and they've never seemed to be much of a big deal either.
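
For anyone who hasn't played with cache timing, the core primitive is just "how long does one load take?". A minimal sketch (x86 only, gcc/clang intrinsics; a real attack needs far more care about noise and serialization) that compares a cached load against one that was explicitly flushed:

  #include <stdio.h>
  #include <stdint.h>
  #include <x86intrin.h>

  static uint8_t buf[4096];

  /* Time a single load of *p in TSC cycles. */
  static uint64_t time_load(volatile uint8_t *p)
  {
      unsigned aux;
      uint64_t t0 = __rdtscp(&aux);
      (void)*p;
      uint64_t t1 = __rdtscp(&aux);
      return t1 - t0;
  }

  int main(void)
  {
      volatile uint8_t *p = buf;

      (void)*p;                      /* warm the line into the cache */
      uint64_t hit = time_load(p);

      _mm_clflush((const void *)p);  /* evict the line               */
      _mm_mfence();
      uint64_t miss = time_load(p);

      printf("cached: %llu cycles, flushed: %llu cycles\n",
             (unsigned long long)hit, (unsigned long long)miss);
      return 0;
  }

The hit/miss gap is the side channel: whether someone else's activity left a given line in the cache is observable purely from timing.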


I don't think we even know what the bug is yet, just lots of informed speculation...


Ironically, "lots of informed speculation" seems to be exactly what the bug is about. ;-)

The thing is, AMD probably very narrowly just missed this one --- if they did more aggressive speculative execution, they would be the same.


I'm not (yet) clear on if/how this impacts aarch64 (ARM architecture) chips, but the distinction between Intel being affected and AMD not reminds us of a fundamental lesson we seem to have conveniently forgotten: monocultures of anything are bad. We need diversity and diversification in order to have a reasonable amount of robustness in the face of unknowable, unpredictable risks.

I'm wondering whether ARM chips are affected and, if they are, whether they are uniformly affected or whether it depends on vendor implementation choices.


A patch is in the works for ARM chips as well (http://lists.infradead.org/pipermail/linux-arm-kernel/2017-N...), but I am not clear on whether it's enabled by default. It seems like a good idea to have this, independent of current ARM vulnerability.


Yep, I'm still wondering how this affects ARM and if it can be corrected in microcode on that platform.

I'm also wondering if/hoping for a fix that involves increased memory usage instead of the speed.


I’m not a very proficient programmer/developer, so please bear with me. I’m intrigued by your reference to trading off greater memory footprint in exchange for diminishing performance by less. I'm trying to understand how this would work in practice: do you envision ‘padding’ the critical data structures with more empty or randomised buffer zones? Wouldn't that incur an additional penalty for the data transfer (Von Neumann bottleneck)? Would blank data be sufficient or would there be some additional kind of memory effect in the DRAM/SRAM that would demand using randomised data overwrites? How would you generate that random data?

(I apologise if this is blindingly obvious for somebody well versed in low-level programming.)


Oh I have no idea actually; it's just pretty normal to see a speed-memory trade in a lot of problems. I'm definitely not low-level enough either.


ARM posted a good overview and affected product list today:

https://developer.arm.com/support/security-update


>We need diversity and diversification in order to have reasonable amount of robustness

Ironically, in human populations it produces the opposite effect.


Please don't take HN threads on ideological tangents. The point you're making has by now become well-known ideological flamebait. We ban users who derail HN threads in such directions, so please don't.

https://news.ycombinator.com/newsguidelines.html

Edit: since https://news.ycombinator.com/item?id=16063749 makes it clear that you're using HN for that purpose, which is not allowed here, I've banned this account. Would you please not create accounts to break the site guidelines with?


Why?


More diverse population—> lower trust —> less social will to support each other.


So we should be tribal societies, then? (I.e., endogamous. As in cousin marriage.)


Here are some numbers quantifying the problem. Big caveats apply as they are very preliminary, but the hit due to the software patches looks extremely significant:

https://www.phoronix.com/scan.php?page=article&item=linux-41...


Superficially, it seems like the performance hit mostly scales with IOPS or transactions per second, which might have some pretty serious implications for performance/dollar in the kinds of intensive back-end applications where Intel currently dominates and AMD is trying to make inroads with EPYC.


It has very little to do with what kind of syscalls (I/O or other kinds) and everything to do with how many syscalls a given application makes per given time period. Compute-bound applications are already avoiding syscalls in their hotter parts. This will mostly be a blow to databases, caching servers and other such I/O-limited applications.
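
(If you want a rough read on how syscall-heavy a given workload is, something like `strace -c -p <pid>` against a representative process will tally syscall counts and give a sense of exposure, at the cost of some tracing overhead while it runs.)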


In other words, don't worry - pretty much all the key performance bottlenecks most of us deal with at work will be getting tighter, but at least our video games will still run OK.


Well, sounds like that time when there is no water left but there are still beers in the fridge.

At least we can play our sorrows away.


Right now I am just hoping that it wont add significant overhead to OpenGL. My application already has a bottleneck on changing OpenGL states and issuing render commands and I have no idea how much of that time is spend making syscalls.


OpenGL implementations shouldn't be affected by syscall overhead. Historically it's been DirectX that had a syscall per draw call, but I believe both now just write the command to GPU RAM directly.


That's at least somewhat encouraging. Nevertheless, it sounds rather like the old, "Yes, but will it run Crysis?" question will perhaps have renewed relevance.


That's terrible - these are precisely the sort of customers who will suddenly take a performance hit that will negatively impact their operations!

This sounds like it’s positively evil for outfits that rely heavily on virtualisation also.


Would it be fair to say that this might cause an acceleration in the shift from on-prem to the public cloud, where there are performance guarantees?


There aren't really performance guarantees for CPU and the ones that are there won't help here. When this patch is released big providers will have the same hardware as before and sell it in the same way - but the OS and userspace will just be slower for some use patterns.

There isn't a guarantee that will compensate for that any more than if you updated some piece of your software infrastructure to a new version that just got slower.


As I mentioned in the other thread yesterday, databases and database-like applications are going to be hit particularly hard. Even more so on fast flash storage. Double whammy compared to apps just doing network IO.

And while databases try to minimize the number of syscalls, they still end up doing a lot of them for read, writeout and flush.


How would you trade this knowledge?

Intel has already dropped and AMD is up. Maybe there's more to move, but first-order effects are at least partially priced in already.

But what about second-order effects? Seems like virtualization should be vulnerable (VMWare and Citrix), but maybe they actually benefit as customers add more capacity.

Software-defined networking and cloud databases should also suffer though it's unclear how to trade these.

AWS, Google Cloud and Azure might benefit as customers add capacity but there's no way to trade the business units. So what about cloud customers where compute costs are already a large percentage of total revenue?

Netflix should be OK but Snap and Twilio could get squeezed hard. Akamai and Cloudflare might have higher costs they can't pass through to customers.

And where's the upside? Who benefits? If the performance hit causes companies to add capacity, maybe semiconductor and DRAM suppliers like Micron would benefit.


First and foremost, don't buy Intel stock. This doesn't help, but I have long considered Intel a company that doesn't know what they are doing. Here is why:

I owned Intel back in 2010 when they bought McAfee for 7.8 billion. They said the future of CPUs and chip tech was embedding security on the chip. The real answer was mobile and GPUs.

Not only did I immediately know this was a horrendous deal, it clearly showed that the CEO and management had no clue about their own market's desires and direction. At the time, I was hoping they were going to buy Nvidia; it would have been a larger target to digest at 10 bil, but doable by Intel at the time.

The McAfee purchase turned out to be one of the worst large corporate purchases in history. Had they invested the $7.8 billion blindly into an S&P 500 index fund, their investment would be worth ~19-20 billion.


Arguably, the only sane trade here would be to buy Intel and short AMD if you think the size of the move is greater than it should have been. However, there are many reasons not to do this until there is more information, and as that information comes out, it will likely be incorporated into the continual corrections in price. As to second order effects, don't count on this mattering. Unless you are planning on trading huge amounts of money, the risk/reward is probably not great. If you have to ask...


I think the model for cloud vendors would be quite complicated. Not every version of the CPU and not every application is impacted as much (new Intel processors with PCID will suffer less).

Add on top of that the fact that a lot of cloud customers over-provision (there are good scientific papers on how much spare CPU capacity there is). Cloud service providers that sell things on a per-request / real-CPU-usage model (vs reserved capacity) probably benefit more.

Also, you can't just separate trading in AWS or GCE from the rest of the core business.

Potentially the server business units of Dell, HP, IBM, ... should do better as people use this as a justification to upgrade overdue hardware and to cover the 5% to 10% lower performance (needing more units to make up for it).


Agree on that last paragraph. The only reasonable thing people can do is buy more hardware to cover the performance loss and/or buy more hardware that's not needed using the bug as a pretext to get the budget approved now.


SDN might actually be OK. On the high-end they bend over backwards to not enter the kernel at all anyway, using stuff like DPDK. They stopped even using interrupts years ago.


Possibility: a cloud vendor who mostly-uses-AMD, versus one that mostly-uses-Intel, just got handed a massive price/performance relative advantage.


Automated ic-layout and software proofing benefits from this.


Not sure why you are being downvoted. It's an interesting question. I'm in on AMD for the time being, just to see how it flows.


Don't flash devices use NVMe (e.g. userspace queues) now and avoid the kernel altogether for read and write operations? Shouldn't they see no impact?


NVMe means each drive can have multiple queues, but they're still managed by a kernel driver. You may be thinking of SPDK, which includes a usermode NVMe driver but requires you to rewrite your application. And many systems are still using SAS or SATA SSDs.


Not by default they don't. That only works if you're willing to dedicate that entire physical drive to a single application anyway.


In the future this is possible on Linux with the filesystems that support DAX. Currently this is all pretty experimental, with lots of work being done in this space in the last two years.

But this will require you to have the right kind of flash storage, right kind of fs, right kind of mount options, and probably a different code path in userspace for DAX vs traditional storage.

So we're a little ways away from this.


DAX doesn't appear related here at all. That is about bypassing the page cache for block devices that don't need one.

That doesn't move anything from kernel land into userspace, certainly not in the app's process in userspace anyway.


If you bypass the page cache and use mmap instead of read()/write(), you avoid the syscall overhead. This matters a lot for high-IOPS devices. Also, these new-fangled devices claim to support cache-line sync using normal CPU flush instructions, avoiding the fsync syscall as well.


One does not follow the other. Where are any references to how this will let you bypass read & write? User-space applications are still interacting with a filesystem, which they access via read/write and not a block device.

There's no talk in the DAX information about how this results in a zero-syscall filesystem API, and I'm not seeing how that would ever work given there would then be zero protections on anything. You need a handle, and that handle needs security. All of that is done today via syscalls, and DAX isn't changing that interface at all. So where is the API to userspace changing?


Please re-read my above comment. There is no new API. The DAX userspace API is mmap.

This work is experimental, but you can mmap a single file on a filesystem on this device using the new DAX capabilities. Most access will no longer require a syscall.

This comes with all the usual semantics and trappings of mmap plus some additional caveats as to how the filesystem / DAX / hardware is implemented. Most reads/writes will not require a trip to the kernel using the normal read()/write() syscalls. Additionally, there is no RAM page cache backing this mmap; instead the device is mapped directly at a virtual address (like DMA).

Finally, flush for these kinds of devices is implemented using normal cache-line flush instructions rather than fsync. Flush is going to be done using the CLWB instruction. See: https://software.intel.com/en-us/blogs/2016/09/12/deprecate-...

LWN.net has lots of articles and links in their archives from 2016/2017. It's a really good read. Sadly I do not have time to dig more of them up for you. Do a search for site:lwn.net and search for DAX or MAP_DIRECT.
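
To make the mmap path concrete, here's a minimal sketch of what userspace might look like under this model. It assumes a DAX-capable filesystem mounted at a hypothetical /mnt/pmem, a kernel with MAP_SYNC support (merged around Linux 4.15), and a CPU with CLWB (compile with -mclwb); treat it as illustrative rather than a tested recipe:

  #include <fcntl.h>
  #include <string.h>
  #include <sys/mman.h>
  #include <unistd.h>
  #include <immintrin.h>

  #ifndef MAP_SHARED_VALIDATE
  #define MAP_SHARED_VALIDATE 0x03
  #endif
  #ifndef MAP_SYNC
  #define MAP_SYNC 0x080000
  #endif

  int main(void)
  {
      /* Hypothetical file on a DAX-mounted filesystem. */
      int fd = open("/mnt/pmem/data.bin", O_RDWR | O_CREAT, 0600);
      if (fd < 0 || ftruncate(fd, 4096) != 0)
          return 1;

      /* MAP_SYNC asks the kernel to keep the mapping synchronous with the media. */
      char *p = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                     MAP_SHARED_VALIDATE | MAP_SYNC, fd, 0);
      if (p == MAP_FAILED)
          return 1;

      memcpy(p, "hello", 6);   /* a plain store: no read()/write() syscall        */
      _mm_clwb(p);             /* write the cache line back to the device (CLWB)  */
      _mm_sfence();            /* order the write-back before we move on          */

      munmap(p, 4096);
      close(fd);
      return 0;
  }

The point is that after the initial open/mmap syscalls, the steady-state data path is just loads, stores and cache flushes, which is why this style of access would dodge the per-syscall overhead being discussed.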


Please re-read mine. How is the number of syscalls (which is the only thing that matters in this context) changing if there's no API change to apps? mmap already exists and already avoids the syscall. DAX "just" makes the implementation faster, but it doesn't appear to have any impact on number of syscalls

As in, if you call read/write instead of using mmap you're still getting a syscall regardless of if DAX is supported or not. Not everything can use mmap. mmap is not a direct replacement for read/write in all scenarios.


Do we have a performance estimate? I can eat 20 or 30%, but I can't eat 90%.


This comment further down thread mentions it's 20% in Postgres. https://news.ycombinator.com/item?id=16061926


...when running SELECT 1 over a loopback socket.

The reply to that comment is accurate: that's a pathological case. Probably an order of magnitude off.


We're still learning, but it looks like pgbench is 7% to 15% off:

https://www.postgresql.org/message-id/20180102222354.qikjmf7...


I've seen that message. It acknowledges the same problems: do-nothing queries over a local unix socket.

Real-world use cases introduce much more latency from other sources in the first place.

I'm sticking with an expectation in the 2%-5% range.


Yep, this is getting blown way out of proportion by all of these tiny scripts that just sit around connecting to themselves. Even pgbench is theoretical and intended for tuning; you're not going to hit your max tps in your Real Code that is doing Real Work.

In the real world, where code is doing real things besides just entering/exiting itself all day, I think it's going to be a stretch to see even a 5% performance impact, let alone 10%.


I think 5% is a reasonable guess for a database. Even a well-designed database does have to do a lot of IO, both network and disk. It's just not a "fixable" thing.

But overall, yeah.


The claim is that it's 2% to 5% in most general uses on systems that have PCID support. If that's the case then I'm willing to bet that databases on fast flash storage are a lot more impacted than this, and pure CPU-bound tasks (such as encoding video) are less impacted.

The reality is that OLTP database execution time is not dominated by CPU computation but instead by IO time. Most transactions in OLTP systems fetch a handful of tuples. Most time is dedicated to fetching the tuples (and maybe indices) from disk and then sending them over the network.

New disk devices lowered the latency significantly while syscall time has barely gotten better.

So in OLTP databases I expect the impact to be closer to 10% to 15%. So up to 3x over the base case.


> I've seen that message. It acknowledges the same problems: do-nothing problems over a local unix socket.

The first set of numbers isn't actually unrealistic. Doing lots of primary key lookups over low latency links is fairly common.

The "SELECT 1" benchmark obviously was just to show something close to the worst case.


> The first set of numbers isn't actually unrealistic. Doing lots of primary key lookups over low latency links is fairly common.

Latency through loopback on my machine takes 0.07ms. Latency to the machine sitting next to me is 5ms.

We're actually (and to think, today I trotted out that joke about what you call a group of nerds--a well, actually) talking multiple orders of magnitude through which kernel traps are being amplified.


> Latency through loopback on my machine takes 0.07ms. Latency to the machine sitting next to me is 5ms.

Uh, latency in local gigabit net is a LOT lower than 5ms.

> We're actually (and to think, today I trotted out that joke about what you call a group of nerds--a well, actually) talking multiple orders of magnitude through which kernel traps are being amplified.

I've measured it through network as well, and the impact is smaller, but still large if you just increase the number of connections a bit.


If so, this definitely moves the needle on the EPYC vs Xeon price/performance ratio.


All the Oracle DBAs out there are in for some suffering. Forget the cost of 30% extra compute, what about the 30% increase to Oracle licensing?


"SPARC user: not affected!"

--Oracle's marketing tomorrow, probably

(to their credit, SPARC does fully isolate kernel and user memory pages, so they were ahead of the curve here... for all 10 of their users who run anything other than Oracle DB on their systems.)


Phoronix strikes again! I admire Michael's consistency and dedication and their benchmarks have certainly gotten better over the years as PTS has matured, but everything on Phoronix still needs to be taken with a generous helping of salt. New readers generally learn this after a few months; it applies not only to their benchmarks, but also their "news".

The most obvious issue with this benchmark is that Phoronix is testing the latest rcs, with all of their changes, against the last stable version [EDIT: I misread or this changed overnight, see below] that doesn't have PTI integrated, instead of just isolating the PTI patchset. The right way to do this would be to use the same kernel version and either cherry-pick the specific patches or trust that the `nopti` boot parameter sufficiently disables the feature. That alone makes the test worthless.

There is no way this causes a universal 30% perf reduction, especially not for workloads that are IO-bound (i.e., most real-world workloads). This is a significant hit for Intel, but it's not going to reduce global compute capacity by 30% overnight.

EDIT: Looking at the Phoronix page, the benchmark actually appears to use 4.15-rc5 as "pre" and 4.15-some-unspecified-git-pull-from-Dec-31-that-isn't-called-rc6 as "post". I thought I had read 4.14.8 there last night, but may not have. Regardless, the point stands -- these are different versions of the kernel and the tests do not reflect the impact of the PTI patchset.


So you're saying that the latest RCs, without the patch, were supposed to be slower than stable by at least 10%? How often do companies release performance downgrades of that scale? That's also very unlikely.


>So you're saying that the latest RCs, without the patch, were supposed to be slower than stable by at least 10%?

I'm saying that it's not a reliable measurement of the impact of the PTI patchset. There was a PgSQL performance anecdote [0] (actually tested with the real boot parameters instead of entirely different versions of the kernel) that showed 9% performance decrease posted to LKML, which Linus described as "pretty much in line with expectations". [1]

Quoting further from that mail:

> Something around 5% performance impact of the isolation is what people are looking at.

> Obviously it depends on just exactly what you do. Some loads will hardly be affected at all, if they just spend all their time in user space. And if you do a lot of small system calls, you might see double-digit slowdowns.

So in general, the hit should be around 5%, and "[y]ou might see double-digit slowdowns" seems like the hit on a worst-case workload is hovering closer to the 10% range than 30%. That's also what the anecdote from LKML shows, unlike Phoronix which shows 25%-30% or worse.

This is more of an attrition thing than a staggering loss. With people saying MS patched this in November, it would be interesting to see if people saw a similar 5-10% degradation in Windows benchmarks since that time.

>How often do companies release performance downgrades of that scale?

I don't know which "company" you're referring to here, but substantial changes in kernel performance characteristics are pretty common during the Linux development/RC process, and yes, definitely some workloads will often see changes +/- 10% between the roughly bi-monthly stable kernel releases.

If you're surprised that Linux development is so "lively", you're not alone. That's one of the selling points of other OSes like FreeBSD.

[0] https://lkml.org/lkml/2018/1/2/678

[1] https://lkml.org/lkml/2018/1/2/703


I wonder if we'll see some performance return as subsequent patches are produced. I can't tell from the coverage so far if this is possible.


A lot of people have noticed that High Sierra is slower than Sierra, specifically for filesystem operations with APFS. I wonder if Apple knew about this ahead of time and this explains the overhead?


Probably not. APFS just does a lot more than HFS, so there was already a huge performance impact on disk-related operations before this change goes in.

This is an all-hands-on-deck kind of situation. Apple doesn't usually do well with security fire drills like this.


If work on the NT and Linux kernels started in November, they must have known. Intel must have told them; the alternative, Apple learning about this from a third party and grilling them over it, would be too scary.


I just wonder whether Apple is threatening to move all their Macs to AMD, or to ARM?


[flagged]


Unless they live in a bubble, I’m sure most people at Apple are already aware of this.


It was a joke referencing the login vulnerability that was disclosed on twitter a couple of months ago.


AMD must be pretty happy about this patch: https://lkml.org/lkml/2017/12/27/2


Not quite -- without his patch, the performance penalty would hit them too. Tom is proposing that AMD be excluded from the proposed solution, as it would hit AMD with collateral damage from the `X86_BUG_CPU_INSECURE` fix.

I'm sure there are frantic emails claiming that AMD shouldn't be punished for Intel's mistake.

EDIT: actually the fix will go out with 4.14.12 and 4.15-rc7, with both `X86_BUG_CPU_INSECURE` and AMD's addendum protecting it from the collateral damage.


Maybe it's just me but that name is just too general. Is there no other conceivable way an x86 CPU can ever be "insecure"? Why'd they use something so vague? Is this part of the redaction?


Not so sure about that. I am reading the merge commit, and comments are pretty interesting:

  --- a/arch/x86/include/asm/processor.h  
  +++ b/arch/x86/include/asm/processor.h
  + * On Intel CPUs, if a SYSCALL instruction is at the highest canonical
  + * address, then that syscall will enter the kernel with a
  + * non-canonical return address, and SYSRET will explode dangerously.
  + * We avoid this particular problem by preventing anything executable  
  + * from being mapped at the maximum canonical address.
  + *
  + * On AMD CPUs in the Ryzen family, there's a nasty bug in which the
  + * CPUs malfunction if they execute code from the highest canonical page.
  + * They'll speculate right off the end of the canonical space, and
  + * bad things happen.  This is worked around in the same way as the
  + * Intel problem.


I wrote that text. It's just documenting a bug that never affected Linux at all.

I'm having trouble finding a good reference right now.


That's an old issue, fixed many months ago, not related to the 'new' Intel bug. The comment was updated in this patch series, that's it.


It's a completely unrelated bug. This DragonFlyBSD commit message does a pretty good job of explaining it: http://lists.dragonflybsd.org/pipermail/commits/2017-August/...


As long as the patch is all anyone notices, then it looks great from AMD marketing. But I'm fairly confident (and a quick browse through the patches in the latest RC confirms at least one instance of this) that there are comments calling out AMD CPUs doing things that should be considered bugs, so there's probably not going to be a lot of PR people telling the masses to read the source code. It's not exactly flattering to anyone.

One thing I can definitely respect about the kernel developers is that they don't seem to make any effort to be nice about the fact that they need to deal with undocumented nonsense from vendors all round.


Tom Lendacky works at AMD, so yes... :)


"Intel has a bug that lets some software gain access to parts of a computer’s memory that are set aside to protect things like passwords."

Seems like very little got through to the media about the details regarding this flaw's effects and costly workaround.


Very little by way of details has been made public yet. Not even the technical press. Even relevant comments in the Linux source are redacted at the moment. Hopefully, further details will be released in good time (in the next month?) when people have had time to install the patches that are going out RealSoonNow (i.e. the huge plan of updates on Azure's VM hosts).


> Even relevant comments in the Linux source are redacted at the moment.

People keep repeating this claim because it sounds dramatic, but I'm not sure it's a fair description. The original source appears to be a single snide tweet from @grsecurity [1] referencing this comment [2].

It's far from obvious that the comment was even "redacted" at all. It seems more likely that "stay tuned" is either a reference to the more detailed comments elsewhere in the patch (in arch/x86/kernel/ldt.c), or a reflection of the fact (which is clearly spelled out in the commit message) that future patches are likely to change the location of the LDT mapping.

I've skimmed through the commit messages and comments from the latest patchset [3] and couldn't find anything else that even hinted at redaction, nor could I find any mention of redactions on the linux-kernel mailing list.

Furthermore, it's worth bearing in mind that @grsecurity has been involved in numerous public feuds with the Linux security folks. So in the absence of concrete evidence, I'm not particularly inclined to assume his tweet was made in good faith.



Bloomberg is not going to focus on technical detail too much (at all) given their readership. Follow the link to the register for more detail.



This reddit thread has more information on the bug https://www.reddit.com/r/sysadmin/comments/7nl8r0/intel_bug_...


That Reddit thread is repeating back information from previous hacker news threads.


Spreading of information is a good thing.


It's not information. It's speculation.


Er, no.. what? There's a lot of non-speculative information in that reddit post..


in this particular case, will the generalization help their readership?

I actually think it will - it would be easy to give more accurate details that would cause many readers' eyes to glaze over.


That was my reaction. How to ELI5 the risk in multitenant VM environment? It could steal passwords


> How to ELI5 the risk in multitenant VM environment? Some other guy paying for EC2 steals your customers' passwords by sheer force of will

> How to ELI5 the risk in desktop PC? A piece of JavaScript in some 0x0 pixel iframe in a tab you're not even looking at stealing your passwords and SSH keys

(Although nothing is proven in the latter regard, I wouldn't be per se surprised to see something in that direction once the exact nature of the issue is more widely known)


This isn't really too far out of proportion with normal daily fluctuations of AMD stock this year.


Quiet, you're spoiling the narrative we're imposing on semi-random, chaotic events.


+8.79% AMD & -4.44% INTC does not look like a semi-random event anymore. Seems like wider audience is waking up to the news.


To add another point: NVDA is up 5% and doesn't have the same direct competitive narrative. If money has to be spent rectifying the issue with new or more CPUs due to performance loss, that's less money available for spending on Nvidia GPUs. It might tip a few marginal applications in favor of eating the development costs to migrate to GPGPU, but that effect isn't likely nearly as high, especially if applications with lots of system calls are affected the most.


Chipotle is also up 5%. All these engineers are going to need lots of burritos to eat.


You're saying that gas in the cleanroom caused the chips to behave incorrectly?

Excuse me while I go and invest in a company making rubber underwear.


Someone's gotta sell Burger King sesame seeds...


That plus approaching ER


"Chip design flaws are exceedingly rare."

Writes someone who's never seen an errata sheet.


What started as speculation about recent kernel developments has really turned into a shitshow for Intel. I can't imagine they thought that these massive changes would get through without anyone finding out. However, there really isn't any other better option for Intel, so it seems they are in a lose lose situation with no way out. Or at least a way out that doesn't involve them going bankrupt trying to repair the damage.


> going bankrupt trying to repair the damage.

I think you took it a bit too far. May be, I am missing something. Is it really that bad?


There are at least tens of thousands, possibly hundreds of thousands of Intel CPUs in just the datacenters around the globe. Most of which are controlled by companies that make a lot of money and paid a lot of money to Intel for their CPUs. And I doubt that they will just take this kind of hit to their performance sitting down. And that's just datacenters, not to mention all the personal computers (mine included) that will suffer. At this point, if this is real, it's not a question of if it will cost them, it's how much. And considering the sheer numbers of the products affected, I can't imagine it will be cheap. I am not saying that I think they are going to go bankrupt, and I would be surprised if they did, but a 30% performance hit is multiple generations of fallback, and considering the importance of computing today (and the number of different entities that are affected by this) I find it hard to imagine that it will just be taken sitting down.


Bankrupt might be a wild guess, but it is one of the largest blunders I can recall: 12 years of lineup affected, a large performance penalty from the fix, hardwired. Intel is gonna enjoy a healthier diet for a while.


Interestingly enough, Intel and AMD market values are affected by this but the cloud providers' aren't, which I find surprising.


Do you pay for computing cycles or for a computing instance?

If your cloud provider lowers its performance/efficiency by 30%, isn't it up to its customers to switch? But where to switch to? There's no choice among the big cloud providers. They all use Intel and AMD in varying amounts. As a cloud user you are stuck with Intel or AMD chips. Demand doesn't change, supply is lowered.

Looking at it like this, it is more likely that demand for cloud resources will rise, and as such revenue of the cloud providers will as well.


My guess is that it will help cloud providers. In most cases they don’t guarantee x operations per second but rather the type of cpu you get. If that type now gets 30% less performance you’re most likely going to have to pay 30% more. Again the guarantee is not on operations per second but either exact processor, class of processor, or processor units.


Hm. Do cloud providers need to provide up to 30% more capacity for the same price, or do customers need to purchase up to 30% more capacity to get the same speed?
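
(Rough arithmetic either way: if throughput drops to 0.7x, doing the same work takes about 1/0.7 ≈ 1.43x the hardware, so "the same speed" is closer to 43% more capacity than 30%.)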


Unless demand is fully inelastic, if you up the price fewer people will buy. As such, this should hit the cloud providers' profits.


> Unless demand is fully inelastic

It's mostly inelastic. People don't buy resources unless they need to do work. Alternatives, such as on-prem hardware, are affected as well.

The only work that would be affected would be the work that becomes unprofitable at a 10-30% hardware cost increase and this is probably marginal enough to be ignored.


> It's mostly inelastic. People don't buy resources unless they need to do work.

I work on a system (at Google) with significant hardware cost. Fundamentally, the work my system is doing needs to be done. But the time we spend improving efficiency is hugely elastic. I look at a CPU profile or request flow, have an idea to improve it, look at how much machine resources we're spending, guesstimate how long it will take us to implement and maintain, and use a chart to see if my idea is worth my team's time or not.[1]

If the resources get more expensive or simply impossible to acquire,[2] we'll optimize more.

[1] There are other considerations (will it make the code more complex, thus increasing risk of a security/privacy flaw? what about opportunity cost given that it's hard to hire more engineers and scale a team?) but that's a reasonable mental model.

[2] There's a global RAM and SSD manufacturing crunch already, and if lots of CPUs are replaced due to these vulnerabilities or simply are no longer enough, that's gonna be a big crunch as well. If you're one of the few biggest cloud providers, I think you can't just replace / add tons more hardware than planned without dramatically increasing the price per unit for everyone.


> But the time we spend improving efficiency is hugely elastic.

In the context of the above posts, "elastic" is an economic term used to refer to the demand curve. There is no supply/demand curve when it comes to internal technical decisions for optimization =)

I would say your work is highly correlated to the price of hardware per some unit of performance AND the amount of work that you need to complete AND the amount of hardware already available AND the cost of labor needed for optimization.

I imagine in your case, since these systems are tightly controlled, you can probably run unpatched without taking a substantial risk.


I'd say most of the elasticity comes from people deciding not to migrate to the cloud at all. Rather than people deciding to stop using the cloud.


*fully elastic


No, if demand is slightly elastic, an increase in price will still cause a decrease in demand.

https://en.wikipedia.org/wiki/Price_elasticity_of_demand


This bug doesn’t magically chop all CPU performance down by almost a third.


What does it do, then? That is approximately what the mitigation patch does, according to early averages of perf changes.


Performance is chopped 5-30%, with newer CPUs that have PCID being affected less significantly. [0] BoorishBears is possibly pointing out that 30% is a worst case.

[0]: https://www.phoronix.com/scan.php?page=article&item=linux-41...


This, combined with what the other comment mentions: it's a range from 5% to 30%, with 30% being a worst case the average user does not encounter.

This is an issue, but laypeople are overblowing the effect on their everyday computing.


What about gaming performance? Once everything is patched[0], that is. I know no one knows, but it'll be interesting to see Intel's last remaining advantage (clock frequency) mitigated somewhat by all of this.

[0]https://gist.github.com/woachk/2f86755260f2fee1baf71c90cd653...


Syscall-heavy workloads are affected to a much greater degree than CPU-heavy workloads. (This is a direct consequence of the fact that the workaround involves a per-syscall overhead.)
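
If you want to see the per-syscall cost on your own box, a minimal sketch (my own illustration) is to time a cheap syscall in a loop, once with and once without the mitigation enabled; syscall(SYS_getpid) is used to force a real kernel entry even on libcs that cache getpid():

  #define _GNU_SOURCE
  #include <stdio.h>
  #include <time.h>
  #include <unistd.h>
  #include <sys/syscall.h>

  int main(void)
  {
      const long iters = 1000000;
      struct timespec t0, t1;

      clock_gettime(CLOCK_MONOTONIC, &t0);
      for (long i = 0; i < iters; i++)
          syscall(SYS_getpid);        /* forced kernel entry each iteration */
      clock_gettime(CLOCK_MONOTONIC, &t1);

      double ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec);
      printf("%.1f ns per syscall\n", ns / iters);
      return 0;
  }

The delta between the patched and unpatched runs is the fixed per-syscall tax; how much that matters to a real workload then depends entirely on its syscall rate.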


Yes. Which is why I'm saying "up to 30%"


Are there any major cloud providers whose stocks depend on their cloud earnings?

Cloud is probably a fraction of Google’s, IBM’s, Oracle’s and MS’s profits. It’s probably a significant part of Amazon’s profits but Amazon doesn’t trade on earnings.


This is not a PR nightmare, it's a technical competitive nightmare. For many workloads, last-gen AMD server chips are now competitive with current-gen Intel server chips; and the current generation already had a lot in its favour. There's no amount of PR fluff that can save Intel from this, they can only release a new design with this fixed and hope they don't lose too much ground in the mean time.


As a shareholder that's great. Concerned that this is a very fragile win though. I don't think there will be much real world fall out.


If all Intel CPUs get 30% slower overnight, then there will be some fallout in the real world.


They might be 30% slower in virtualization and should be about 5-7% slower in real-world usage. As a heavy user of the cloud I'm worried about my infra more than about my laptop: if my cloud setup gets even 7% slower overnight it won't be good, to say the least, especially with lower-clock 2GHz Skylake GCP CPUs.

Those 5-7% will make me feel like I'm powered by Atom, and it will be even worse for single-core-bound workloads in the cloud.


>They might be 30% slower in virtualization and should be about 5-7% slower in real-world usage.

Virtualization IS real-world usage. This is going to damage Intel where it will hurt the most, the datacenter (which is largely virtualized using VMware or Hyper-V). The Xeon CPUs have some of the best profit margins for Intel. If they erode away to Epyc (which is finally becoming available) this could be pretty good for AMD, especially since AMD has said for the past few years that their strategy is to go after the datacenter market.


there's other real-world usage (DAWs, for example, where there is a ton of I/O for sample-based virtual instruments) which will likely be as significantly impacted as virtualization...


How you hold this stock is a mystery to me. With it bouncing between $10 and $15 constantly last year it just seems far too volatile. I'd be panic buying / selling every day to adjust, which seems stressful...

Maybe that's why it's a favorite of /r/wallstreetbets


You're doing it wrong if you're day trading stocks... dollar cost averaging is your friend when it comes to volatility, and if you're investing you're doing so because of long-term belief in the company, so it should be easy to hold and not worry about it.


"dollar cost averaging is your friend"

Research suggests otherwise:

https://personal.vanguard.com/pdf/ISGDCA.pdf

Diversification is a more effective strategy for dealing with volatility.


I only read page one of that link (thanks for sharing), but it appears to limit its scope to a specific scenario: given a lump sum of money (a windfall), is it better to invest all at once or dollar cost average over a period of time? It is not recommending against dollar cost averaging in general; it is recommending against it for windfalls. Diversification is also not mentioned in the article.


I read that Vanguard study when it came out, yes it's correct statistically but I disagree with using that data to do lump sum investments. For one, the CEO in their 2018 outlook webcast said to DCA. Even if you keep a short timespan on your DCA (I do 26 weeks), it's a good idea for your mental health if you happen to dump it all right before a collapse.

Sometimes it's wiser to take into account human psychology, even if academics are out there with data to convince us otherwise.

I'm personally with the CEO of Vanguard on this one. A short entrance into the market, especially in today's situation is probably the way to go. People can do whatever they want, and if they put their money where their mouth is and go in with a lump sum, then I'll respect it.

Otherwise, as someone who is bringing a lot of money myself onto the market now, I'm DCAing.


I bought a ton at 9 before the Ryzen announcement. The oscillation in the stock drove me up the wall, and I came to the conclusion that it’s just stupid to track single stocks, even when you know they have a great product, because the market is so irrational.


>/r/wallstreetbets

Guilty as charged.

>I'd be panic buying / selling every day to adjust which seems stressful...

Just treat it as numbers on a screen rather than money. Keeps the emotions at bay.


My current position's aggregate cost basis is around 2.35/share, so that's why I'm holding.


>I'd be panic buying / selling every day to adjust which seems stressful...

If you can train yourself to do the opposite you could make a lot off this stock. Buy fear, sell greed. Never panic.

Losses are never realized if you don't sell, and just wait it out until it is back near 15 again.


"I'd be panic buying / selling every day to adjust which seems stressful..."

Wrong way to invest. Don't buy an individual company's stock unless you are ready to (1) hold for the long-term and (2) ignore (short-term) unrealized losses. Especially in the first year after investing in an individual stock it is typical to see an unrealized loss, because gains require time to accumulate.

The stock market is not a slot machine. Investing requires patience and discipline.


There's no particular reason any random company's stock should have gains over time, aside from inflation. Especially if it doesn't give dividends, you may never gain anything.


Nothing is certain, but you can sort companies into "likely to gain" and "unlikely to gain." The P/E and P/CF ratios are commonly used for this, along with other valuation metrics. The basic theory of value investing is to find companies that are likely to give you a return.

Also, historically the stock market as a whole has outpaced inflation, so even just investing in randomly chosen companies should have gains over time (there is even evidence that a portfolio of randomly chosen companies performs about the same as a portfolio of companies chosen by analysts and advisers).


try trading crypto


I'll stick to blue chips and paper trading. The NHS can only hand out so much heart medication...


Laughs in Bitcoin


I am not so sure. Cloud providers will probably want to get this fixed and AMD would be a convenient alternative in the short term. I think the bigger risk is that rather than going with AMD, cloud providers will pursue a different CPU architecture altogether, especially in the long-term. There were plenty of things to dislike about x86 virtualization etc. before this mess.


Will there be any way to disable or block the upcoming patches and keep the performance for those of us who really just don't have any reason to care about inter-process information leakage on our personal computers?

Edit: I'm (also) wondering about Windows, in case anyone knows yet.


Yes, the work-around can be disabled via a boot-time argument.


Thank you! Do you know if this will be true for Windows as well, or just Linux?


From the merge commit:

  +#ifdef CONFIG_PAGE_TABLE_ISOLATION
  +# define DISABLE_PTI		0
  +#else
  +# define DISABLE_PTI		(1 << (X86_FEATURE_PTI & 31))
  +#endif
PS - MSFT has not published relnotes, so we do not know yet. We'll find out soon enough.


I meant disable at run time, not disabling via recompiling your own kernel.


That is also in there. You can either specify "pti=off" or "nopti" as a boot parameter.

  +void __init pti_check_boottime_disable(void)
  ...
  +	ret = cmdline_find_option(boot_command_line, "pti", arg, sizeof(arg));
  +	if (ret > 0)  {
  +		if (ret == 3 && !strncmp(arg, "off", 3)) {
  +			pti_print_if_insecure("disabled on command line.");
  +			return;
  +		}
  +		if (ret == 2 && !strncmp(arg, "on", 2)) {
  +			pti_print_if_secure("force enabled on command line.");
  +			goto enable;
  +  		}
  +		if (ret == 4 && !strncmp(arg, "auto", 4))
  +			goto autosel;
  +	}
  +
  +	if (cmdline_find_option_bool(boot_command_line, "nopti")) {
  +		pti_print_if_insecure("disabled on command line.");
  +		return;
  +	}
  +
  +autosel:
  +	if (!boot_cpu_has_bug(X86_BUG_CPU_INSECURE))
  +		return;
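
For what it's worth, on a typical distro that would mean appending `nopti` (or `pti=off`) to the kernel command line, e.g. via `GRUB_CMDLINE_LINUX` in /etc/default/grub and then regenerating the grub config; the exact file and regeneration command vary by distro, so check yours.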


Just don't do this if you have PII, HIPAA, or PCI data on your computer.


Even then and if they don't care about their own safety, they should patch for the rest of us. Who knows how long until their unpatched system will get pwned from some other vulnerability and end up in some botnet just spreading the pain.


Please just don't. It isn't worth the pain and risk just to have a little faster system. Maybe you don't care about this patch, but you will need others that are dependent on it.

Just patch.


> Please just don't. It isn't worth the pain

No, it very much is.

> and risk

No, there is no risk. I already run everything as admin.

> just to have a little faster system.

5-30% is not "a little".

> Maybe you don't care about this patch

Indeed. And I expect many other power users also don't (but regardless, this is irrelevant).

> but you will need others that are dependent on it.

Well when that actually becomes a problem I will act accordingly. If more patches like this pop up I obviously won't install any of them. If there's a patch for a drive-by browser exploit depending on this, that will obviously be a different story.

> Just patch.

Hell no. My patching this makes my computer slower while providing exactly zero benefit to anyone.


>Well when that actually becomes a problem I will act accordingly.

I don't mean to be rude but if you are having to ask how to disable automatic updates then you probably aren't someone who keeps up to date with all the latest issues. When it becomes problem, you just won't know.

Let your OS vendor do all this for you. They are good at it.


> I don't mean to be rude but if you are having to ask how to disable automatic updates then you probably aren't someone who keeps up to date with all the latest issues. When it becomes problem, you just won't know.

...Wow. First of all, that's not what I asked. I asked how to disable or block this patch. Blocking "automatic updates" is neither equivalent to disabling this patch (post-install) nor to blocking it (pre-install). Second of all, I'm running Windows 8.1, on which I can actually block updates easily. I don't know if I can be picky about which patches I block on 10 because I have barely used it, but I will have to start using it soon and I really don't want to waste time installing the update only to find out I can no longer uninstall it.

And third of all, you're really spewing nonsense. I've done security work in the past which I don't care to post details about here anonymously. I still keep up with security news regularly and I actually look into the update details before installing them (which should be obvious if you read my previous comment on how I said what I do depends on the actual updates). None of which you need to believe (and I really don't care if you don't), except for the minor caveat that if you're trying to be convincing, this holier-than-thou attitude moves you well in the opposite direction.


It seems like it is all thanks to this patch: https://lkml.org/lkml/2017/12/27/2

If I understand correctly, there may be a serious bug in some x86 CPUs, but nothing is known publicly. Presumably, all current Intel CPUs are affected and none from AMD, but we can't really be sure; it is still a secret.

It is impressive how a simple, yet-to-be-justified patch has so much influence. It opens up new ways of manipulating the market...


But it's not just a random patch posted to LKML, it has also been reviewed and merged to tip. That means it has been justified, just not to you.


If you look here: https://www.computerbase.de/2018-01/intel-cpu-pti-sicherheit... (sorry, it is in German)

and scroll down to "Windows-Benchmarks: Anwendungen", you can see that most applications do not take any performance hit from the Windows patch.

Only M.2 SSDs seem to be affected.


All their tests are CPU-bound, not I/O-bound -- archiving, rendering, video encoding. Performance degradation happens with transactional, I/O-bound tasks (think NoSQL databases, ad serving, trading).


It is possible Microsoft has mitigated the issue in a way that has a much smaller performance impact. Maybe they already had a highly tuned feature to enable kernel page separation coded up but disabled. I won't be surprised if even the Linux implementation is tuned to the absolute limit in the coming months.


> I won't be surprised if even the Linux implementation is tuned to the absolute limit in the coming months.

To me, this sounds like unnecessary work if Intel is coming up with a microcode patch within a few months.


I think so too. It also seems the Linux version right now is in a "get it to work, optimize later" state.


If you know how to materially optimize it, I'm all ears.


This reminds me of a storyline from The West Wing, where a company that was Intel in everything but the name found a bug in their chip.

As it threatened the company's existence, the President refused to consider guaranteeing a loan because the company had been a contributor, and any help could potentially appear as corruption.

How quaint such considerations seem today...


I'm worried about the performance impact on low-end Intel chips like the Atom/Celeron found in Chromebooks. A 30% hit will make computing on those platforms miserable.

Talking about Chrome OS, is there any speculation about the impact of the bug? Do its hardened sandboxing techniques put it in a better position even if KASLR is compromised?


The bug isn't really about KASLR; it's more about reading kernel memory from userspace through side-channel attacks on speculative execution.

KASLR is/was the cover for the kernel patches, to avoid disclosing the real bug.


Indeed, a kernel memory read seems more likely than an address-only read; the memory probably needs to be cached (in L1, maybe?). The attack is a timing attack, as can be seen in this very interesting tweet - https://twitter.com/brainsmoke/status/948561799875502080
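For anyone wondering what the timing side of this looks like in practice, here's a toy sketch of the Flush+Reload measurement primitive (it's not an exploit and touches no kernel memory -- it only shows that a cached access is measurably faster than a flushed one, which is the signal such attacks decode):

    /* toy Flush+Reload timing sketch: cached vs. flushed access latency.
       Build with something like: gcc -O1 -o timing timing.c */
    #include <stdio.h>
    #include <stdint.h>
    #include <x86intrin.h>   /* _mm_clflush, _mm_mfence, __rdtscp */

    static uint64_t time_access(volatile uint8_t *p)
    {
        unsigned aux;
        uint64_t t0 = __rdtscp(&aux);
        (void)*p;                          /* the load being timed */
        uint64_t t1 = __rdtscp(&aux);
        return t1 - t0;
    }

    int main(void)
    {
        static uint8_t buf[64];

        buf[0] = 1;                        /* touch it so it is cached */
        printf("cached:  %llu cycles\n", (unsigned long long)time_access(buf));

        _mm_clflush(buf);                  /* evict it from the cache */
        _mm_mfence();
        printf("flushed: %llu cycles\n", (unsigned long long)time_access(buf));
        return 0;
    }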


The comment on the Linux kernel mailing list does pinpoint attacks using 'speculative execution', but isn't access to the kernel memory space by any means a potential hazard to any security that depends on KASLR?


Honestly, that serves Google right. I've been saying it for years how stupid Google is for encouraging 99% of Chromebooks to be powered by Intel chips (even more than Windows machines), when the Chrome OS itself is architecture-agnostic (for the most part).

It was just stupid through and through. Even today we see ARM coming to full versions of Windows, but Google is still kicking it with Intel CPUs.


> AMD shares surged as much as 7.2 percent to $11.77 Wednesday. Intel fell as much as 3.8 percent, the most since April, to $45.05. An Intel spokesman declined to comment.

I have no real context for this, but is 7.2% considered a "soar"? And a 3.8% decrease seems like kind of not a lot, considering what's fucking happening here.


A 7.2% rise intraday is pretty big, especially for traders holding leveraged positions. If you're comparing to crypto markets then no, but generally speaking yes.


Thanks!


The stock price is the present value of the future profits. Since this shouldn't have a huge effect beyond 4 years, that's saying profits for Intel in the short term would be down by 20 percent.



Literally just read here yesterday how the CEO dumped majority of his shares. So Shiesty. https://news.ycombinator.com/item?id=16055851


Don't be so quick to blame. You might have read it yesterday, but he reported it on Nov 29, which means that the transaction happened either within two days prior or about a month, depending on if you believe the SEC's website or NASDAQ's website. But the question isn't when he sold his shares, the question is when did he put in the order to sell those shares. Those guys put their buy/sell orders in months in advance because of exactly this problem. Or maybe he had a limit order in--the beginning of November was a 2-year high, so you could easily imagine a limit order executing about then.


Evidence shows this bug was known in November. It only blew up this week.


Probably even earlier.


Don't spread conspiracy theories.

It's absolutely impossible for the most visible executive of one of the largest firms to engage in insider trading in such an obvious fashion and get away with it.

At this level, there is always a paper trail of who knew what when. There are internal and external audits if any suspicions arise.

Plus there are severe penalties, both civil (in the employment contract) and criminal. Intel's CEO is without a doubt in the 9- or 10-digit range of personal wealth. Risking time in jail to avoid a 10% loss on his stock holdings would be a terrible decision even if he considered the chance of being caught low.


I think we're going to find in the coming years that there is a lot of white collar crime going on at any time. It seems like it would be foolhardy to attempt this stuff, but greed compels people to do stupid things all the time. Wealthy C-suite people are not totally immune to that.


You'll be happy to know that:

[In 2016] the SEC charged 78 parties in cases involving trading on the basis of inside information.

A number of these cases involved complex insider trading rings which were cracked by Enforcement’s innovative uses of data and analytics to spot suspicious trading.

For example, it brought insider trading cases against:

- two hedge fund managers and their source, who was a former employee of the U.S. Food and Drug Administration

- a former Goldman Sachs employee

- a former senior employee at Puma Biotechnology Inc.

https://www.sec.gov/news/pressrelease/2016-212.html


I am confused: as far as I know, two different vulnerabilities have been discovered, Meltdown and Spectre; while the first one affects only Intel CPUs, the second one affects AMD and ARM as well. So how come I'm not seeing much talk about the latter? Is it because it is harder to exploit?

I didn't have a chance to read the 2 papers so I would appreciate a TL;DR.

I am a SW developer working with high level languages; security and OS development are not my specific fields so while I don't need an ELI5 I would appreciate a sufficiently "layman's terms" explanation.


Replying to myself: most of the info can be found here

* https://googleprojectzero.blogspot.com.au/2018/01/reading-pr...

* https://www.theregister.co.uk/2018/01/02/intel_cpu_design_fl...

* https://meltdownattack.com/

* https://www.amd.com/en/corporate/speculative-execution

* https://newsroom.intel.com/news/intel-responds-to-security-r...

TL;DR is that mitigation for Spectre at the OS level is not very expensive in terms of performance, while the Meltdown mitigation (Meltdown affects only Intel) will have a performance penalty between 5% and 30%.


The bottom of https://danluu.com/cpu-bugs/ suggests that AMD isn't any better, so this is likely just short term.


They don't have to be better to benefit from purchasers wanting to diversify their plant across two CPU vendors rather than one.


This specific issue is very relevant to cloud providers, who are the guys that buy thousands of CPUs, and doesn't affect AMD's current generation. I don't know what Intel's profit breakdown is, but I suspect it is heavily weighted to the high-end server chips where they have had no real competition for many years.


Since that article came out, AMD built a whole new CPU architecture, with the help of the guy that designed the K8, too.


Didn’t the Intel CEO just sell half of his shares/options? If he knew about these issues isn’t that illegal?


You can be sure there's a young hotshot with sparkling eyes at the SEC who is already typing a letter to Mr. Krzanich politely asking about the circumstances of that sale.

At the same time, no one does eight-figure transactions which require reporting to the SEC without talking to a lawyer. Right? Right? They really dislike insider trading, it's one of the few things where even rich people can get imprisoned -- and the typical jail sentence has been steadily climbing up for decades now.


> They really dislike insider trading, it's one of the few things where even rich people can get imprisoned -- and the typical jail sentence has been steadily climbing up for decades now.

Insider trading, like many white-collar crimes, exists primarily for its value as a weapon. There is nothing actually illegal about the act of selling a stock; it's all about casting aspirations as to intent and who-knew-what-when.

In other situations, intent is usually an aggravating factor, enhancement, or affirmative defense. It is not the thing that qualifies an otherwise 100% legitimate act as a bad thing.

My anecdotal, unsubstantiated perspective is that insider trading is unlikely to be an issue for anyone who hasn't made enemies, and that it may suddenly become an issue for anyone naive enough to make enemies recklessly. cf. Martin Shkreli, who couldn't be linked to a specific "bad trade" so was brought on generic "securities fraud" instead.

Not playing ball with the people wielding these powers seems to be the dangerous thing.


Intent is a key component of most crimes, isn't it? https://en.wikipedia.org/wiki/Mens_rea


Disclaimer: I'm not a lawyer, there are definitely people who can explain this better, and I'm probably using these terms incorrectly.

Yes, mens rea is an important consideration, but it's nuanced. Check the Model Penal Code [0], which identifies four differing types of mens rea, including negligence and recklessness; that is, a "guilty mind" (mens rea), for criminal purposes, does not necessarily require what would be conventionally considered bona fide malice or intent to harm.

What I meant when I said an "enhancement" or "aggravating factor" is that usually you have an objectively asocial actus reus, like theft, and should mens rea come into play, it's generally a defensive thing seeking to exculpate the accused, as in "I didn't know it belonged to someone else" (the affirmative defense), not to deny the act.

But with insider trading and other instances of nuanced malum prohibitum [1], the usual relationship between actus reus and mens rea is inverted. To make the crime, one starts with the mens rea, the bad intent, and must identify (or, if necessary, manufacture) an apparently-normal act to register as the external offensive conduct that harmed society and warrants legal action.

That is a scarier proposition because if your daily business involves technical acts that can be converted into actus reus, there's obviously going to be ample opportunity for people to assign and rationalize their preferred ideas about your thought process there and convince themselves that you're a criminal based on their personal level of dislike or offense. If this gets brought in court, your defense will amount to convincing the jury to believe you instead of the prosecutor, which is a straight-up likability and performance contest.

Whereas, with better-defined crimes, there is a physical, independent actus reus that people recognize as objectively bad and probably intentional. If you didn't steal the thing, if they can't show that you stole the thing, that's now the ground that you're fighting over, and that's much better for the defendant because it's much less fickle.

Essentially it makes every defense necessarily affirmative because the conduct is not otherwise unlawful. The government must dislike you enough to assume bad faith first.

[0] https://en.wikipedia.org/wiki/Model_Penal_Code#Mens_rea_or_c... [1] https://www.law.cornell.edu/wex/malum_prohibitum


The expression you're looking for is "casting aspersions."


Heh, you're right. I promise I knew that. ;) It's too late to fix the typo now, but I appreciate the correction.


Some people think it was due to the tax code changes.


There is a 0% chance you could prove he sold for that reason. Also if you didn't notice the stock is only a few %pts from its ATH; and almost trading at the exact price he sold it at.


I have no idea what happens when the official security disclosure happens and the bugfixes get released.

And what the SEC can prove and what they can't, I do not claim to be an authority of.


Intel and AMD were informed of the vulnerability in June. The stock is up significantly since then; if the CEO was selling for that reason, you'd assume it would have happened much sooner after. In all likelihood this was for tax purposes. The SEC would likely never be able to prove otherwise. I also doubt the SEC even looks at it. There was heavy options volume leading into it though, which they might look at.


The timeline of events is ... interesting: Intel filed its quarterly report October 26, and the trades were initiated on October 30 and executed November 30. There's no doubt he knew. The big question is what lies ahead. When the patches hit and people other than, how to say, geeks realize what's up, then comes the question: does Intel stock drop? If it does, then even the company might come under fire for not disclosing a significant risk.


Not sure how this hurts Intel much; they essentially have a monopoly on the CPU space, and AMD is only really competing because of their GPU unit. AMD can't scale to meet demand if there were a big shift from cloud providers. Even there, Nvidia is miles ahead of everyone (GPU). The thing is that, if anything, this could help Intel sales, because the only way to -truly- stop this is to get new chips... which Intel will provide, and vendors will have no choice for many larger clients. ARM chips have the same exploits (both variants). AMD can't scale its business for that kind of demand, and its chips are slower... so what are you going to do?

I doubt the CEO is exactly shaking in his boots or sold for that reason. I'd even be willing to bet Intel will provide a fix on the hardware and continue to use the same sockets so cloud providers can just change them out without further hardware changes.

Intel's stock will probably trade down to 40, fill the gap and continue its uptrend, largely because the market is very bullish on the chip space right now.


It's a good thing.


What is a good thing...?


Martha Stewart did jail time for insider trading. Turns out the SEC isn't messing around, and they don't hand out community service, probation and fines for this stuff.


That's a common misconception. She did not do jail time for insider trading; she almost certainly would have gotten away with that. The problem is she lied to a federal agent while trying to hide her insider trading. That's what they charged her with and convicted her of.


Ahhhh, solid reference. At first I thought you were saying insider trading was a good thing and I was thoroughly confused.


Martha has been off of people's radars for a while now. I may need new material.


That is Martha Stewart's catch phrase.


Looks like Krzanich sold $39.4M of stock on 11/28. Hard to say when the top brass knew about this flaw.

https://finance.yahoo.com/quote/INTC/insider-transactions?p=...


I don't understand this attitude that "top brass might not know about it". He's the CEO, isn't his entire job to know what's going on in the company? I know we get this idea that rich people just sit back and take in the money, but isn't the reason they are paid so much in the first place because they have in theory a big responsibility on what happens in the company?


You're putting words in my mouth. I didn't say he didn't know about it. I said we don't know WHEN he first knew about it, which is important in the context of insider trading.


According to Google, they found out about it in June, and I think it's fair to assume this was shared with Intel shortly after. From the moment it's shared with Intel, it's fair game. It's up to him to setup the company in such a way that important information like this gets to him quickly.


Even if he knew about it, if it was more than 6 months in advance and he follows all the legally-mandated protocols for setting up a regulated stock trade, isn't he safe?

The marvel is that somehow the news was kept from hitting the media until he had finished his trade, and then was revealed immediately after the transaction was complete. I wonder how that coincidence happened.


The SEC would need to be able to show that Krzanich initiated that rather large selloff based on this inside, privileged info. Given the short document retention policies in place at most companies, this would presumably prove pretty difficult.


The article that started most of the discussion listed the first publicly seen patches from Microsoft in the middle of November, and one would assume that a bug that could cause major performance impacts like this makes its way upstairs pretty quickly at Intel.

https://twitter.com/aionescu/status/930412525111296000


Intention to sell has to be registered with the SEC 6 months in advance, right?


Seems debatable. From what I recall, at least some news about this issue was already public before he sold his shares. And what would be illegal would be trading based on information that isn't public. OTOH, since details are still dribbling out, you could possibly argue that the Intel CEO had more complete information than the public.

The sell could also have been scheduled in advance, which would - AFAIK - not violate any insider trading regulations.


Unless the advance scheduling is completely binding, I don't see why that should sidestep insider trading. What's to stop these guys from always having a cascading series of sells and buys 6 mo. in advance, and just cancelling them?


SEC rules also cover cancellation of orders.

https://en.wikipedia.org/wiki/SEC_Rule_10b5-1#A_possible_loo...


What is to "stop these guys" is that the legal parties and officials involved have brains that they can use. The engineer's idea of "mwa ha ha, I found a bug, I can walk" is rarely true in practice (n.b.: having quite a lot of money is a different kind of exception).


Basically this: https://xkcd.com/1494/


A good rule of thumb is that if you think you've discovered a loophole that would allow insider trading, and it only took you a few moments of thinking, the SEC thought of it already.


What, they are never allowed to sell shares then? There are constantly bugs being found in major software and hardware.


Surely this is a material design defect that anyone would expect to hurt Intel's business in a very significant way.


>Unless the advance scheduling is completely binding

It gets filed with the SEC so should be suitably permanent


The orders registered with the SEC and all these cancellations would attract a lot of their attention. I have no doubt they would see through it pretty easily once they started investigating.


I saw that rumor spreading last night, but have yet to see any trustworthy reporting on any such action.

Edit: Here's trustworthy reporting on that action that also predates this announcement: https://www.fool.com/investing/2017/12/19/intels-ceo-just-so...


I don't think there needs to be much "reporting".

Intel has a rule where c-suite and up execs who have been at the company 5+ years must hold a minimum amount of stock. For the CEO, that number is 250,000. The CEO recently exercised options and immediately sold them, holding onto the absolute minimum required (250k).

This is all available on the SEC's website, trades like this are reported to the SEC and are considered public information.


Small correction, but generally someone like that exercises their options and sells them for profit, keeping their shares that they've had since whenever. In this case, he exercised the options, sold those shares, and then lowered the amount of shares he had beyond that to the absolute minimum (250k).


> that also predates this announcement

Apparently, Microsoft has been releasing patches for NT since November, so it isn't exactly predating the bug.


It depends on the transaction. Recent sells are not updated, but all Intel insider trades can be found here:

http://www.nasdaq.com/symbol/intc/insider-trades

Most of the trades are "automatic sells", which should mean the trades were set up days in advance. And the selling was only reported on the back of the Intel issue.


Yes, it would be, but in practice it's hard to nail people on this. He can argue that he'd already scheduled the sell when he learned about this and it will be difficult to prove otherwise.


"it will be difficult to prove otherwise"

The SEC could look at internal emails he got about this flaw, and if he got them (and especially if he replied to them) then it's pretty clear he knew about it.

They doubtlessly have other investigative tools they can use for this as well.


>>If he knew about these issues

It's impossible for him to not know about something this serious.

If he actually didn't know about this, then that is proof of some serious incompetence.


I wonder what implications this will have on those who run Intel in their gaming rigs. I'm due to refresh, and _was_ gonna invest in Intel as my CPU. But this seems pretty damning for that.

I assume the system calls to interact with the GPU, or to do any sort of I/O, are going to incur the performance overhead. So rendering frames, reading/writing from the network, and loading assets from the disk could all cause issues.

And this is just for gaming. Anyone using a cloud provider that's running on Intel needs to worry about similar things.

What a nightmare...



Well, that's for Linux. We don't exactly know what effects such patches would have on other platforms yet. (do we?)


To make things worse there is an increasing number of games running DRMs like Denuvo and VMProtect that cause a significant performance hit. I think these will be more heavily affected by the patch.


I don't get why more developers don't take notes from CD Projekt.


Sadly I just bought an 8700k last week. Great CPU until next Tuesday? (Windows Patch Day)

Though even if I lose some percentage, it will still outrun AMD on single core applications (and probably also multi core ones).


I wonder how Intel will deal with this.

If I were Intel, I'd offer free replacements for at-par performance, and potentially a tiny cash payment for upgraded performance. Assuming the marginal cost to produce chips, especially older/slower ones, is very low, the only real cost to them is losing out on potential upgrade sales which would have happened organically, for a while.

However, doing this keeps Qualcomm/ARM and AMD from making massive inroads into the market. As well, it would be a great way for Intel to accelerate adoption of their newer technologies, causing even greater lock-in (you could assume a much higher percentage of users will have a feature)

This all works great for socketed CPUs (still common for servers/cloud). For embedded, where the CPU is likely non-replaceable (even if socketed), peak CPU performance probably doesn't matter as much -- maybe do a discount coupon? Or work with equipment vendors to subsidize upgrades.

Laptops and non-technical end users (who couldn't swap their own CPU) probably don't care as much but also don't have the ability to upgrade. A rebate/upgrade program would work, or a substantial cash payment. Doing it as e.g. $50 cash or $250 toward your next Intel-CPU laptop would be interesting.

Intel is rich, the market leader, incumbent across multiple segments, etc., so they really should go overboard on their response.


rofl, you don't simply upgrade the chips installed in 1B+ servers and laptops and network devices and embedded systems. In fact, statistically few devices containing a CPU are designed to ever have their CPUs replaced. In the end, you replace the motherboard, which requires coordination with the vendor. Consider cars - good luck coordinating that recall!


Devices containing multiple security levels/distinct users, and where performance matters the most, are generally socketed-CPU servers. For other devices, if they're security critical and can't be addressed through software, you can upgrade them at the board or device level early. This has already happened in embedded devices due to the C2000 timer problem.

Assuming there is a software mitigation which has a performance impact, sophisticated users would be capable of adding more capacity (if it's a horizontal scale type workload), upgrading early (if they had extra capacity for futureproofing), or spending money, potentially subsidized by Intel, to upgrade immediately. If there's no mitigation, upgrade early, or rearchitect application (moving away from shared security domains on single boxes, etc.)


Just built a new Linux workstation the day after Christmas with a 4.2 GHz i7. In hindsight I should have bought an AMD Ryzen! Glad the computer I built my dad was an AMD.


The following story might comfort you somewhat. Someone had bad luck with Ryzen:

http://erlang.org/pipermail/erlang-questions/2018-January/09...


> The security updates may slow down older machinery by as much as 30 percent, according to The Register.

I will eat my hat if that 30% number holds up.


Honest question, does the performance hit from this patch actually hit Intel's best processors enough to make them perform worse than AMD's best?

I don't keep up on these kinds of metrics, but I'm under the impression that Intel still dominated CPU benchmarks before this issue, so if this question is answered in the negative then I doubt it will affect Intel very much.


Intel hasn't been dominating CPU benchmarks since AMD's new Zen architecture came out. They still have a ~10% lead for gaming, but there are as many use cases where AMD is the best choice as there are where Intel is. So if this bug slows down Intel servers but not AMD servers by 10%, then suddenly choosing AMD is a no-brainer.


It depends on what your program is doing. If your program uses a lot of syscalls, then it'll be slower. If it hardly uses them at all, then it won't be as affected.
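And since the cost is per kernel entry/exit, you can get a rough feel for how exposed your own workload is with a trivial microbenchmark run before and after the patch (a sketch; absolute numbers vary a lot by machine and kernel):

    /* rough syscall-overhead probe: time a few million cheap syscalls.
       Linux only; build with: gcc -O2 -o sysbench sysbench.c */
    #define _GNU_SOURCE
    #include <stdio.h>
    #include <time.h>
    #include <unistd.h>
    #include <sys/syscall.h>

    int main(void)
    {
        const long iters = 5 * 1000 * 1000;
        struct timespec a, b;

        clock_gettime(CLOCK_MONOTONIC, &a);
        for (long i = 0; i < iters; i++)
            syscall(SYS_getpid);           /* near-trivial syscall: mostly entry/exit cost */
        clock_gettime(CLOCK_MONOTONIC, &b);

        double ns = (b.tv_sec - a.tv_sec) * 1e9 + (b.tv_nsec - a.tv_nsec);
        printf("%.1f ns per syscall\n", ns / iters);
        return 0;
    }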


Yes, but that wasn't the question.


It's too soon to say. There are still variants of Meltdown/Spectre that aren't fixed yet. The other guy was correct that it probably depends on your workload, but in a couple of months, after all the dust has settled on this and we have a comprehensive review, my guess is that yes, AMD is going to take a commanding lead.

Spectre affects every modern CPU, and the performance impact of its mitigations will be marginal. Meltdown affects Intel only, and the performance impact will be significant. Here are Fortnite's preliminary server results[0].

Once everything[1] is patched up it's just going to get worse for Intel. Ryzen was already fast and had better SMT (Hyperthreading) efficiency than Intel prior to this bug (~10%). This is going to take a serious toll on Intel's single-threaded performance advantage (~10% from IPC, 25% clock frequency advantage).

My advice is to stick with Ryzen/Threadripper/Epyc.

[0]https://www.epicgames.com/fortnite/forums/news/announcements...

[1]https://gist.github.com/woachk/2f86755260f2fee1baf71c90cd653...


AMD stock price was higher in July and October. It increased a tiny bit and does not constitute “soaring”.


It's currently +10%, which absolutely is "soaring".


What is more interesting to me is the news that came before this: Intel CEO selling his stock. I did not read the news but I got a notification on the phone through one of the many news apps on my phone. For a moment, I wondered why he sold off his stock.


After the fix, will device drivers still have the user process mapped into their address space?

If not, then the fix may expose many driver bugs (where the driver accessed user space directly instead of going through copy_from_user()), making the whole thing even more painful.


I don't know much about CPUs, so what prevents a fix via microcode update?


No one outside of Intel knows enough about Intel's microcode to answer that question, especially since we don't know the full details of the problem.


We don't know if anything does, but it could be that an effective microcode update (eg turning OOO off) would slow down the CPU more than the software fix.


The bug seems to be revolving around speculative execution. It seems like that is a silicon thing, not a microcode thing.


It's likely they could disable speculative execution in its entirety, either in ucode or UEFI, but that would harm many compute-intensive workloads by 5-10% or more, so my guess is that somebody decided to push this page table mitigation instead, taking the hit on the syscall/hypercall side as that's perceived as "less bad overall".


The bug seems to be about the processor leaving speculatively read privileged data in some of the caches, even if execution failed [1].

If so, clearing all caches upon a failed privilege check sounds like something within the capabilities of microcode and without unreasonable performance penalties.

Unfortunately, that would not explain the complex in-kernel fix...

[1] https://plus.google.com/+KristianK%C3%B6hntopp/posts/Ep26AoA...

EDIT: what remains in the cache is not "speculatively read privileged data" but rather "unprivileged data whose address is correlated to the speculatively read privileged data". Working out afterwards which address was cached allows one to infer what the privileged data was. Still, the point about clearing all caches as a countermeasure holds...
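To make the "address encodes the data" idea concrete, here is a toy sketch of just that covert-channel half, with an ordinary userspace byte standing in for the privileged data (no speculation, no kernel access): the byte selects which page of a probe array gets pulled into the cache, and it is recovered afterwards by timing all 256 pages. Prefetchers and noise can still confuse it, which is why real PoCs take many samples.

    /* toy cache covert channel: encode one byte as a warm probe-array page,
       then recover it via access timing. Build with: gcc -O2 -o covert covert.c */
    #include <stdio.h>
    #include <stdint.h>
    #include <x86intrin.h>

    #define PAGE 4096
    static uint8_t probe[256 * PAGE];

    static uint64_t time_access(volatile uint8_t *p)
    {
        unsigned aux;
        uint64_t t0 = __rdtscp(&aux);
        (void)*p;
        uint64_t t1 = __rdtscp(&aux);
        return t1 - t0;
    }

    int main(void)
    {
        uint8_t secret = 'S';                    /* stand-in for the privileged byte */

        for (int i = 0; i < 256; i++)            /* start with a cold probe array */
            _mm_clflush(&probe[i * PAGE]);
        _mm_mfence();

        /* "send": warm exactly one page, chosen by the secret value */
        (void)*(volatile uint8_t *)&probe[secret * PAGE];

        int best = -1;
        uint64_t best_t = UINT64_MAX;
        for (int i = 0; i < 256; i++) {          /* "receive": find the warm page */
            int idx = (i * 167 + 13) & 255;      /* mixed order to sidestep the prefetcher */
            uint64_t t = time_access(&probe[idx * PAGE]);
            if (t < best_t) { best_t = t; best = idx; }
        }
        printf("recovered: '%c' (0x%02x)\n", best, best);
        return 0;
    }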


This piece is (IMNSHO) trash. It's trying to stare into some crystal ball guessing at the cause of market fluctuations; but there's no real evidence they're right.

Not to mention it's pretty misleading about technical stuff, saying e.g. "Chip design errors are exceedingly rare. More than 20 years ago, a college professor discovered a problem with how early versions of Intel’s Pentium chip calculated numbers."

So, they imply that chip design errors are a once-in-a-few-decades kind of thing; that's just complete nonsense. Chip design errors are common; most just aren't this bad.

Furthermore, they imply this stock change is anything more than temporary volatility; given the tiny, tiny changes so far that's just premature. Perhaps the stocks will adjust more in the future - we'll see. But as of now, this piece is financially and technically pretty misleading.


Major chip design errors, of this magnitude, are exceedingly rare. Especially in comparison to the software industry where bugs are released to production a couple orders of magnitude more frequently. How many other Intel hardware bugs can you point to that are comparable to the ones below?

"All computers with Intel chips from the past 10 years appear to be affected... The security updates may slow down older machinery by as much as 30 percent, according to the report."

"discovered a problem with how early versions of Intel’s Pentium chip calculated numbers... Intel had to recall some chips and took a charge of more than $400 million."


That's what you're saying (and indeed what I said too!), but that's not what Bloomberg published. What you're saying makes sense! But they didn't qualify it with "major". And the qualification is important to the heart of the story, since the way they phrased it suggests that chip design is largely reliable and thus that this is an exceptional error, when in fact errors are common and only the impact is exceptional. In effect: they're overstating the signal-to-noise ratio this event implies.

Worse, not only do they understate the noisiness of the chip-error "signal", they also understate the noisiness of the stock-value signal. And hey presto: if you conveniently present only the data you want, all of a sudden a correlation looks really meaningful!

Furthermore, even the qualification "major" isn't really deserved to the extent Bloomberg implies; specifically, there have been worse chip errors much more recently than 20 years ago (e.g. AMD's somewhat similar Phenom issues). What makes this one worse isn't just the bug, it's how chips are used: cloud computing makes this much more relevant than similar bugs from not too many years ago.


Now you know how to weight the truth of mass media publications.


> How many other Intel hardware bugs can you point to that are comparable to the ones below?

Disabling a whole feature (TSX) across an entire generation of processors and early steppings of the next? That's pretty major, given it's a marketed feature.


> It's trying to stare into some crystal ball guessing at the cause of market fluctuations; but there's no real evidence they're right.

Welcome to the world of business reporting.


I think it's pretty clear in this case. Plenty of people here in hn-land were talking about Intel/AMD short/long positions starting in the afternoon yesterday and word has gotten around.


Talk is cheap; investing perhaps less so. The asserted "soaring" stock price simply hasn't happened. The 7% rise they name as "soaring" is not a soar for a stock this volatile: https://www.bloomberg.com/quote/AMD:US - just look at the graph over the past year; value changes in excess of 50% happened several times.

To be clear: I'm not saying this increase won't stick, just that Bloomberg is pretending to read this from the numbers rather than merely expecting it to occur. It'd be fine to say you might expect the stock to trend higher. But pretending you can look at that noisy line and say the current 7% increase is statistically significant in a humanly relevant way is just hogwash.

Obviously you might expect the stock to soar. It may well happen. It just isn't visible in the data they present; the article is simply click bait (or worse, market manipulation).


OK, I was thinking more about the "INTC decline" than the "AMD rise"; you make a good point about the previous volatility.

I didn't have options activated till this morning, because I hadn't gotten around to it. Had I been able to activate them yesterday afternoon, I'd be in on INTC and AMD, both sides. Anyway, I have options in play on the Intel side but not the AMD side, so while "talk is cheap" - I'm in.


Yes, I'm sure all those Wall Street finance folks are all scouring HN comment threads just waiting for the next great piece of investment advice that they can move on.


> "So, they imply that chip design errors are once in a few decades kind of things; that's just complete nonsense. Chip design errors are common most just aren't this bad."

Many chip design errors are patched with microcode updates. There is speculation that this one cannot be.


Perhaps a bit off topic but I have an (Asus) laptop with a recent Intel chip running Linux (Solus) and I have no idea how I am to deal with all these CPU bugs... Any pointers?


The patch isn't out yet. Once it's merged in, your distribution will release patches you can install in the normal way you update your distribution.


Ah, ok, so Intel Management engine patches are also distributed in the Linux Kernel then?


Just sit and wait; it's all too fresh to make solid statements on this. Maybe aside from: think twice before buying Intel again.


Definitely. Sadly, the new AMD Ryzens were not in laptops at the moment I needed one. To be honest, a high-end ARM laptop would also have been nice, but as it is, Intel is king for machines with >8 hrs of battery life and decent performance.



Considering the late Intel problems, Apple is going to be even more tempted to design its own CPUs/GPUs for the mac.

What do you think, is this realistic?


Would the perf hits introduced by the Linux/Windows patches also be paid by kernels of guest VMs? or just the hosts/hypervisors?


When pondering whether I should be grumpy about this situation or accept it as "just how it goes", I wonder if Intel would be OK with me giving them up to 30% less money after agreeing to 100% before walking out of the Intel shop, without prejudice.


I don't know if there will be a recall/class-action lawsuit/whatever. But clearly there is a difference between making a mistake in what must be one of the most complicated consumer products, on the one hand, and intentionally violating the terms of an agreed-upon contract on the other.

TL;DR: Intent matters.


Good point and agreed. This will cost them, and clearly was a mistake.


Can anyone with better knowledge ELI5 this for me -- to fix this bug, do cloud providers have to patch only the hypervisor machines, or the guest boxes as well? And if the fix is applied to the hypervisor, will performance degrade on guest boxes as the benchmarks suggest? Thanks!


Guest OS shouldn't be an issue, from what I understand. As long as the hypervisor memory is separated, you're good.


Yes, it will affect your VM, as it's the host CPU that's affected.


Contracts with fixed guaranteed cloud performance rates just became very valuable.


So is this sort of issue more likely to occur on a non-RISC chip because of added complexity? I know extremely little about microprocessor design beyond the 101 knowledge from college so forgive my ignorance.


I think this could occur on any CPU that has speculative execution and doesn't have a separate TLB for kernel mode. So far it looks like it only occurs on Intel's CPUs, but I don't think that has anything to do with RISC vs CISC. I believe Linux was at least looking at applying this change to arm64 as well, and this is likely going to lead to a design change in RISC-V to avoid issues like this, so it can happen to RISC designs too.


The patch already started rolling out on Azure today for the affected VMs.


This article appears to be jumping the gun. It presents the defect as a fact, but aren't we still in the "rumor" phase (other than perhaps the few who have early access to embargoed details)?



Stocks tend to move a lot more on rumour than they do on fact.


Trading would be a lot easier if you could wait till the facts come out


Does anyone have insight into the implications this vulnerability might have for remote execution? Is controlling access to the physical machine sufficient?


Is there any real danger for desktop users? We should be allowed to skip this patch if we don't want it; I would rather take the risk than the performance hit.


For desktop users there should not be any noticeable performance degradation. But they are also likely the ones in the most danger, since they execute random code in their web browsers.


I might be wrong, but shouldn't browser makers be able to prevent scripts from exploiting the low-level memory manipulation this requires?


Random code that makes direct kernel calls? Or is that not necessary?


But is the security Bug exploitable in any realistic way?



What does that output mean? He was able to look up an address that was used in a speculative execution or something?


He successfully read from kernel memory -- the first two bytes of the syscall table, to be precise. The first entry is sys_read (on x86-64 anyway) and the first field is the address. That's why he shows the full address in the next line; the PoC exploit read the lower 2 bytes of that address.
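If you want to sanity-check a claimed leak like that on your own machine, the reference value is in /proc/kallsyms (you need root or kptr_restrict=0, and the exact symbol name can vary between kernels). A quick, hypothetical checker:

    /* print sys_read's address from /proc/kallsyms, to compare its low bytes
       against what a PoC claims to have leaked. Run as root on Linux. */
    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        FILE *f = fopen("/proc/kallsyms", "r");
        if (!f) { perror("kallsyms"); return 1; }

        char line[512];
        while (fgets(line, sizeof line, f)) {
            unsigned long long addr;
            char type, name[256];
            if (sscanf(line, "%llx %c %255s", &addr, &type, name) == 3 &&
                strcmp(name, "sys_read") == 0) {
                printf("sys_read = 0x%llx (low 16 bits: 0x%04llx)\n",
                       addr, addr & 0xffffULL);
                break;
            }
        }
        fclose(f);
        return 0;
    }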


Is it just me or was this post's title changed?


It's not you, it used to be shorter.


I just read yesterday that the CEO sold all his stock except for what he's required to hold by corporate bylaws, a short while ago.


it was an automated sale


Does anybody know if the 30% performance reduction would apply to Macbook pros from within the past 2 years?


They run Intel CPUs made within the last ten years so would be subject to this issue. But the performance hit is really dependent on the computing that you're doing. It's not a guaranteed 30% or a guaranteed 5%.

Are you doing a lot of really CPU intensive tasks that make a lot of syscalls? Most people aren't. If you are, you'll probably feel that pain. And if you're not, you probably won't notice a difference. And if you don't know you can either research if what you're doing typically is going to be affected, or wait and see.


> They run Intel CPUs made within the last ten years so would be subject to this issue.

Do we know that? There is a lot of speculation but the embargo is not lifted yet afaik. There are obviously elements of this that are remaining intentionally obscured pending embargo expiry (redacted comments), so it could be that a smaller contingent of chips are affected and no one is bothering to correct the damage/limit the patch only to applicable components so that the cat doesn't get out of the bag too soon (side benefit: so that an extensive emergency fix like this can be widely tested before it's applied to a relatively limited set of hardware).

Seems just as valid as any other speculation to me.


Do we know that the kernel used in OSX is vulnerable at all? I haven’t found any discussion of it.


It must be serious when a stock of this size drops 5%...

https://www.google.com/search?q=intel+stock

Edit: AMD 8% up: https://www.google.com/search?q=amd+stock


Is it just me, or has there been an increase in critical security flaws over the last 5 years?


This is different from the intentional backdoor, Management Engine?


Yes. This is far worse.


I see - so this is more accessible? We're all fucked, then?


The "fear" with ME is more that governments or someone like that could spy on us... (not really a "threat" to most people in Western Countries). But still better to get rid of it. Not an imminent threat per se.

This bug is a true security bug that has been lurking around for 10 years... and it could have been already abused. This is double scary because this attack does not leave any traces behind and it allows any application to pretty much enter God Mode (this is the worst case scenario of a security situation (hence the name meltdown (like nuklear meltdown is the worst case situation in a nuclear power plant)).

This bug is also bad because in certain situations the fix of this bug causes severe peformance penatlty.

You are only fucked if you don't update your OS.


OS updates will fix this - but the bug remains on the chip, right? Hm, I guess I'll check for updates =)


My hot take is that we will see a Mirai / qbot variant targeting consumer laptops and VPSs very, very soon with this exploit. Could cause a lot more trouble than baby monitors.


I would love an ELI5.



> Intel believes these exploits do not have the potential to corrupt, modify or delete data.

No, you can just read data you shouldn't be able to.

> Intel is committed to product and customer security and is working closely with many other technology companies, including AMD, ARM Holdings and several operating system vendors, to develop an industry-wide approach to resolve this issue promptly and constructively.

Let's try to drag AMD and ARM into the mud with us.

> However, Intel is making this statement today because of the current inaccurate media reports.

"Fake news!"

Well you can't fault the PR folks for trying to spin it. They had to say something.


I'd be curious to know which media reports Intel actually finds inaccurate. I'm sure there are some out there, but I suspect most reports aren't inaccurate in ways that significantly deviate from the overall message that Intel's products have a relatively serious security flaw. Making a blanket statement that implies all the reporting is inaccurate is very disingenuous.

Dragging AMD's and ARM's names into this is completely inexplicable to me however.


> Recent reports that these exploits are caused by a “bug” or a “flaw” and are unique to Intel products are incorrect. Based on the analysis to date, many types of computing devices — with many different vendors’ processors and operating systems — are susceptible to these exploits.

I wonder how this will play out, since it seems that everyone apart from Intel views this issue as a "flaw".



So is today a good day to buy Intel stock?


Don't try to catch a falling knife. Let it settle, give it a week or so, then see where it is.


I would say no. INTC has not been hit yet by this. Cramer this morning indicated the issue is unsubstantiated.


I wonder what the value of this chip flaw would be to the NSA?


This piece should be used in journalism schools.

> Advanced Micro Devices Inc. surged [...] after a report that Intel Corp., its only remaining rival in the market for personal computer processors

Ahh, so Intel is the only holdout against the dominant wave of AMD processors, right?

facepalm.jpg



