Kernel RCE in iOS/macOS with ICMP (lgtm.com)
163 points by dogecoinbase on Oct 31, 2018 | 64 comments



The Ping of Death rides again!

Ok, yeah, not exactly the same, but humor me. It's a Halloween undead miracle!


The worst part is that you just know some sysadmins are going to use this as an excuse to block all ICMP, worsening the PMTU blackhole problem; and the "these ICMP bugs were all from the 90s" argument won't work anymore.


Except this isn’t triggered with an ICMP packet. All it takes to trigger the crash is a specially formatted TCP packet.


TL;DR: Heap buffer overflow in the network stack allowed for sending ICMP packets that would crash devices. Bug was marked as "RCE" because it's possible that it could be used for code execution, though the current PoC only causes a crash. Bug has been reported to Apple and fixed in iOS 12 and macOS Mojave 10.14. There's an excellent drawing by the author at the end of the article summarizing the exploit ;)

As an aside:

> Even on a Mac, XNU is highly non-trivial to build.

I find it to be an extremely sad state of affairs when it requires scouring the internet for a blog post from a Darwin engineer to build an outdated XNU :(


> I find it to be an extremely sad state of affairs when it requires scouring the internet for a blog post from a Darwin engineer to build an outdated XNU :(

Large projects at companies tend to be hard to build, simply because it takes a lot of effort to make the build system for something so complex easy. If community contributions aren't particularly valued (which they aren't for XNU), then it's hard to internally justify spending the effort. Linux hackers are spoiled…

XNU isn't even one of the worst offenders. Try compiling Google's Skia, for example…


Really? I found Skia easier to compile than actually use.


One needs to duplicate Google's build infrastructure, which is quite a pain versus other open source projects that rely on more mainstream tooling.


Which would be fine if they provided some straightforward way to replicate that (like a container). It really seems like open source doesn't have that high of a standing at Google (anymore?). I guess limiting hardware manufacturers' use of open source, not spending time making sure other people can build open source code, is what gets you the promotion these days.


It's probably gotten better lately. A few years ago it took me several days to get it built.


So it is not only me.

In my case I ended up giving up and went back to Qt.


I also found this to be true when I tried to use Skia.


I don’t see why compiling XNU needs to be more complicated than 1. install the compiler (easy to do just by typing ‘cc’ on a pristine Mac) 2. Type ‘make’


It doesn't need to be much more complicated than that, but build systems inevitably have a way of needing at least one full-time engineer to get them to the point of being that nice to use.


That’s not how software teams work unfortunately.

There’s probably three build systems and none of them work on certain days of the week and the guy who wrote it all left six months ago.


For XNU? Doubt it. It's probably all on XBS and has multiple people maintaining the build system.


Sorry I forgot the /s on the end of that.


> Heap buffer overflow in the network stack allowed for sending ICMP packets that would crash devices.

It's not receiving an ICMP packet that causes the crash, it's when sending an ICMP error response packet. ICMP error packets can be generated in response to any type of IP packet, for example "host unreachable" when using UDP. These packets copy the original header of the packet which generated the error, which is where the out-of-bounds copy is.

I understand why the author doesn't want to give full details, but in this case it leads to a lot of misunderstandings on how it works.
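
To make the quoting concrete, here's a rough sketch in C of what an ICMP "destination unreachable" error carries on the wire. The struct and field names are hypothetical and the layout is simplified; the real kernel assembles the message in an mbuf and quotes a variable amount of the offending packet, not a fixed 8 bytes.

  #include <stdint.h>
  #include <netinet/ip.h>

  /* Sketch of an ICMP "destination unreachable" message.  The error
   * quotes the packet that triggered it: its IP header (including any
   * IP options) plus some of its payload.  Computing how much of the
   * offending packet to quote, and copying that many bytes into a
   * fixed-size mbuf, is where the out-of-bounds copy happens. */
  struct icmp_error_sketch {
      uint8_t   type;           /* 3 = destination unreachable            */
      uint8_t   code;           /* e.g. 1 = host unreachable              */
      uint16_t  checksum;
      uint32_t  unused;
      struct ip quoted_ip_hdr;  /* header of the offending packet         */
      uint8_t   quoted_data[8]; /* start of the offending packet's data;
                                   real stacks may quote more than this   */
  };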


I understand it to be a classic buffer overflow using a memory copy. Bounds checking fails and a buffer is overflowed with bits from the attacker. The author was able to cause a kernel panic. This does not mean another attacker could not execute arbitrary code using this exploit.

I expect to see a PoC with arbitrary code execution in kernel space on all unpatched macOS/iOS devices soon. Which would be bad...


Presumably, internally to Apple, XNU is built mostly on build servers. Maybe the branches even have CI build-on-push, and test machines able to PXE-boot directly to a given branch's tip build. In such a case, you'd never need Xcode to successfully build a standalone XNU at all.


Nah. What you say is partly true, but I and others build xnu locally and incrementally all the time. Fast enough.


If you want more details: Lots of parts are obviously pure C, but in my experience the C++ stuff is usually quite incremental as well. Changing a header included in lots of places is the obvious multiplier to build time.

Linking does take some time, but it’s only really noticeable if you do very incremental changes (e.g. change a single value, build, boot, repeat).

Overall, building a whole kernel from scratch even for multiple configurations is not unbearable during normal development.


Developers of course never wish to compile a 5 line patch locally and launch a kernel in a VM to test it.

That's something practically every Linux kernel dev does, but surely every time an XNU dev wants to add a single debug print statement, they'll go through the whole CI system before being able to run it.


If XNU can't be built incrementally (and, not having built it myself, I'm unsure whether it can, since it contains a large amount of C++), then it's going to take just as long, if not longer, to build it on your workstation, compared to going through the CI pipeline (with the beefy build-boxes.) So there's no disadvantage in a CI-only workflow in such a case.

IIRC Chrome has this workflow for this reason.


Chromium absolutely does not only build that way.

I occasionally build chromium on my machine, and sure it takes two hours the first time, but incremental builds after that are only on the order of minutes.

C++ code can be incrementally compiled... Sure, it's not as good as in many other languages due to headers being so heavy, but it's still a massive reduction in total build time.


I promise you incremental builds are extremely important. A completely clean build of iOS/OS X years ago would take multiple days & I'm sure it's only gotten worse. The entire internal CI system relies on incremental builds working properly so that it can generate an internal iOS build every day for testing. It's possible it's disabled for certain projects, but I doubt it. C++ in no way alters the incremental build mechanism; all the problems with incremental builds are inherited from C's preprocessor. Speaking of which, that's what C++ modules fix (clang has support for both C & C++ there, although when I left only C projects had it turned on).


Apparently it was actually already fixed in iOS 12.0 and macOS 10.14.0 but retroactively added to the security notes for those releases only just now.


Thanks, updated.


and yet I always hear the argument, "but it's open source, just like Linux".


It is open source, but not like Linux. These are more source dumps than an actual window into the development process.


Another day, another buffer overflow caused by C's memory model.

At least ARM is getting pointer tagging.


Caused by Apple copy-pasting BSD networking code and missing a crucial bounds check in the process (as noted by another reply ITT).

Here: https://news.ycombinator.com/item?id=18351533


Which, in a language with bounds checking, would usually mean a panic instead of memory corruption leading to a possible exploit.

Unless the same mistake had been made inside an unsafe code block/module.

If anything, C's history of memory corruption errors, and how they come to be, proves that even the best aren't able to follow the best practices that all good C coders supposedly do.

The Linux kernel is full of such CVEs in spite of the approval processes in place for accepting new code into the kernel, and I assume Apple has similar practices in place.


Yeah. C gives you few safety nets. But the problem here is a missing check. Make the same logic mistake in Rust, Java, C#, JavaScript, etc. and you will, sooner or later, give access to your admin account, to your DB, to a command shell or to memory, writing values that shouldn't be permitted because a validation check was omitted.

I see your point, and I partly agree: some languages can avoid the particular sequence of events in this bug, or make it more difficult.

In this case, imagine:

- Sloppiness getting code into the icmp module (this is speculation, to be fair).

- Buffer overrun leading to RCE.

- Solution:

  1. You rewrite in Rust because you think the root cause is that C is unsafe.

  2. You don't write sloppy code: raise your standards, do more auditing/testing/verification.

To me, if you don't fix #2, you are going to have big problems anyway even if you have safer tools.


That is the typical defense argument I keep getting from the C crowd since the mid-90s.

The thing is,

Σ logic errors < Σ logic errors + Σ memory corruption errors

So while errors don't go away, the exploit area is considerably diminished.

Naturally it gets hand-waved away as "errors are errors".

Which is why I really do look forward to harder lawsuits when security exploits happen.

When money starts speaking, more companies will take security more seriously.


Fine. I agree. Less is better. Now, I don't want to have a C-vs-Rust type of discussion; I think we will mostly agree, in fact.

But, all I was saying in the beginning (and I later digressed) is that just blaming 'C memory model' is not fair.

(Dramatization) "Hey, boss, yes... an RCE, what a pity. Yeah, missing check. But don't blame me. It's C that's at fault." That is just unrealistic/ridiculous. It's like saying Dennis Ritchie's parents are at fault here.


It is fair to blame C's memory model, because safer systems programming languages have existed since 1961, and C only got where it is thanks to Bell Labs not being able to sell UNIX for the first 10 years of its existence.

Even Multics had fewer exploits than early UNIX versions, as stated in a DoD report about the absence of memory corruption issues thanks to PL/I.

If you bother to read the early C history document written by Ritchie, they were aware of such shortcomings, and thus lint was born in 1979.

A tool that to this day many devs still ignore.

As for alternative languages, I haven't named a single one in this thread. There are plenty to choose from since 1961.


Interesting, I will read the article, thanks for mentioning it.

As for alternatives, I was just giving examples; you didn't name any, true.


How long until somebody builds a device that is basically a pwnplug, which joins wifi networks at major tech conferences, does an nmap for live IPs, and sends this to every device?


As an aside, well-run conference WiFi (and public WiFi in general) won't allow attendees to contact each other over the WiFi network - it will only allow traffic from end user devices towards the gateway, and won't allow hairpin routing on that gateway.

I wish more public WiFi was well run, as many of these same network attacks wouldn't have an opportunity to work in the first place.


Many public WiFis are managed by people buying some routers at the local store and plugging them in.


Probably a long time; Apple devices update frequently and in a generally timely fashion -- in a few months there won't be many vulnerable devices left (except for the ones Apple won't release updates for).


At the Chaos Communication Congress in 2010 (27C3), someone repeatedly crashed all Apple OS X devices via WiFi.

These things happen.


So like, 30 seconds ago? :)


Does this affect all versions of iOS before 12? Seems like some old iPhones may fall out of use then if they're vulnerable.


Correct. iOS 11 devices can update to iOS 12, older devices are vulnerable.


Reproduced the bug in Golang. Pretty simple stuff. https://github.com/unixpickle/cve-2018-4407


Is there a fix for older OSX or should I just install Linux on my 2011 iMac? Yosemite is already quite slow. OpenSuSE 13.2 was really fast on a same generation Mac Mini.


If you are behind a NAT, you won't receive pings from the Internet, so you are safe. And you can enable the firewall and "stealth mode" to make it ignore local pings.

Your 2011 iMac can be updated to 10.13, so I would update; a 2011 iMac with enough RAM and an SSD works very well even on 10.13.


Except it’s not a received ICMP packet that triggers the issue, it’s a TCP packet with large IP and TCP options. The crash occurs because the XNU kernel is sending an ICMP error in response to that received TCP packet.
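
For context, here's a minimal sketch in C of what "a TCP packet with large IP and TCP options" looks like: both the IP header length (IHL) and the TCP data offset are pushed to their maximum of 15, i.e. 60-byte headers padded with NOP options. Addresses and ports are examples, and BSD-style struct ip / struct tcphdr field names (as in the macOS headers) are assumed. This only illustrates the packet shape; it is not the published PoC and omits checksums, the raw-socket send, and everything else needed to actually reach the ICMP error path.

  #include <stdio.h>
  #include <stdint.h>
  #include <string.h>
  #include <arpa/inet.h>
  #include <netinet/ip.h>
  #include <netinet/tcp.h>

  int main(void) {
      uint8_t pkt[120] = {0};          /* 60-byte IP header + 60-byte TCP header */

      /* IPv4 header with the maximum header length: IHL = 15 -> 60 bytes,
       * i.e. 40 bytes of IP options (padded with NOPs below). */
      struct ip *iph = (struct ip *)pkt;
      iph->ip_v   = 4;
      iph->ip_hl  = 15;
      iph->ip_len = htons(sizeof(pkt));
      iph->ip_ttl = 64;
      iph->ip_p   = IPPROTO_TCP;
      iph->ip_src.s_addr = inet_addr("192.0.2.1");   /* example addresses */
      iph->ip_dst.s_addr = inet_addr("192.0.2.2");
      memset(pkt + 20, 0x01, 40);      /* 0x01 = IP NOP option */

      /* TCP header with the maximum data offset: 15 -> 60 bytes,
       * i.e. 40 bytes of TCP options (also padded with NOPs). */
      struct tcphdr *tcph = (struct tcphdr *)(pkt + 60);
      tcph->th_sport = htons(12345);
      tcph->th_dport = htons(80);
      tcph->th_off   = 15;
      tcph->th_flags = 0x02;           /* SYN */
      memset(pkt + 60 + 20, 0x01, 40); /* 0x01 = TCP NOP option */

      /* A real sender would use a raw socket with IP_HDRINCL and valid
       * checksums; here we just report the resulting header sizes. */
      printf("IP header: %d bytes, TCP header: %d bytes\n",
             iph->ip_hl * 4, tcph->th_off * 4);
      return 0;
  }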


Go ahead. After trying many options I settled on Xubuntu. But any of them will do.

Too bad, I liked the older OS Xes.


Can someone explain to me, did I miss the description of the actual bug here? I get there’s a copy, but why is there an under-allocation to start with? Or is that alloc somehow attacker-controlled? Was part of the copy func args controllable? Or was this just an ad?


Pretty sure it was introduced here: https://github.com/apple/darwin-xnu/commit/c10988c130af09087...

In this commit a bunch of code was copy-pasted from BSD into ip_icmp.c in icmp_error() to quote IP packet data in icmp error responses. Unfortunately they forgot to copy a line from the BSD source which would prevent the overflow:

  icmplen = min(icmplen, M_TRAILINGSPACE(m) - sizeof(struct ip) - ICMP_MINLEN);

m is an mbuf allocated with a fixed-length cluster buffer, which means without that line above we can overrun it:

  m = m_getcl(M_DONTWAIT, MT_DATA, M_PKTHDR);

Copy-paste programming in core networking code...
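
Boiled down, the pattern looks something like this. This is a simplified sketch with made-up names and sizes (CLUSTER_SIZE, HEADROOM, quote_offending_packet), not the actual XNU code: the quote length is derived from the incoming packet's own headers, but the destination is a fixed-size cluster, so without the FreeBSD clamp the copy can run off the end.

  #include <stdint.h>
  #include <string.h>

  #define CLUSTER_SIZE 256   /* stand-in for the fixed-size mbuf cluster       */
  #define HEADROOM      64   /* stand-in for space already used by the new
                                IP + ICMP headers in front of the quoted data  */

  /* icmplen is computed from the incoming packet's header lengths, which
   * the sender controls via IP and TCP options.  The vulnerable path copies
   * that many bytes into a fixed-size allocation without clamping first. */
  static void quote_offending_packet(uint8_t buf[CLUSTER_SIZE],
                                     const uint8_t *pkt, size_t icmplen)
  {
      /* The line missing from XNU (present in FreeBSD) is the clamp:
       *   icmplen = min(icmplen, CLUSTER_SIZE - HEADROOM);
       * Without it, a large icmplen overruns buf. */
      memcpy(buf + HEADROOM, pkt, icmplen);
  }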


> Copy-paste programming in core networking code...

There is nothing special about core networking code.


Other than being highly security sensitive, written in a language without any regard for security.


My thought specifically was that "core networking code" is code that handles untrusted, unsolicited data potentially from the entire world, written in C, with special performance requirements that give programmers a license to eliminate redundant bounds checks wherever possible.


The XNU networking stack is complete garbage in general. Have you ever seen all the crazy hard-coded stuff they have for different types of network mediums -- wifi vs cellular vs etc -- and in the kernel? It's bonkers. Plus a weird evolution of firewalls and ugh.


This looks like a mundane bug and there are countless other undiscovered bugs like this. It doesn’t deserve this huge essay IMHO.


Well, you may not deserve the huge essay, sure.


??

Not sure what you are trying to say.


Remotely exploitable buffer overflows in kernels are fairly rare.

> there are countless other undiscovered bugs like this

How do you know that they exist if they're undiscovered?

At this point, there are probably very few in Linux after the level of fuzzing it's received.


> At this point, there are probably very few in Linux after the level of fuzzing it's received.

Linux is heavily developed, which means significant code churn. It's a victim of its own success.

And there's much less fuzzing happening than you think. It's an on-again, off-again activity; not something that any particular organization (AFAIK) is rigorously and repeatedly performing on Linux for any serious length of time (i.e. years). Fuzzing isn't an automatic thing; it requires human intervention to help craft the tests so they reach and tickle the right corner cases. And if the code is constantly changing, you need to constantly review your tests.

Also, syscall fuzzing must be done differently from network fuzzing, for example. And fuzzing kernels in general is more difficult than, e.g., a self-contained library.

If you look at the Linux kernel CVEs, they've been steadily increasing over time: https://www.cvedetails.com/product/47/ We can quibble over the causes and significance, but I don't think those data points support your argument.

Yes, remote exploits in Linux are rare, just as they're rare in most operating systems, including macOS and Windows. But not rare enough. And there's little indication that they're becoming more rare. Unfortunately, I'll bet that the shift to eBPF to implement basic filtering and routing will cause a spike over the next few years.

At this point the Linux kernel is a lost cause. Everybody serious about security understands that the future is moving your most sensitive assets to secure enclaves not running general purpose OSs like Linux. Unfortunately those enclaves are recapitulating the same mistakes--too many features, too flexible, too much emphasis on adding more complexity to try to fix things. But one thing is certain, Linux is definitely not getting any simpler, and people are moving assets to slower moving train wrecks.


> If you look at the Linux kernel CVEs, they've been steadily increasing over time [...] I don't think those data points support your argument.

The number of CVEs doesn't support any argument, really. More CVEs are assigned, yes. That doesn't mean more security issues. In the olden days of 8 years ago, a buffer overflow would be patched with no fanfare and no CVE assigned. Nowadays, even issues that are only theoretically exploitable will still be given a CVE.


Because these types of bugs are easy to make and miss?


That's the interesting thing about the approach used to find this bug (automatic variant analysis): whilst there are no doubt more bugs to find in XNU, there aren't any more bugs like this.

The article says that the problem was found by codifying the mistake that led to a previous CVE as a query in a logic language called QL, and then running that query over XNU, so if there were any more they would have been found at the same time.

(edit: corrected typography)



