A Skeleton Key of Unknown Strength (CVE-2015-7547)

tptacek · on Feb 20, 2016

Reading this, you'd think getaddrinfo() was the first glibc vulnerability ever discovered. "Look at how many things are affected! Even sudo! How will we ever detect them all?" Let's hope the author has some benzos on hand before they learn about kernel vulnerabilities.

Look: not only is this flaw not unprecedented, like, at all, but just last year we had a glibc vulnerability in gethostbyname() --- the most common libc DNS resolution function. You may remember the flaw by its brand name, "GHOST".

GHOST did not ruin the Internet or justify what I can best describe as 4000 words of concerned back-patting. And it was exploitable on day one!

Just patch the stupid thing, like everything else, and get on with your life.

I do object strongly to the bit at the end, about how ASLR, NX, CFG, &c are effective only at showing us who the best exploit developers are. Horse. Shit. There are today memory corruption bugs that are not exploitable because of runtime constraints. Moreover, when Linux and OS X bugs turn out to be widely and reliably exploitable, the reason tends to be that they hit something that isn't fully covered by runtime protections.

If you want to argue that critical, exposed network services shouldn't be written in C in 2016, you'll get no argument from me, or, for that matter, 98% of Hacker News. We should definitely get moving on porting stuff like DNS servers to things like Go and Rust.

pcwalton · on Feb 20, 2016

> If you want to argue that critical, exposed network services shouldn't be written in C in 2016, you'll get no argument from me, or, for that matter, 98% of Hacker News.

Not 98% of Hacker News. There are an awful lot of people who think modern C++ is memory safe.

nly · on Feb 20, 2016

C++ isn't 'memory safe', but what it does is give you the ability to write memory and resource safe types, and then lean on the type system as a means of ensuring memory safety.

C doesn't let you do that. It's just incapable, and it's a handicap that leads to bad APIs and bad client code. Take gethostbyname() as a historical example. It returns a pointer to static data, which avoids memory leaks at the cost of being thread-unsafe. Want a fix? use gethostbyname_r()... ok, great, except now you're dealing with a function with six parameters and have even more ways to shoot yourself in the foot. Ok well, they're both deprecated anyway... use getaddrinfo()... just don't forget to use freeaddrinfo() because this function allocates a linked list. So don't forget to check for null pointers as you enumerate it. C++ brings value semantics, iterators, and RAII to table and these problems largely go away.

All this is why C++ is better than C. In C++ you can have memory safety, thread safety, type safety, and a simple API. It's not perfect, and it's not easy, designing APIs never is, but at least it's a remote possibility.

pcwalton · on Feb 20, 2016

> C++ isn't 'memory safe', but what it does is give you the ability to write memory and resource safe types, and then lean on the type system as a means of ensuring memory safety.

No, it doesn't, because it isn't memory safe.

> All this is why C++ is better than C. In C++ you can have memory safety, thread safety, type safety, and a simple API.

No, you can't. C++ is not memory safe. Use after free happens in C++ all the time, because smart pointers do not protect you from dangling pointers and null pointer dereference, among other things.

Instead of copy and pasting examples to show this, I'll just link to other posts of mine from the past two months: https://news.ycombinator.com/item?id=11111987, https://news.ycombinator.com/item?id=11054630, https://news.ycombinator.com/item?id=11055020, https://news.ycombinator.com/item?id=10819501.

nly · on Feb 20, 2016

Move your for-loop in to an algorithm that takes Container by const& and your resize() / iterator invalidation problem becomes a compile error. If you actually need to resize your array while you iterate, then a language like Rust is going to catch your pointer invalidation, but it's not going to make the impossible possible and suddenly make your algorithm do something sensible.

I never claimed C++ statically enforced memory safety. Clearly that can't happen while its a superset of C. I just claim that it's steadily reaching the point where the most concise and elegant code with the best API is the safest.

And I'd argue that's the problem with C. The language is too weak to write safe code concisely, so you end up with too much cognitive load, get lazy, and then end up with sprawling unauditable messes like we see in glibc lately.

dakami · on Feb 21, 2016

I've updated the post with further details.

nkurz · on Feb 20, 2016

Toward the end of the essay there's an excellent and provocative summary of the way forward:

  My concerns are not merely organizational.  I do think we   
  need to start investing significantly more in mitigation 
  technologies that operate before memory corruption has 
  occurred.  ASLR, NX, Control Flow Guard – all of these 
  technologies are greatly impressive, at showing us who
  our greatly impressive hackers are.  They’re not actually 
  stopping code execution from being possible.  They’re
  just not.

  Somewhere between base arithmetic and x86 is a sandbox 
  people can’t just walk in and out of.  To put it bluntly, 
  if this code had been written in JavaScript – yes, really – 
  it wouldn’t have been vulnerable.  Even if this network 
  exposed code remained in C, and was just compiled to 
  JavaScript via Emscripten, it still would not have been 
  vulnerable.  Efficiently microsandboxing individual 
  codepaths is a thing we should start exploring.  What can 
  we do to the software we deploy, at what cost, to actually 
  make exploitation of software flaws actually impossible, as 
  opposed to merely difficult?

bcook · on Feb 20, 2016

if ($password == blah.crap) is insecure in any lang.

What I mean is, we solve one vector, but is always another (logic flaw vs stack flaw). Is there a truly "perfect" way to code? Is there any academic philosophy that is impossible to exploit?

I ask from a very ignorant perspective. I can barely program.

cbd1984 · on Feb 20, 2016

We can eliminate entire classes of attacks by using certain languages and technologies.

Simple, forgotten example: You can't just patch a running OS kernel from an application program anymore. There's hardware in the CPU called an MMU, or Memory Management Unit, which inspects all attempts to access memory, read or write, and checks them against a policy the kernel set which aims to disallow all unsafe memory accesses. (This is like nine kinds of oversimplified, but it's not actually wrong... ) The MMU will alert the kernel if an application program attempts to access memory in a way contrary to policy, and the kernel typically kills the program dead right there. That's what a segfault is.

My point is, prior to the MMU, the only possible response to "Applications can modify running OS kernels willy-nilly." was "Don't do that then." There was no way to enforce that policy. That's why MS-DOS, which ran pretty much exclusively on hardware either without an MMU or with the MMU disabled, had no effective security policy: Any application could modify the only thing attempting to enforce that policy at any time, and nothing could stop it. In the immortal words of Bokosuka Wars, "WOW ! YOU LOSE !"

We take MMUs for granted now. We take OSes which use MMUs for granted now. We no longer have to rely on the care and kind nature of strangers to enforce the basic policy.

The trade-off is speed: Adding an MMU to the path to RAM inevitably makes accessing RAM slower. There's no way around it. We see it as a rock-simple win-win tradeoff that we've almost forgotten that there even is a tradeoff, but our computers would run faster without MMUs. We've just, as a hardware and software culture, decided that it's worth it.

So. The discussion here is, "Which other tradeoffs are worth it?" Because there are other technologies we could adopt, hardware and software and a mix of both, which could completely seal off other classes of attack vectors, and we need to decide which of those technologies are worth implementing.

legutierr · on Feb 20, 2016

Another example, from only seven months ago, is this discussion of how to eliminate XSS attack vectors from the "CommonMark" markdown standard:

http://talk.commonmark.org/t/make-commonmark-safe-by-default...

The proposal being made, as per my understanding, is that by default CommonMark-compliant markdown should be evaluated in a "safe" mode--escaping html and whitelisting url formats--and that compliance with this behavior among interpreters should be enforced by the validation tools provided by CommonMark. This would conceivably result in CommonMark markdown becoming an easy default way to eliminate the possibility of XSS on sites that accept user-input in the form of comments, etc.

Unfortunately, the conclusion to the discussion seems to have been, "although it seems to be possible to trivially eliminate XSS attacks from user input using 'safe' markdown, that's an implementation detail so it's not the job of the standard".

I only found this yesterday, and I'm still trying to understand the current stance the project is taking on this (I don't know if it has changed), but I think it underlines how important--and uncommon--it is for people to adopt an attitude of "if we can eliminate a class of attack through a formalism or technology, let's do it."

If CommonMark were to adopt this approach, it could make the security advice given for processing user input as simple as "use a CommonMark-compliant markdown interpreter in the default 'safe' mode", whereas now it seems to be, "pick the right markdown interpreter that has a 'safe' mode you can trust, be sure to configure it properly, and then for good measure take the computationally expensive step of running your generated HTML through a sanitizer."

bcook · on Feb 20, 2016

Was this exact conversation not had decades ago regarding C & assembly-type languages? Tomorrow, lua vs C, then Go vs java, then spoken language vs whatever. (I surely got the specific comparatives wrong)

Tomorrow we will take terabytes & terahertz for granted.

I see no end. Old-school vs new-school ad-infinitum.

pjmlp · on Feb 20, 2016

Yes, once upon a time C shared a spot with the other systems programmer languages many of them safer than it, and on home-micros it was seen as a "managed language", which many used as a cheap macro assembler via the inline assembly extensions.

Now its compilers are praised for speed, given 30 years of optimization efforts.

tamana · on Feb 20, 2016

The best code is no code. We should work together on coming libraries that can be hardened with time and effort, and minimize how much unique code we write that has new bugs.

SteveNuts · on Feb 20, 2016

Are you sure we shouldn't just rewrite everything in the language du jour every week?

</s>

capitalsigma · on Feb 20, 2016

> Somewhere between base arithmetic and x86 is a sandbox people can’t just walk in and out of....Efficiently microsandboxing individual codepaths is a thing we should start exploring.

wat

The turtles need to end somewhere --- at the end of the day, you are writing something that becomes machine code, and you've lost the benefit of your high-level abstractions. The glibc bug is a perfect example --- glibc is an abstraction layer to make life safer that virtually all (excluding Debian and some other distros, but near enough) Linux code links against.

No matter where you put it, your stack is going to have a layer like that in it somewhere, and sometimes you're going to find bugs in that layer. Maybe next time it's a bug in your C-to-JS compiler that emits insecure code, or a bug in your `microsandboxing framework` that allows RCE. Shit, maybe there's a bug in your CPU architecture that allows RCE.

The glibc bug is an example of the sort of incredibly rare screwup you can't escape no matter how hard you try, and we're fortunate that 'the good guys' caught it before it had a chance to develop in the wild.

dakami · on Feb 20, 2016

What I'm saying is that a lot of energy has gone into "Assuming an attacker has gotten us into an undefined state, let's try to prevent them from pushing us into a chosen redefined state." And what I'm saying is, maybe we can create an environment where we don't end up in undefined states, or at least, there are bounds to how undefined they can be.

For example, I'm exploring ending use after free bugs by just not freeing memory. This sounds ridiculous until you realize that on 64 bit, leaking virtual memory (and therefore never recycling pointers) is actually not an insane idea, particularly for browsers that get to kill processes outright because they feel like it. Also, lots of UaF in there.

tptacek · on Feb 20, 2016

When you indict things like ASLR as being little more than bait for exploit developers, and later suggest that part of the solution might be a hack involving free() create zombie addresses, you give the impression of having said "exploit mitigations aren't working, unless they're my exploit mitigations".

(I also don't think yours is a good plan, but I'll wait for you to publish more details before criticizing it further).

dakami · on Feb 21, 2016

Very specifically, I'm interested in exploit mitigations that eliminate undefined states, rather than just hope an attacker doesn't know enough to redefine them. One can show "zombie pointers" (fine, we've got lots of space in 64 bit land) will never allow an attacker to exploit a UaF much easier than we can show memory is randomized enough.

At the end of the day hard bounds checking (however slow it might be) also falls into this category of "approaches that do not try to survive falling into undefined states". I'm not saying ASLR et al isn't useful, just that we should put more energy intostaying within well defined states.

That's ultimately what "better" languages promise, after all. I'm curious if there are approaches that don't require rewrites, and very interested in actually measuring what does and doesn't absolutely suppress vulnerability, at what performance cost. We're not doing enough of that.

pcwalton · on Feb 20, 2016

> And what I'm saying is, maybe we can create an environment where we don't end up in undefined states

That environment is not going to be programmable in C, unless you want a huge performance hit.

dakami · on Feb 21, 2016

You might be right, you might not be, in the general case. In the specific case getaddrinfo is not performance sensitive (to say the least) and that entire block could be specially compiled 10x slower or run inside of a trivially available sandbox.

We have options we're not exploring.

pjmlp · on Feb 20, 2016

But there is a big difference between using a language that requires you to be explicit about doing unsafe stuff (here be dragons kind of thing) and one that every single code line is a possible exploit, relying on programmer's skill to make it perfect.

chei0aiV · on Feb 21, 2016

Debian uses glibc now and eglibc is just a fork of glibc.

codeisawesome · on Feb 20, 2016

Is patching this bug on any server a matter of running `sudo apt-get update` (or the equivalent of the linux flavor in question) - and then rebooting afterwards?

EDIT: From AWS (https://aws.amazon.com/security/security-bulletins/cve-2015-...):

""" We have reviewed the issues described in CVE-2015-7547 and have determined that AWS Services are largely not affected. The only exception is customers using Amazon EC2 who’ve modified their configurations to use non-AWS DNS infrastructure should update their Linux environments immediately following directions provided by their Linux distribution. EC2 customers using the AWS DNS infrastructure are unaffected and don’t need to take any action. """

0x0 · on Feb 20, 2016

Of course, you should also run "sudo apt-get upgrade" after "update" :)

cbd1984 · on Feb 20, 2016

> Is patching this bug on any server a matter of running `sudo apt-get update` (or the equivalent of the linux flavor in question) - and then rebooting afterwards?

That's how a well-configured server works by definition. How much do you want to bet your server's well-configured?

INTPenis · on Feb 20, 2016

Yes, when I woke up the morning after the advisory all my servers were already patched. I only had to reboot them. Thanks to things like yum-cron and unattended-upgrades.

tie_ · on Feb 20, 2016

First, until you reboot your servers, they are not really patched. Second, you are happy about unattended core system upgrades to production machines? I don't think this is the right feeling to have :)

jlgaddis · on Feb 20, 2016

As a general rule, I would agree with you (with regard to point two). However, you don't know the details of INTPenis' infrastructure so you can't know. Perhaps automated / unattended upgrades / reboots would totally hose your environment but that's not the case in every instance.

INTPenis · on Feb 20, 2016

Depends on the environment, but I can safely say that I allow security updates without reboot in most production environments and have yet to run into problems.

Rough estimate I'd say it's 50/50 whether I patch manually or allow unattended patches.

dakami · on Feb 20, 2016

If this bug ends up traversing caches, it'll affect Amazon just like everything else. That the bug is so resistant to clear-cut answers on cache traversal vulnerability is unique, practically unprecedented.

al2o3cr · on Feb 20, 2016

"The hard truth is that if this code was written in JavaScript, it wouldn’t have been vulnerable."

I'd be very, very careful making broad "interpreted language X would prevent this bug" statements - not only is it dependent on the VM not having remotely-exploitable issues, there's also the matter of VM-to-host leakage. For instance, it's possible to exploit the row-hammer behavior of the host system's DRAM from Javascript:

https://github.com/IAIK/rowhammerjs

dakami · on Feb 21, 2016

No argument from me about the seriousness of issues like Rowhammer. Computers are built in layers, and layers require reduction to logical assumptions. The degree to which Rowhammer destroys logical assumptions is astonishing.

However, we've got lots and lots of common flaws that our present coding patterns aren't quite covering. This exact code written in JS wouldn't have been a problem. It wouldn't have been a problem even if it was just transpiled to JS and kept in a well defined sandbox (which isn't how we're doing sandboxing right now).

FracMat · on Feb 20, 2016

"A network where devices eventually become existential threats is a network that eventually ceases to exist."

I imagine this only if these devices can't be disconnected. Life creates its own existential threats in a lot of parts of the "network", but they are contained or fixed eventually. Diversity is the key, and even though all life we know has the same fundamental building blocks, so far it worked out. The internet is not that fundamentally different. So instead of working on ways to increase maintenance, continue to make the internet a highly diverse place.

yyin · on Feb 20, 2016

Back in 2008 after the cache poisoning hype, I developed my own method of resolving names without using caches (it is very fast); I use only authoritative servers. I still use this method daily.

"... but we can set the tc "Truncation" bit to force an upgrade to the protocol with more bandwidth."

dnsq does not do TCP queries. Sorry.

I also developed a few systems for resolving all the names I needed in advance so I did not need to use DNS at all, except when periodically updating the list of IP addresses. I am glad I did that work. (But nowadays there are resources like scans.io)

When someone publishes a vulnerability in dnsq from djbdns (it does not send recursive requests), I'll have to dream up another solution to the problem of "DNS". I doubt that's going to happen, but I could be wrong.

There are too many other easier targets.

duskwuff · on Feb 20, 2016

1. Hitting authoritative servers for every DNS query and refusing to cache results will make the operators of those servers hate you. You will probably find that some sites (or even potentially entire TLDs!) will end up blocking your requests entirely after a while, as this is an incredibly "unfriendly" behavior.

2. Supporting TCP queries is not optional. Some DNS servers will refuse to answer certain types of queries over UDP. In particular, ANY queries are often TCP-only, as they are a potential vector for DNS amplification attacks.

yyin · on Feb 20, 2016

1. The truth is, I make far fewer queries than the average web user. Because I have the IP addresses I need stored permanently. And I only update those files periodically. Today's websites and graphical web browsers (that I do not use) perform astounding quantities of uneccessary _daily_ or _hourly_ DNS requests that I never make. Maybe you think I am resolving every registered name in existence? If that were true, then yes, I think that is unreasonable. But the fact is I am only resolving the names I need, which, in the context of the total number of names registered, is very, very few. however scans.io and other scanning projects do not seem to be labeled as "unfriendly" nor the target of "hate"; perhaps your views are not based on actual exerience?

2. This is a personal solution. I am not writing software for anyone else. I do not have to use TCP for DNS queries and I have never found an authoritative server that refused to accept a UDP query. dnsq does not do TCP queries; I guess you could complain to the author he's violating some rule? If I am not mistaken, amplification problems happen because of ideas like open resolvers and enormous UDP packets, like those required for EDNS0 and DNSSEC. I am not a user of either of those ideas.

chei0aiV · on Feb 21, 2016

I would encourage you to publish your solution and get it included in various software distributions.