OpenSSL CVE-2016-0799: heap corruption via BIO_printf (guidovranken.wordpress.com)
137 points by 0x0 on Feb 28, 2016 | 80 comments



This is one of those "let's ignore malloc failing" bugs. It's pretty common in C code and is, IMHO, a "worse is better"[1] anti-pattern, predicated on the "worse is better" principle that correctness can be sacrificed if it makes things too complicated. Many programmers, unfortunately, don't give a crap about things that far off the happy path, and when the language implements checked exceptions or Option types they get upset that it forces them to think about those distracting corner cases. Such is the state of software engineering.

1. https://en.m.wikipedia.org/wiki/Worse_is_better


> Such is the state of software engineering.

Yes, as Hoare stated in 1981, regarding Algol design:

"The first principle was security: The principle that every syntactically incorrect program should be rejected by the compiler and that every syntactically correct program should give a result or an error message that was predictable and comprehensible in terms of the source language program itself. Thus no core dumps should ever be necessary. It was logically impossible for any source language program to cause the computer to run wild, either at compile time or at run time. A consequence of this principle is that every occurrence of every subscript of every subscripted variable was on every occasion checked at run time against both the upper and the lower declared bounds of the array. Many years later we asked our customers whether they wished us to provide an option to switch off these checks in the interests of efficiency on production runs. Unanimously, they urged us not to - they already knew how frequently subscript errors occur on production runs where failure to detect them could be disastrous. I note with fear and horror that even in 1980, language designers and users have not learned this lesson. In any respectable branch of engineering, failure to observe such elementary precautions would have long been against the law."


"Many years later we asked our customers whether they wished us to provide an option to switch off these checks in the interests of efficiency on production runs. Unanimously, they urged us not to - they already knew how frequently subscript errors occur on production runs where failure to detect them could be disastrous. "

I've always liked that part. Once they saw the problems, they'd put up with whatever sacrifices it took to avoid them.


How did the market for Algol turn out?


Except for C, Objective-C(++), and C++, all other programming languages, even systems programming ones, do bounds checking.

And at least in C++, thanks to the STL or similar libraries, there is the option to always enable bounds checking.

So even if Algol never took hold in the market, all languages other than C, and those that offer copy-paste compatibility with it, do offer sane defaults in terms of security.


Ada, Eiffel, Java, C#, and Go are doing fine. All making money. :P Three are even used in operating systems, with two in embedded systems. So they can be fairly efficient, too.

Note: Not to mention Unisys still makes tons of money off their ALGOL machines from Burroughs. IBM & Fujitsu still service PL/1. So, old ones too.


The project I work on, Mesa3D, routinely ignores malloc failures for a couple of reasons: no one knows how to reasonably test the additional code, and often no one has any idea what sort of graceful failure is even possible when malloc fails.

We've had some cargo-culted patches to call _mesa_error_no_memory() when malloc fails, but it was recently noted that this can't possibly work because it internally calls fprintf, and there's no guarantee that fprintf doesn't call malloc! In fact, glibc's implementation does.

Suggestions for best practices welcome.


The proper way to deal with a memory allocation failure, when a graceful return isn't possible, is to call abort() or similar, instead of trying to continue with a potentially inconsistent state.

Of course, it would be better to avoid getting into a situation where either a graceful return isn't possible, or poorly-tested recovery code has to be run to deal with the failure. I read somewhere that seL4's strategy is to divide the work in two phases. The first phase allocates all the resources it will need and checks all preconditions, but changes no state; the second phase has no allocations, does all the work, and will never fail. That way, any error recovery is a simple release of the resources it had already allocated.
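
A minimal sketch of that reserve-then-commit shape in C (the names are made up for illustration; this is not seL4 code):

    #include <stdlib.h>

    struct thing { int *a; int *b; };

    /* Phase 1: acquire every resource up front; on any failure, release
       what was taken and bail out without having touched *t at all. */
    int thing_init(struct thing *t, size_t n)
    {
        int *a = malloc(n * sizeof *a);
        int *b = malloc(n * sizeof *b);
        if (a == NULL || b == NULL) {
            free(a);                /* free(NULL) is a no-op */
            free(b);
            return -1;              /* caller sees one clean failure path */
        }
        /* Phase 2: no allocation, no failure -- just commit the state. */
        t->a = a;
        t->b = b;
        return 0;
    }

Everything after the last allocation is infallible, so there is no partially updated state to unwind.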


Check out the concept of fault injection -- you can wrap all of the malloc() calls and then systematically cause each one to fail. You might want to combine that with valgrind to make sure you don't end up doing undefined things after a malloc failure.

You can also use that wrapping technique to test _mesa_error_no_memory(), by setting the state to "all mallocs fail" and then calling it. Hint: Don't call fprintf, don't use stdio, call write() directly.

Fault injection can't cover all of the combinations of all inputs and all places to fail, but it's a lot better than nothing.
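
A minimal sketch of that wrapping on a GNU toolchain, assuming the test binary is linked with -Wl,--wrap=malloc (the helper names are made up):

    #include <stddef.h>

    /* With -Wl,--wrap=malloc, the linker sends every malloc() call in the
       wrapped objects to __wrap_malloc, and __real_malloc is the genuine
       allocator. */
    void *__real_malloc(size_t size);

    static long fail_countdown = -1;   /* -1 means "never inject a failure" */

    /* Called by the test harness: the Nth allocation from now returns NULL. */
    void fail_nth_malloc(long n) { fail_countdown = n; }

    void *__wrap_malloc(size_t size)
    {
        if (fail_countdown > 0 && --fail_countdown == 0)
            return NULL;               /* injected allocation failure */
        return __real_malloc(size);
    }

The harness can then run each test case with the countdown set to 1, 2, 3, ... until no allocation is left to fail, ideally under valgrind as suggested above.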


If you don't have anything better to do, just abort().

One (naughty) option is to malloc a decent-sized buffer at startup (say 50MB), and then when malloc fails, free it and immediately display a warning box.

On most modern 64-bit systems, malloc will never fail anyway; you'll get killed due to memory overcommit.
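
For what it's worth, the "naughty" reserve-buffer idea looks roughly like this (a sketch, assuming allocations go through a hypothetical wrapper; not a recommendation):

    #include <stdio.h>
    #include <stdlib.h>

    static void *rescue;                /* reserve freed only on OOM */

    void *my_malloc(size_t size)
    {
        void *p = malloc(size);
        if (p == NULL && rescue != NULL) {
            free(rescue);               /* hand the reserve back to the heap */
            rescue = NULL;
            fputs("warning: running low on memory\n", stderr);
            p = malloc(size);           /* one retry with the freed headroom */
        }
        return p;
    }

    int main(void)
    {
        rescue = malloc(50 * 1024 * 1024);   /* ~50MB set aside at startup */
        /* ... rest of the application uses my_malloc() ... */
        return 0;
    }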


That's not so hot for a library, though, which Mesa3D is.


The alternative, if you have no good way of testing the codepaths, or if you're running on a system with overcommitted VM, is basically segfaulting. And you're running on a system with overcommitted VM.

An abort with a reason is a far better choice.


Overcommit is not actually relevant. An overcommit OOM-kill is a manifestation of the administrator killing the process for being a memory hog (it's just that the administrator has delegated this decision to some less-than-scrutable kernel code).

Library code should in general punt out-of-memory conditions back to the caller.


> Overcommit is not actually relevant.

It is if you ever want to exercise the code paths. Overcommit means you never get an out of memory error. Your process just dies. Kind of like it would if you just aborted.


When you're running tests in development you can easily set up a test environment with and without overcommit enabled (and in fact there are other ways to limit allocated memory on a per-process basis that work even with overcommit, such as with rlimits).

...but that doesn't even really matter, because in order to fully exercise those code paths you need precise control over exactly which call to malloc() returns NULL, which in practice means you need to interpose malloc() with a debugging framework to force a NULL return.
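
A minimal sketch of the rlimit route, assuming a POSIX system where RLIMIT_AS caps the address space so malloc genuinely returns NULL:

    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/resource.h>

    int main(void)
    {
        /* Cap the address space at 64 MiB so malloc really does return
           NULL instead of the OOM killer stepping in later. */
        struct rlimit rl = { 64u << 20, 64u << 20 };
        if (setrlimit(RLIMIT_AS, &rl) != 0) {
            perror("setrlimit");
            return 1;
        }

        void *p = malloc((size_t)1 << 30);   /* 1 GiB: should fail now */
        printf("malloc(1 GiB) -> %p\n", p);
        free(p);
        return 0;
    }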


> On most modern 64-bit systems, malloc will never fail anyway

Except if a virtual memory resource limit is set ("help ulimit").

  $ sh -c 'ulimit -S -v 1000  ; perl -e 1'
  Segmentation fault
  $ sh -c 'ulimit -S -v 2000  ; perl -e 1'
  perl: error while loading shared libraries: libm.so.6: failed to map segment from shared object: Cannot allocate memory
  $ sh -c 'ulimit -S -v 20000  ; perl -e "qw{a}x100000000"'
  Out of memory!


You have no idea what most admins have set their overcommit policy to.


If you can't fail gracefully, then crash as soon as possible. Trying to sputter along is just asking for worse problems.


These kinds of crashes can be manipulated, though. If you can't restore a stable state, you have created instability. Failing gracefully in this context means failing without creating instability or insecurity.


An abort()/__builtin_trap/etc shouldn't generally be manipulable.


The failure mode is the one where you terminate in the middle of an operation that was expected to be atomic or ordered and then the program restarts and does step two without finishing step one.


You have to protect against that anyway, otherwise you're vulnerable to power cuts.


And operating system crashes, and being killed by the OOM handler (which is probably likely if you're out of heap), and all of the other things that can kill a process without warning.


Well, considering malloc gives a NULL pointer on failure, a crash at some point is pretty clearly inevitable.


Sadly it doesn't on Linux, due to overcommit. You can actually get the failure much later, when you try to use the memory. The exception is when you try to allocate absolutely huge buffers. So malloc returning a valid pointer doesn't indicate success (though it returning NULL does indicate failure).


Yes, but it becomes harder to tell where in the code that happened. If you run abort() or the like, you've got more information as to the circumstances of the crash.


It is quite possible to write to files, stdout, etc. without using any heap memory. It can be achieved with the bare-bones read/write syscalls. I'm not sure I would consider this a "best practice", but it's worth noting. Boehm GC actively uses read when it needs to read files without allocating any heap memory.
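
A minimal sketch of that style, assuming POSIX write(); nothing here touches the heap (the helper name is made up):

    #include <string.h>
    #include <unistd.h>

    /* Report an out-of-memory condition without stdio or malloc:
       write(2) goes straight to the file descriptor, no buffering. */
    static void report_no_memory(const char *where)
    {
        static const char msg[] = ": out of memory\n";
        (void)write(STDERR_FILENO, where, strlen(where));
        (void)write(STDERR_FILENO, msg, sizeof msg - 1);
    }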


In general, for a library, this is not good practice. For an application, it may work. You have to be very careful not to use any formatted output.


> ... no one knows how to reasonably test the additional code ...

Wouldn't a simple 'ulimit -m' set appropriate limits on memory for the process? You could then use that to simulate failures, hopefully see what breaks, fix it, and rinse, lather, repeat.

Edit 1: this is of course predicated on the assumption that you are using a Unix-like system.


A Unix-like system that's not Linux, mind, since ulimit -m hasn't worked there in a long time.


> the "worse is better" principle that correctness can be sacrificed if it makes things too complicated

To be clear, in systems designed with the "worse is better" approach, responsibility for correctness can be pushed up the stack, from e.g. kernel programmers to user-space programmers, if doing the Right Thing lower in the stack is hard. The example given in the paper is a syscall being interrupted -- in unix, it just fails with EINTR, and it's the application author's responsibility to retry. (I agree that malloc returning null is similar.)

But worse is better isn't saying that programs shouldn't be correct if it's complicated; it's just a question of where the responsibility for correctly handling API edge cases should lie.

Moreover, if you read the "worse is better" paper carefully, you'll notice the author doesn't say that "worse is better" actually is better, overall. The argument is only that it has better "survival characteristics". Meaning, as I read it, that such systems can gain and maintain momentum against competitors because they can move faster.

Which, tangentially, is all very interesting to me, because the three modern systems with the best survival characteristics are unix, C, and the web, the latter of which is definitely not worse is better. (However bad you may think the web is as an API, "worse is better" is a design philosophy about giving users simple, low-level APIs, whereas the web is all about high-level APIs where the browser implements lots of logic to prevent you from shooting yourself in the foot.) So maybe we need to reconsider the conclusions of the worse is better paper in light of the giant counterexample that is the web.


I'd also add Java, C#, Python, ARM (especially ARM64) and Windows NT as examples of success of better-is-better systems. (Windows NT is somewhat less better-is-better though due to Win32 compatibility.)


Maybe Python and ARM. Java was a Worse is Better compromise trying to bring in features from Right Thing languages. It took lots of marketing to make it. C# was similarly done, with adoption mainly because Microsoft pushed it with tooling. They're both Worse is Better candidates. Just not as bad as some others.

Modula-3 or maybe Eiffel would be closer to Right Thing examples in that category.


You say that, but then, there are examples of terrible security bugs resulting from broken attempts at handling malloc failures, too.

The right answer is, I think:

* Don't add checking code at individual malloc call-sites.

* If you have an allocation regime where you need to do something better than abort in response to failure, don't use malloc directly for those allocations.

* Run your program with malloc rigged to blow up if it fails.
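
The last point is usually an xmalloc-style wrapper (a sketch; the name is conventional, not from any particular project):

    #include <stdlib.h>
    #include <unistd.h>

    /* Allocate or die: never returns NULL, so call sites need no checks. */
    void *xmalloc(size_t size)
    {
        void *p = malloc(size);
        if (p == NULL && size != 0) {
            static const char msg[] = "fatal: out of memory\n";
            (void)write(STDERR_FILENO, msg, sizeof msg - 1);
            abort();
        }
        return p;
    }

Call sites then never see NULL, and a failure dies loudly at the point of allocation rather than corrupting state later.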


Actually, I'd say that if you're trying to protect against lazy programmers, checked exceptions are worse than unchecked, since they'll just add an empty try/catch around the call, whereas crashing the process would usually be a safer option.


> they'll just add an empty try/catch around the call

In an ideal world, someone who did that should be taken out back and shot. In this real world, someone who did that should be immediately walked out the door.

It's astonishing to me just how lazy programmers can be.



There was "fair notice" given last week (and front-paged here on HN) about an upcoming release patching "several" vulnerabilities: https://mta.openssl.org/pipermail/openssl-announce/2016-Febr...

However, they're scheduled for release on Monday the 1st of March - so we might be seeing a few more bugs in the next couple of days.

EDIT: I've spent all morning teasing a friend whose birthday is today about "what if it were tomorrow" (I'm sure they get it every year from every single person they meet), only to then come here and forget all about it in my tally. Go figure. Yes, the 1st of March is Tuesday. Sorry!


Don't look for it on Monday; the 1st of March is Tuesday. 2016 is a leap year.


1st March 2016 is a Tuesday


That it has its own printf implementation, along with many other not-exactly-crypto-related utilities, is the kind of feature creep that I always found distasteful about OpenSSL.


"These are not the patches you are looking for": https://groups.google.com/forum/m/#!topic/mailing.openssl.de...

Clarification from an OpenSSL maintainer.


Nice code formatter. If in doubt, escape it thrice:

    _dopr(&hugebufp, ....


That's actually four times - browser unescapes the first one.

    _dopr(&hugebufp, ...


Am I the only one who doesn't understand why it's still considered "okay" to write C for security-dependent code these days? Processors have gotten orders of magnitude faster, speed is no longer the constraining resource, and certainly not one I would trade my security for...


Oh my god what a revelation. I've never seen anyone bring this up before.

Outside of every single thread about C. Everyone at this point knows the shitty side of C.


You don't even need anything that drastic. Just switch the compiler to C++ mode and start using std::unique_ptr instead of malloc/free. Remember to turn off exceptions and RTTI to keep the spirit of C alive in your C++ code!


It's a bit crazy to see issues related to string manipulation come up yet again. Why isn't this kind of stuff centralized yet?


In the name of portability, they've opted to have low-quality NIHed code instead.


This appears to be jumping the gun just a bit. Don't they normally aim for a release during the work week in the US?


In case someone's interested in the history, a related change (returning on malloc failure) was made in March 2015, in a general alloc-fail cleanup across the codebase. https://github.com/openssl/openssl/commit/918bb8652969fd53f0...


From reading the description, I get the feeling that this is nowhere near as severe as e.g. Heartbleed, right?

IMHO dynamic allocation is something that is often overused, and the fact that it could fail leads to another error path that might be exploitable, like this one. I don't think it's necessary for a printf-like function to ever use dynamic allocation. (If there are cases where it's necessary, I would love to know more... and perhaps suggest how it could be avoided.)
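
For instance, a logging helper can format into a fixed stack buffer and accept truncation (a sketch, not OpenSSL code; the libc formatter may still allocate internally, but the function itself never grows a heap buffer):

    #include <stdarg.h>
    #include <stdio.h>
    #include <unistd.h>

    /* Format into a fixed stack buffer; overly long messages are truncated
       instead of triggering any buffer growth on the heap. */
    void log_msg(const char *fmt, ...)
    {
        char buf[512];
        va_list ap;
        int n;

        va_start(ap, fmt);
        n = vsnprintf(buf, sizeof buf, fmt, ap);
        va_end(ap);

        if (n < 0)
            return;                          /* encoding error: drop it */
        if ((size_t)n >= sizeof buf)
            n = (int)(sizeof buf - 1);       /* output was truncated */
        (void)write(STDERR_FILENO, buf, (size_t)n);
    }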


I arrived at the comments section before the C vs Rust flame war.


Another day, another memory violation... the endless supply of ammo dispenses another box of shells for Rust. I don't know if Rust is the future, but I am certain the dangling pointers, use after frees, buffer overflows, double frees, smashed stacks, ignored failures, uninitialized data and all the other horrors of contemporary C/C++ will one day be understood as intolerable.


Contemporary C++ doesn't really have these problems if you use RAII.


These things can be reduced to nuisance level by discipline. The cost of obviating them by language design is not zero (code size; execution speed; random latencies), so it's arguable: different languages, different purposes.


> These things can be reduced to nuisance level by discipline.

Obviously not. We've been doing this for decades and are still making the same mistakes with costly effects. Humans are as much a part of technological systems as the language or platform; continuing to ignore that even educated, disciplined, skilled developers are prone to certain classes of errors -- errors that could be obviated by other components of the system -- is hubris. It is much more feasible to create tools that exclude certain classes of error than to create less fallible developers.

> (code size; execution speed; random latencies)

Rust is not GC'd, has no inherent reason to be slower than C (as the toolchain matures), and I doubt it is more verbose than C code that provides the same safety.


Speak for yourself. A power saw is dangerous, but I'm not going back, thanks.

And nothing to say about the performance cost of these infallible tools? They are not free.


> A power saw is dangerous, but I'm not going back, thanks.

And now we have power saws that immediately stop when they encounter flesh. Obviously, disciplined users never had to worry about that system, otherwise we would have had a bunch of people missing fingers... like my wood shop teacher in high school. :/

> And nothing to say about the performance cost of these infallible tools? They are not free.

People have brought this up a few times, but I've never gotten an explanation that makes sense. My understanding is that Rust does most of its checking at compile time, so while not free, the protections are a small up-front cost at the compile stage, plus an indeterminate cost at the development stage in possibly more complex reasoning about the code, which then pays off over the lifetime of the application. While not free, that's definitely a trade-off that seems to make sense to me.

Or are you talking about some other cost? Can you explain where I'm not understanding your point?


Right, except that Rust is a modern power saw with safety enclosure and C is a rusted chainsaw that only works half of the time.


"only works half the time" is a bit of a stretch. what's your OS kernel written in?

I agree we could do better than C. Everyone here talks about it every time this happens, but does nothing. I'll take battle-hardened C over written-last-week Rust, too. Especially crypto code.


> These things can be reduced to nuisance level by discipline.

Apparently 40 years are not enough to acquire that level of discipline.


My teams have written hundreds of thousands of lines of C++, with only occasional (once a year?) trouble with use-after-delete or double-delete. And there are tools to deal with that.

We've enjoyed orders-of-magnitude performance and latency improvements over scripted or GC'd languages.

It's certainly possible to write competent C++ with a professional team, and there are clear reasons for doing so.


First of all, the issue is mainly with C, not C++.

C++ does provide the necessary features to write safe code, as long as one stays away from "C with C++ compiler" mentality and adopts those language and library features.

As for the tools, they are largely ignored by the majority of enterprise developers, assuming Herb Sutter's presentation at CppCon 2015 is in any way representative (1% of the audience said they were using them).

Which is in line with my (now outdated) C++ professional experience, up to 2006.

Only one company cared to pay for Insure++ (in 2000) and I was probably the only one that cared to have it installed.

Also, except for my time at CERN, I never met many truly elite C++ developers.

Most were using it as "C with C++ compiler" and in teams with high attrition, leading to situations where no one was 100% sure what the entire codebase was doing.


Don't bother, they haven't learned about writing secure code since Extended Algol (1961), Mesa (early 70's), Modula-2 (1978), Cedar (early 80's), Ada (1983), Oberon (1991),... anyway.


17 years ago it was "Let's rewrite Unix in Perl" ... http://news.slashdot.org/story/99/02/28/1639216/unix-in-perl

I do wish them the best luck, though.


Strange to go back in time.


Aren't Cedar and Oberon the only languages in that list that are memory-safe (assuming no use of unsafe code)? They both achieved that distinction by adding a garbage collector, so you could generalize and say that nobody has learned about writing secure code since Lisp in 1958.

Also, I'm pretty sure that Oberon was quite a bit earlier than that.


Yes, but the other ones already prevented:

- implicit type conversions errors

- memory allocations with incorrect size

- implicit decay of arrays into pointers

- lack of bounds checking for arrays

- null terminated strings with missing null character

- invalid pointers passed as argument for out parameters

- implicit conversion of integers into invalid enumeration values


Almost all of these are solvable without the added abstraction and code-size increase that most languages bring; see dependent types.


Dependent types are great, but I fear not for the average programmer on the street, at least not on the current languages that make use of it.


To be fair, the needed feature to mitigate this specific vulnerability is "throws exception on malloc failure", which at least Ada does (Storage_Error).


Isn't it, in practice, a "this code wouldn't exist in the first place" situation? (I mean in the context of Rust and many others.) Either you'd have to go into an unsafe{} block, or the buffer would simply be a Vec<>, which oom()'s on allocation failure.

I mean, theoretically the issue still exists, because you can run the allocation yourself and get back null. But without unsafe this shouldn't be possible.


Rust-the-language does not understand allocation, so it depends on the API of whichever allocation functions you use.

Rust's standard library aborts on OOM.


That's what I meant - if you go through actual allocators, then of course such a bug can exist. As you say, <<the needed feature to mitigate this specific vulnerability is "throws exception on malloc failure">>. But it seems to me that any language providing resizable buffers with bounds checking in its standard library is unlikely to get code like this in the first place. And that's a massive scope reduction.


They might downvote you, but I think your preemption of the flame war is a noble thing.


Looks like they showed up.


But they're isolated here at the bottom.


Responsible disclosure, anyone?


The OpenSSL project committed the fix and description to their public repo 3 days ago (no clue whether intentionally or not, but they didn't remove them). Not much to hide anymore.



