Every programmer should read the source to abort() at some point in their life.

msarnoff · on July 19, 2011

Reminded me of "5 ways to reboot a PC, none of them reliable" from a couple months ago.

http://news.ycombinator.com/item?id=2607116

Interesting how fundamentally simple tasks like aborting a process or rebooting the machine have very nontrivial (even kludgey) implementations.

kenjackson · on July 19, 2011

Raymond Chen wrote a classic blog post on how a process exits on WinXP. But if you're a developer, know that it's not the same on Win7.

http://blogs.msdn.com/b/oldnewthing/archive/2007/05/03/23833...

yuhong · on July 20, 2011

Also see this: http://blogs.msdn.com/b/oldnewthing/archive/2007/05/04/24020...

cpeterso · on July 19, 2011

For comparison, here's FreeBSD's abort():

http://svnweb.freebsd.org/base/head/lib/libc/stdlib/abort.c?...

And GNU glibc's abort():

http://sourceware.org/git/?p=glibc.git;a=blob_plain;f=stdlib...

masklinn · on July 19, 2011

And OSX's, which is rather similar to FreeBSD's but adds:

* Writing to NULL

* Writing to address 1 (unaligned write)

* Writing to text space (read-only machine code)

* Dividing by 0

* More violence than SIGABRT (SIGILL, SIGBUS)

http://www.opensource.apple.com/source/Libc/Libc-262/stdlib/...

ajross · on July 19, 2011

Looks to me like glibc and FreeBSD are the only ones that flush stdout, which I'd view as a bug on the other systems...

sparky · on July 20, 2011

(Note: The uclibc implementation in the OP also attempts to flush stdout as part of the _stdio_term() call)

It's a bug in the spec, if anything. See the abort() spec http://pubs.opengroup.org/onlinepubs/009695399/functions/abo...

Vs the exit() spec http://pubs.opengroup.org/onlinepubs/000095399/functions/exi...

abort() is intended as a last-ditch effort. exit() is the one that attempts to flush all open buffered file descriptors, and should be used in lieu of abort except in cases where you know you're screwed, or explicitly want to throw a signal so a debugger can take a peek.
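A minimal sketch of the flushing difference, assuming a POSIX system. `bytes_delivered` is a hypothetical helper, and `_exit()` stands in here for "terminate without stdio cleanup"; whether a particular abort() flushes is implementation-specific, so it is not exercised directly.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper, not from any libc: the child leaves 5 bytes in a
 * stdio buffer attached to a pipe, then ends with exit() or _exit().
 * Returns the number of bytes the parent received, or -1 on setup failure. */
static long bytes_delivered(int use_exit)
{
    int fds[2];
    if (pipe(fds) != 0)
        return -1;

    fflush(NULL);                      /* keep inherited buffers out of the child */
    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {                    /* child */
        close(fds[0]);
        FILE *out = fdopen(fds[1], "w");
        if (out == NULL)
            _exit(1);
        setvbuf(out, NULL, _IOFBF, BUFSIZ);  /* force full buffering */
        fputs("hello", out);           /* stays in the user-space buffer */
        if (use_exit)
            exit(0);                   /* flushes open streams first */
        _exit(0);                      /* skips stdio cleanup entirely */
    }

    close(fds[1]);                     /* parent: read until the child is gone */
    char buf[64];
    long total = 0;
    ssize_t n;
    while ((n = read(fds[0], buf, sizeof buf)) > 0)
        total += n;
    close(fds[0]);

    int status = 0;
    waitpid(pid, &status, 0);
    assert(total <= 5);                /* the child never writes more than "hello" */
    return total;
}
```

Calling `bytes_delivered(1)` should report the 5 buffered bytes, while `bytes_delivered(0)` should report none, because the buffer dies with the process.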

Jabbles · on July 20, 2011

Since dereferencing a null pointer and dividing by 0 are undefined by C, is the compiler required to emit the code for them? In practice, does it?

praptak · on July 20, 2011

In practice, *(int *)0 and similar are popular idioms for "segfault here". Making them break is not an optimization any compiler maintainer would bother to make: it requires a special case and there don't seem to be any benefits to justify the effort.

jesboat · on July 21, 2011

See the series of articles starting with http://blog.llvm.org/2011/05/what-every-c-programmer-should-...

the_mitsuhiko · on July 20, 2011

In practice: compilers usually don't touch expressions with volatile vars.
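A rough sketch of the volatile approach, assuming a POSIX system where page 0 is unmapped. The access is still undefined behavior as far as the C standard is concerned; this only relies on mainstream compilers not deleting volatile accesses. `crash_here` and `signal_from_crash` are hypothetical names.

```c
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Store through a volatile-qualified null pointer. The volatile qualifier
 * is what keeps the compiler from treating the store as dead code. */
static void crash_here(void)
{
    volatile int *p = 0;
    *p = 1;
}

/* Hypothetical helper: run crash_here() in a child process and return the
 * signal that killed it, 0 if it exited normally, -1 on setup failure. */
static int signal_from_crash(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        crash_here();
        _exit(0);                      /* only reached if the store did not fault */
    }

    int status = 0;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    assert(WIFEXITED(status) || WIFSIGNALED(status));
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

On a typical Linux build the returned signal would be SIGSEGV, but the sketch only assumes that some fatal signal arrives.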

p9idf · on July 19, 2011

Plan 9's abort causes an access fault, causing the current process to enter the `Broken' state. The process can then be inspected by a debugger. Pretty elegant.

http://plan9.bell-labs.com/sources/plan9/sys/src/libc/9sys/a...

1amzave · on July 19, 2011

In practical terms, does this differ significantly from killing the process and leaving a coredump?

cpeterso · on July 19, 2011

Beautiful! The gurus do not disappoint.

jethroalias97 · on July 19, 2011

Very elegant, though in some cases 0x0 is addressable, in which case, the abort never happens.

gchpaco · on July 20, 2011

Not, as it happens, on plan 9 systems where the kernel marks that area as not accessible. This could fail, conceivably, but then you're in much deeper stuff.

raldi · on July 20, 2011

0 cast as a pointer is not necessarily 0x0.

The spec says it equals NULL, which, too, is not necessarily 0x0.

ristretto · on July 19, 2011

again, what's with the while()? even if it doesn't cause a segmentation fault, there's a chance it will evaluate to 0.

singular · on July 20, 2011

gcc, at least, optimises out the null deref for both:-

    *(int *)0;

and:-

    for(;;)
        *(int *)0;

So the first bit of code does nothing, and the second slips off into an infinite loop.

I suspect that treating expressions that demonstrably lack side effects (other than the intended segfault here of course) as statements is undefined, and hence these are getting optimised out (even with -O0).

Clearly with:-

    while(*(int *)0)

The expression is being evaluated and is therefore not elided. I guess the choice of while is to 'be cute', as others have suggested, and I guess the world is sane in plan 9 and 0 is readable so you can't get a situation where it escapes the loop. Perhaps there is a deeper reason here that I am missing, however.

raldi · on July 20, 2011

(int *)0 is not defined as a pointer to memory address 0x0 on architectures that support such an address.

0 cast as a pointer is defined by the spec to always be the NULL pointer, which on such architectures would have a value other than 0x0 and not point anywhere addressable.
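To make the distinction concrete, here is a small sketch with two hypothetical functions: one checks what the standard guarantees about a constant 0, the other checks something that merely happens to hold on mainstream platforms.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Guaranteed: a constant 0 converted to a pointer type is a null pointer.
 * It compares equal to NULL and is false in a boolean context, whatever
 * bit pattern the platform uses for it. */
int constant_zero_is_null(void)
{
    int *p = (int *)0;
    assert(p == NULL);                 /* holds on every conforming implementation */
    return !p;
}

/* NOT guaranteed: that an all-bits-zero object representation is the null
 * pointer. This returns 1 on common platforms, but the standard does not
 * promise it. */
int all_bits_zero_is_null(void)
{
    int *p;
    memset(&p, 0, sizeof p);
    return p == NULL;
}
```

Only the first function is portable; the second is the assumption people usually have in mind when they say "NULL is 0x0".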

ristretto · on July 20, 2011

Thanks. I could never guess it could be optimized out.

rm445 · on July 19, 2011

I'm not a plan 9 programmer, but to me it looks 'cute' (in the sense of attractive to some people but annoying to others) - that form of abort() would only be used on systems where that operation is known to abort the process, but enclosing it in a while simply makes it apparent that there is no alternative to trying it.

On a more prosaic note, perhaps

    for(;;)
        *(int *)0;

generates a compiler warning that the programmer wanted to avoid.

p9idf · on July 19, 2011

As far as I know, the kernel programs the MMU so that dereferencing 0 will always fault. I could be wrong, as my understanding of the kernel is limited. I am not sure of the purpose of the loop, but to me it makes it unavoidably obvious that the function never returns.

thecoffman · on July 19, 2011

Great article, picking apart low level code like this can be super informative - and you've explained it well.

chrisledet · on July 19, 2011

Link to abort.c source: https://gist.github.com/1093410

IgorPartola · on July 19, 2011

I may be wrong, but I thought that in a multithreaded environment, doing i++ is not atomic and could result in garbled data. Instead you should use __sync_add_and_fetch. However, I have no idea if it should be used inside abort().

cube13 · on July 20, 2011

>I may be wrong, but I thought that in a multithreaded environment, doing i++ is not atomic and could result in garbled data. Instead you should use __sync_add_and_fetch. However, I have no idea if it should be used inside abort().

This is only true if the variable in question's memory is accessed by multiple threads at the same time, and there isn't any locking or synchronization method used to protect the memory.

In this case, even though it is a globally scoped variable, it's locked by the globally scoped mutex declared in the file. All increments are done in the locked sections, so there isn't any possibility of accessing the variable without having a lock.

It should be noted that there is a very minor race condition when abort() is called in two different threads sequentially, and every attempt up to line 89 doesn't work. The first call will get the lock, then go through to line 89, where it releases the lock. The second thread will then get the lock, and go through the first section. When it hits the section at line 89 (if (been_there_done_that == 0)), that will resolve to false, because been_there_done_that is 1. It will then go on, leaving the first thread deadlocked at the LOCK attempt on line 91. This shouldn't result in any missed functionality, but I actually wonder why they're releasing the lock in the first place. raise() isn't thread safe anyway, because the signal is applied to all threads in the process. Plus, you're trying to suicide the program. It's a bad idea to even have the possibility of multiple threads trying to kill themselves at the same time.

raldi · on July 20, 2011

The lock must be released because the same thread might reenter abort() in the signal handler, and without a release in the parent abort(), the program would hang.
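A rough sketch of that unlock-then-raise pattern, assuming POSIX signals and a pthread mutex. It is not the uclibc code: `staged_abort`, `stage`, and the helpers are hypothetical names, and taking a mutex inside a signal handler is not async-signal-safe in general.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static pthread_mutex_t abort_lock = PTHREAD_MUTEX_INITIALIZER;
static int stage = 0;                  /* how far the escalation has progressed */

static void set_sigabrt_handler(void (*handler)(int))
{
    struct sigaction sa = {0};
    sa.sa_handler = handler;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGABRT, &sa, NULL);
}

/* Hypothetical staged abort: try the user's handler once, then the default
 * action, then give up and _exit(). */
static void staged_abort(void)
{
    sigset_t set;
    sigemptyset(&set);
    sigaddset(&set, SIGABRT);
    sigprocmask(SIG_UNBLOCK, &set, NULL);   /* make sure SIGABRT can arrive */

    pthread_mutex_lock(&abort_lock);
    if (stage == 0) {
        stage = 1;
        /* Release before raising: the handler may call staged_abort()
         * again on this thread, and a held lock would hang it. */
        pthread_mutex_unlock(&abort_lock);
        raise(SIGABRT);
        pthread_mutex_lock(&abort_lock);    /* the handler returned */
    }
    if (stage == 1) {
        stage = 2;
        set_sigabrt_handler(SIG_DFL);       /* drop the user's handler */
        pthread_mutex_unlock(&abort_lock);
        raise(SIGABRT);                     /* default action terminates */
    }
    _exit(127);                             /* last resort */
}

/* A handler that re-enters staged_abort(), the case described above. */
static void reentering_handler(int sig)
{
    (void)sig;
    staged_abort();
}

/* Run the scenario in a child; return the signal that killed it. */
static int signal_from_staged_abort(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        set_sigabrt_handler(reentering_handler);
        staged_abort();
    }
    int status = 0;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    assert(WIFEXITED(status) || WIFSIGNALED(status));
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

If the unlock before the first raise() were removed, the re-entrant call from the handler would block on the mutex forever instead of moving on to the next stage.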

IgorPartola · on July 20, 2011

You are right. Re-reading the code, I see that it re-acquires the lock.

cdcarter · on July 19, 2011

Since the lock is not guaranteed in the code, the variable is globally defined and the code only ever increases it. This means a step in the chain of killing could get skipped, but that doesn't matter, as there's always a more violent option (or just the infinite loop).

kabdib · on July 20, 2011

On an 8-bit system with 16-bit-wide ints, i++ is almost certainly not atomic (with respect to threads, interrupts, etc.).

A few months ago I fixed this exact bug in a developer's code, on a 8-bit embedded processor.

leif · on July 19, 2011

i++ is atomic in most cases, unless i is an excessively wide integer. In abort I think they lock anyway, so it doesn't matter (at least in uclibc).

jbri · on July 20, 2011

Actually, while the write itself is generally atomic, most systems don't guarantee that the entire increment operation is (for performance reasons).

And since abort still needs to work even without locks and on every platform...
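For illustration, a minimal sketch of an increment that is atomic as a whole, using C11 `<stdatomic.h>` rather than the `__sync_add_and_fetch` builtin mentioned earlier. The names and thread counts are made up, and POSIX threads are assumed.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

enum { THREADS = 4, ITERATIONS = 100000 };

static atomic_int counter;

/* Each increment is one indivisible read-modify-write. A plain `counter++`
 * on a non-atomic int would be a separate load, add, and store, and two
 * threads could interleave them and lose updates. */
static void *worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < ITERATIONS; i++)
        atomic_fetch_add(&counter, 1);
    return NULL;
}

/* Start the workers, wait for them, and return the final count. */
static int run_counter_demo(void)
{
    pthread_t threads[THREADS];

    atomic_store(&counter, 0);
    for (int i = 0; i < THREADS; i++) {
        int rc = pthread_create(&threads[i], NULL, worker, NULL);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < THREADS; i++)
        pthread_join(threads[i], NULL);

    return atomic_load(&counter);
}
```

Older toolchains may need `-pthread` when building this. The total comes out to THREADS * ITERATIONS because no increment can be lost.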

snorkel · on July 19, 2011

Wow, abort() is much more polite than I assumed.

known · on July 20, 2011

Why only abort()? The entire kernel code is beautiful http://lxr.linux.no/

KonradKlause · on July 19, 2011

glibc's abort() is even more complex:

http://sourceware.org/git/?p=glibc.git;a=blob;f=stdlib/abort...

jordanb · on July 19, 2011

Most of the code in glibc has been heavily Dreppered.

ars · on July 19, 2011

What does Dreppered mean?

jordanb · on July 19, 2011

Ulrich Drepper tends to write software that is easy for him to read (one would hope) -- but is very difficult for anyone else to follow.

jrockway · on July 19, 2011

But you have to admit, his abort is pretty damn easy to read.

jordanb · on July 19, 2011

I admit it's heavily commented, and his usual tangle of preprocessor macros is blissfully absent.

But it contains hints of Drepperification, like the superfluous use of preincrement.

kelnos · on July 20, 2011

When I don't care about the result, I always write preinc/decrement too. Sure, it's superfluous on any non-braindead compiler (it should be able to see that you don't care about the result of a postincrement and elide the temporary), but it's just habit at this point. I fail to see how it reduces or changes readability though.

Sounds like you just have an axe to grind with Drepper.

jordanb · on July 20, 2011

I don't have anything personal against Drepper. I've never had any direct experience with him of any kind.

I thoroughly enjoyed his article about memory. He is obviously an extremely intelligent and knowledgeable guy.

I am afraid that he is too clever by half though, insofar as good code is clean and readable first, and clever second. Every time I've had an opportunity to interact with the glibc codebase I'm dismayed that such an important, core piece of software has been written so cleverly that it essentially can only be maintained by one guy.

kelnos · on July 20, 2011

That's unfortunate; I'll have to look at some glibc code sometime. Could be fun... or, as you point out, dismaying :-/

drv · on July 19, 2011

Ulrich Drepper is the maintainer of glibc; I don't know what the comment is intended to mean, though.

pluies · on July 19, 2011

It's probably a quip referring to Ulrich Drepper, a kernel hacker whose personality seems to be quite controversial according to a quick Google search. I'd love to hear the GP explain it further though.

VMG · on July 20, 2011

Is he a kernel hacker? I thought he was only a glibc hacker.

KonradKlause · on July 19, 2011

very true. but glibc's abort() is IMHO ok.

lubutu · on July 19, 2011

musl's abort() is quite elegant: http://git.etalabs.net/cgi-bin/gitweb.cgi?p=musl;a=blob;f=sr...

kelnos · on July 20, 2011

And also not very robust. Sure, it'll halt execution -- of that one thread, anyway -- but if the program has a SIGABRT handler installed that doesn't exit, abort() will fail to do its job. It'd be nice to try just a little harder to kill the program.

sagarun · on July 20, 2011

I liked the way they commented there. Reading the code with comments was like reading a comic book: "Still here? We are screwed!" :-)

ristretto · on July 19, 2011

Why is the outer while(1) loop needed? Seems redundant to me.

raldi · on July 19, 2011

The link specifically asks that question -- there are some good guesses in the comments.

ristretto · on July 19, 2011

the one on line 87. It looks like it might have been added later because it's not even indented, but clearly it doesn't serve any purpose and it doesn't make it more readable (you can tell the function never returns from the second while(1) loop).

re · on July 19, 2011

> it's not even indented

It's an issue of mixed tab-space indentation, with tabs being displayed at 4 spaces instead of the 8 spaces they were intended to be.

Ideka · on July 19, 2011

You know... just in case.

cheez · on July 19, 2011

Every programmer should implement an operating system and compiler. I did. Now get off my lawn.

KonradKlause · on July 19, 2011