Hacker News
Nearly all binary search and merge sort implementations are broken (2006) (googleresearch.blogspot.com)
148 points by arto on Jan 30, 2012 | 69 comments



It is not sufficient merely to prove a program correct; you have to test it too. Moreover, to be really certain that a program is correct, you have to test it for all possible input values, but this is seldom feasible.

This statement is tantamount to saying "you don't merely need to prove Fermat's Last Theorem, you also have to test it for all possible input values". By this line of reasoning, most of mathematics should be thrown out.

If you've proven a program correct, it is correct for all input values, whether you have tested them or not. If your proof ignores complexities such as the range of data values, it isn't a correct proof to begin with. And yes, proving programs correct is often possible -- in fact, the highest levels of software verification actually require explicit proofs of correctness for much of the code; see http://en.wikipedia.org/wiki/Evaluation_Assurance_Level#EAL7...


Mathematics sits atop a very small set of trusted axioms. Computer hardware is not so reliable; the parts are made in China as cheaply as possible. So even if you've proven something correct in theory, it doesn't matter because of all the additional variables that the real world introduces.

The big insights in computer science look very much like mathematics; the mechanics of a binary search work perfectly in theory. But in practice, it's easy to make a simple mistake in implementation and accidentally throw your proof out the window. The proof remains valid, but your implementation isn't doing what the proof says.

The fact that a famous book on proving correct and then implementing a binary search ended up being slightly wrong underscores how easy it is to make this mistake. (I find binary search a little annoying because of the integer division. Do you round up or round down? What does your language implementation do? Now you see why you need to mentally prove that you've written the right algorithm, and then test to make sure your computer is doing what you think you're telling it to.)


Actually, most bugs are introduced in the translation between pseudocode and your working programming language. For instance, pseudocode does not deal with 32-bit integers that can overflow.

But that's not the fault of the algorithm or of its proof, and it has nothing to do with the origin of your computer parts.

Also, the "mechanics" of binary search work perfectly in practice. I see no evidence to the contrary, either in your comment or in the article above, and the same implementation described there works perfectly in Python.


Upvoted for the first two paragraphs, but I can't make sense of the second sentence of the third paragraph. The algorithm given in the article works in Python because Python has different overflow behavior than Java, C, or C++. When you say you "see no evidence", it seems like you are saying the implementation given in the article does work in Java etc.


Technically, the naïve implementation is perfectly fine for C/C++, so long as size_t is at least 268 bits long on your platform. :)


> But in practice, it's easy to make a simple mistake in implementation and accidentally throw your proof out the window.

But that's not how a proof of correctness works. You have an implementation (of an algorithm, say) and you prove mathematically that it is correct with respect to a formal specification. The implementation and proof aren't separate things.


It seems like you can only prove something in computer software up to the point of some random failure in hardware. If there's a chance that a cosmic ray will randomly flip a bit in memory every quintillion clock cycles, then surely you can only prove that an algorithm is correct with the explicit concession that it might be wrong every decade or so.


"Beware of bugs in the above code; I have only proved it correct, not tried it."

   — Donald Knuth


> This statement is tantamount to saying "you don't merely need to prove Fermat's Last Theorem, you also have to test it for all possible input values". By this line of reasoning, most of mathematics should be thrown out.

> If you've proven a program correct ..

You are making a categorical error: One proves algorithms and tests programs. A program is a representation of an algorithm. If the language of implementation e.g. Haskell is purely functional, and your representation e.g. data types is faithful to the algorithm, then you can make claims about your provably correct implementation.


> You are making a categorical error: One proves algorithms and tests programs.

You can prove programs too, not just algorithms. That's what the entire (exorbitantly expensive) field of verified applications rests on. Even my college class on formal logic covered the basics of this, and there are billions of dollars worth of extremely high-reliability applications that have been developed using such formal proofs.

For example, Green Hills Software's INTEGRITY operating system has formally verified bounds on the runtime and behavior of system calls, so one can reasonably guarantee that a malicious user cannot DoS the system by making system calls with unbounded duration.

There are even entire programming languages designed for the sole purpose of making formal, mathematical verification of programs easier.

> A program is a representation of an algorithm.

Yes, which is why you have to prove it correct again, since real programming languages, e.g. C, have limitations not typically present in the original algorithm.


The programs that can be proven are severely limited in power. For one, the tools to prove correctness of programs can't deal with data other than integers[1].

The way this usually works is that the proof is written and the code is then generated from the proof. That code is then incorporated in something larger, which has not been proven correct. Such is the case with algorithms. You can mathematically prove an algorithm correct. You can even write a proof and generate a program from it. Then you still have to use that algorithm and all the code using it can't be proven correct with current methods. That is why you will still have to test your program.

  There are even entire programming languages designed for
  the sole purpose of making formal, mathematical
  verification of programs easier.
Yes, and their practical usefulness is still severely limited.

[1] Of course, in theory this means they can deal with any data; in practice it means they can't.


> For one, the tools to prove correctness of programs can't deal with data other than integers.

This is the first time I've heard such a claim. Which tools are you talking about? The tools I know (Coq, for one) certainly don't have such limitations -- why would they, anyway?


For performance reasons. The SPIN model checker, which is the state of the art in terms of performance, employs only integers.

Has Coq ever been used to solve an actual practical industrial or business problem, either by generating code from a known correct program or by using the Coq model as an oracle to drive automatic testing? If not, then there's your answer: academically useful proof assistants cannot deal with real world data. Proof assistants useful in the real world can only deal with limited data types to be able to achieve their goal.


You can prove theoretical programs. But do you really want to have to prove that a program still works with any random bit flipped at runtime? What about two random flipped bits?


All proofs start from some assumptions. When you are proving programs, you assume that the hardware is functioning properly. After all, anyone going to the expense of proving their software is correct will also pay for ECC throughout their hardware. Once you've proven your software correct, you can make assurances about the behavior of the software when run on particular hardware, and those assurances will be limited primarily by the reliability of the hardware.


Assuming that hardware will work correctly can be an extremely dangerous assumption.


You're challenging what it means for something to be a proof, which I think is always subject to philosophical debate. Intuitively, something is a proof of a proposition if it convinces someone else that the proposition is true, and the proposition actually is true.

Obviously, for difficult concepts, it's possible that every expert in the world could be convinced that a proposed proof is correct, only to later find out that it's not. The way I see it, you're always only proving that something has a high probability of being correct, and showing how high the probability is. For computer software, you might be showing that your program is correct with the same probability that the underlying hardware is working as specified (so cosmic rays shifting bits or faulty hardware are possible exceptions baked into your proof).

To get a bit silly, a mathematical proof delivered from person A to person B really only shows that a proposition is true with the same probability that person B is sane, educated, and understanding the concepts correctly.


What is your point, exactly?

Do you think testing (likely on a different machine than the target) will catch a 1 in a million hardware error either?


You can prove the correctness of a program given a specification of the language you write it in. But is the compiler you use to compile your program correct? Do you know all processor bugs?


I have heard from someone involved in project management of military aircraft software that the process includes verification of the machine code produced by the compiler; that is, you have to prove that the generated code is exactly what you intended to get.

http://www.sandroid.org/birdsproject/4dummies.html gives a glimpse of how much fun developing software for aviation purposes is.


Computer hardware is not correct. So, you need to step past mathematical perfection and write defensively.

Rule 1: you can't trust RAM; it lies.

That said you can build automated tests that add random memory errors to help find the program that's best able to handle memory errors.


I have never seen a piece of software that is built with memory-error tolerance be tested against random memory errors. Have you?

Maybe a spacecraft might have some. Error-correcting memory, and the software associated with it, is of course an exception. Is there code dealing with memory errors somewhere in the kernel (of any OS)?

For most programming, it would simply not make sense to try to take memory errors into account. It's very unlikely that in the lifetime of the program, there will ever be a situation where a bit will be flipped and it will have an actual effect.


They are a lot more common than you might think; basically every day, every PC has a memory error. If one of the textures in a video game gets corrupted you may not notice it, and if you have real issues, simply rebooting tends to work well. Recent tests give widely varying error rates, with over 7 orders of magnitude difference, ranging from 10^−10 to 10^−17 errors/bit·h: roughly one bit error per hour per gigabyte of memory, to one bit error per century per gigabyte of memory. http://en.wikipedia.org/wiki/Dynamic_random-access_memory The biggest defence for a home user is simply the banal nature of most information stored in memory.

Yet you don't have to be building a spacecraft for rebooting to be an issue. One example that comes to mind is remote sensors. A friend was working on a power meter which reported back its findings every few days. The initial version used normal programming practices, and after a six-month field test of a few thousand units, the projected failure rate over 10 years in the field was unacceptably high. He said the code was simple and 'correct' but failed to deal with corruption. His version had less than 1% of the original failure rate in the field and is projected to save the company far more than his 6-month contract to rewrite the thing from scratch cost.

A more dramatic example is your car's internal engine controls. But there they simply reboot regularly, as there is no need to maintain state over the long term.


You too might be interested in the link I posted above. It's about building safe hardware (and software), but I don't think it's completely safe against any kind of random bit-flips.


What you are talking about is the "trusted computing base": prove every part of the base system you're running on. This is of course a HUGE task, but there are people thinking about and working on it.

You might enjoy this DARPA project: http://www.crash-safe.org/papers

They propose their own hardware, OS, languages and compilers. The hardware design is by Tom Knight, one of the guys who designed the Lisp machine.

It's quite an interesting read.


How do you prove the proof is correct? And how do you prove that your mathematical representation correctly corresponds to the problem you are modeling?

This is why I am bitter and disillusioned about mathematics: it promises certainty, but doesn't deliver. It remains pragmatically useful, like many other tools, but does not deserve the semi-mystical status some confer on it. </rant>


The proof is a sequence of extremely easy logical inference rules like if A=>B and A then B. You check the correctness of the proof by verifying that these rules are applied correctly. Obviously though, there is no way to prove that your mathematical model corresponds to reality. That is because mathematics is not concerned with reality.


What if you make a mistake in checking the proof? Sure, that may sound silly, but mistakes are the point at issue here.

The usual rejoinder is that you can use an automated proof checker/assistant like Coq. Now you're relying on a program. Is the program correct? Well, we checked it with itself...

I admit this is far better than not having a proof at all. As I said, it's a tool, and has pragmatic merit. My objection is it's not absolute proof - which is what "proof" sounds like to me. In reality, a proof is an argument for the truth of a claim, with a level of convincingness.

BTW: Admittedly, reducing a proof to simple rules makes it harder to get wrong, though this is rarely done by mathematicians. Also, it's a curious fact that some mathematicians have made mistakes in their proofs, but turned out to be right anyway, presumably because they could see that it was true, and the notation was secondary.


Proving theorems in mathematics plays a bit different role than proving correctness of programs. In mathematics, it's about sharing ideas, so if the proof is not completely correct, it can still be interesting because of the ideas and methods it uses. When you prove correctness of programs though, the whole point is to be sure that the program is correct, so the actual proof is important here, not its method or structure, because it's usually clear why the program at hand works.


If you make a mistake in checking the proof, then the proof is incorrect. The problem is with your application of the mathematics, not the applicability of mathematics itself to the problem of showing the truth of things.


It depends on the scope of the proof. It's sloppy to say "this is proved to be correct". They should say "this is proved to terminate with output X in finite time, for input values in the range -Y:Y".

Everything can be proved to be "correct", for some definition of "correct" (i.e. "output undefined, program may not terminate").


Previous, extensive discussion: http://news.ycombinator.com/item?id=1130463

I once ran a programming challenge based on this, but I discontinued it when I started to get threats of physical violence from people whose code didn't pass the tests.

The Dictionary of Algorithms and Data Structures on the NIST site has a correction I submitted on exactly this point:

http://xlinux.nist.gov/dads/HTML/binarySearch.html

And my blog post: http://www.solipsys.co.uk/new/BinarySearchReconsidered.html?...


Nice to see that Go gets it right:

        func Search(n int, f func(int) bool) int {
                // Define f(-1) == false and f(n) == true.
                // Invariant: f(i-1) == false, f(j) == true.
                i, j := 0, n
                for i < j {
                        h := i + (j-i)/2 // avoid overflow when computing h
                        // i ≤ h < j
                        if !f(h) {
                                i = h + 1 // preserves f(i-1) == false
                        } else {
                                j = h // preserves f(j) == true
                        }
                }
                // i == j, f(i-1) == false, and f(j) (= f(i)) == true  =>  answer is i.
                return i
        }
http://code.google.com/p/go/source/browse/src/pkg/sort/searc...


I expect at least one of the Go authors was well aware of this very incident (the Java breakage), it was fairly well publicized before Go ever saw the light of day.


An interesting article that links back to this one:

http://reprog.wordpress.com/2010/04/19/are-you-one-of-the-10...

The claim is that even ignoring overflow, only 10% of programmers can correctly implement a binary search. When I tried it, I thought I got it working, but it was later pointed out that I didn't handle empty lists correctly.

Programming correctly is hard.


I hope that means "10% get it right first time" and not "only 10% can do it at all, even given a computer and a whole day to do it."

Couldn't resist that challenge (although the thread is nearly 2 years old). I think mine works; at least it gives the right answer for all my test cases.


It's even hard to come up with all the relevant test cases - if you miss a corner case in the implementation, it is likely you will overlook it in the test case generation as well. Only an exhaustive search would be able to discover this.

With e.g. templates in C++, this is possible in some cases. You can test a template version using the limited range of an unsigned char, while your real implementation will use a uint64.

But instead of implementing e.g. a binary search myself, I prefer to take a proven implementation from e.g. the STL and adapt it to my needs. Experience and programmer laziness have taught me this is a good way :)


While true, it seems that if your array is that close to the inherent size limit of indexing by ints, this is at most a temporary fix. How quickly do your sorting requirements go from 2^31 to 2^32? It seems that you should be using size_t here, not int.


Yes, the binary search I wrote back in the day used size_t and represented the range as (start, length) rather than (start, end). Full range, and no overflow anywhere if you write it the obvious way. Maybe it was a little bit slower, I don't know.

I'd used Bentley as the starting point, so I was surprised when this news first came out and got reported as Bentley's bug. I guess it could be considered so.


size_t is actually 32-bit on some 64-bit platforms; you want uint64_t.


If that's true, that's a bug in the platform, not the program. The whole point of size_t is to be however many bits you need to represent the size of things in your address space.


Bypass the borked formatting (original is JS dependent; link below is to Google text-only cache):

http://webcache.googleusercontent.com/search?strip=1&q=c...

IIRC, this made the rounds a few years ago, in case it sounds familiar.


Thanks. The site is so broken on the iPad: it asks if you want a static view, which goes to the front page, or an "unsupported" dynamic view, which actually turns out to work and is the same page. Google, this is so broken! Just make HTML that works in a browser!


I'm tempted to solve it with:

  low/2 + high/2 + (low & high & 1);
That sidesteps the overflow condition completely.


It would sidestep the overflow, but prima facie it appears to be slower than the provided:

    low + (high - low)/2;
Your solution takes, depending on the architecture, between several more and nearly double the trips to the ALU.

Gimme a sec; I've got bugger all to do at work. I'll compile it and profile it.


Would be interested in seeing the profile. Reordering it to the following (which the compiler might do too):

  (low & high & 1) + (low >> 1) + (high >> 1)
It should compute in a single cycle if the register coloring is working in the pipeline. The <reg> >> 1 terms come out of the barrel-shifter stage, the low & high & 1 resolves in the load, so you end up with a single sum of three operands. Since it's being stored in a separate register, that would also avoid a write stall in the pipeline.


I'm tempted to suggest that the fixed C/C++ version is still not correct with respect to this issue, namely on architectures where INT_MAX == UINT_MAX. (Or really INT_MAX <= UINT_MAX < 2 * INT_MAX, which is valid in C99 for INT_MAX >= 65535.)


INT_MAX == UINT_MAX is the only valid failing case, because elsewhere in the standard integer types are restricted to pure binary representations. This means that if UINT_MAX is greater than INT_MAX, it must be at least INT_MAX * 2 + 1.


That can't actually be taken for granted, relevant Usenet discussion begins here:

http://groups.google.com/group/comp.std.c/browse_thread/thre...


I concur with Dan Pop's interpretation.


I don't like the index variables and splitting in half part. I like to write my binary searches with bit patterns:

1) Figure out the number of bits required to cover the largest index (= a little bit of cheap bit-twiddling)

2) Initialize your index to 0

3) Flip a bit on in the index, starting from the highest bit available from step 1)

4) If data[index] is smaller than what you're looking for, leave the bit on

5) Try the next-lowest bit and loop to 3) until at bit #0

It's pretty easy to write it out in a way that's just obvious, instead of requiring the reader to wrap his mind around which variable is the lowest and highest bound and whether the indexes are inclusive or exclusive at which ends, etc.


You mean this one?

    int binary_search_pow2(const int a[], int len, int key) {
        int i = 0, step;
        for (step = len / 2; step > 0; step >>= 1)
            if (a[i | step] <= key)
                i |= step;
        return i;
    }
I thought that only worked for power-of-two sized arrays.


I implemented one for practice, without assuming powers of two:

    int binarySearch(const int needle, const int haystack[], const int size) {
    	if (size==0)
    		return -1;
    
    	short log2 = 0;	// rounded down
    	for (int sz=size; sz>>=1; log2++);
    
    	int index = 0;
    	for (int flag=1<<log2; flag>0; flag>>=1) {
    		index |= flag;	// set
    		if (index>=size || needle<haystack[index])
    			index ^= flag;	// unset
    	}
    	return needle==haystack[index] ? index : -index-1;
    }


That's really cool - would you care to post some example code so we can see the clarity obtained by this novel approach?

Did you come up with it yourself, or did you see it somewhere else?


I think one of Jon Bentley's "Programming pearls" books contains a version of binary search that works more or less that way and shows how to get there from a naive and more obviously correct implementation by successive small transformations. IIRC it ends up having a particularly tight inner loop. And it doesn't, I think, require a power-of-2-sized array. My copy of the book is at home and I'm at work so I can't check any of my recollections right now.


You're right - Programming Pearls, column 8 "Code Tuning." It's not quite like this, and it's in some variant of BASIC, but when you squint, it's there.

Thanks.


You're welcome. In the second edition, which is what I happen to have, it's column 9 (because the old column 4 got split into two) and the code is in a pseudocode that's much more like C with the semicolons deleted than like any version of BASIC.

I wasn't quite right to say that the inner loop is very tight; rather, the code assumes a particular size of array and unrolls the loop completely. But, indeed, the size doesn't need to be a power of 2.

Here's the (pseudo)code, in case anyone cares. It's for an array of size 1000. I've changed a variable name from "l" to "m" because of the usual l1I| thing. I've removed a couple of helpful comments because anyone who cares enough to bother reading them will probably have more fun figuring everything out on their own. I've also elided some obvious repetitive code and formatted it slightly differently from Bentley. Any errors in the code were probably introduced by me.

  m = -1
  if (x[  511] < t) m = 1000-512
  if (x[m+256] < t) m += 256
  if (x[m+128] < t) m += 128
  /* ... */
  if (x[m+  2] < t) m +=   2
  if (x[m+  1] < t) m +=   1
  p = m+1
  if (p>1000 || x[p]!=t) p = -1 /* i.e., not found */
Bentley says this is "not for the faint of heart". I agree, but I do think it's lovely. Though personally I'd be inclined to use a value of m offset by 1 from what Bentley does (initialize to 0, look up m+255, m+127, etc.) and save a line of code and a couple of cycles.


This bug requires either (1) low/mid/high are pointers [edit - I don't think this is valid; IIRC pointer arithmetic is only sane for add/subtract an int from a pointer, not divide a pointer or add two pointers]; or (2) item_count > INT_MAX / 2, which means memory_size >= sizeof(your_array) == sizeof(item) * item_count > INT_MAX / 2, which means either 1-byte or 2-byte array elements, or that your ints are smaller than your pointers. So is it really "nearly all"?


The author is claiming that nearly all binary search implementations are wrong under those specific conditions, not that binary search implementations are wrong under nearly all conditions.


...bah. I keep forgetting that 64-bit C/C++ compilers are generally LP64 (or even LLP64) instead of ILP64, so "int" being possibly too small is in fact typical.

Still. In the example (Java) implicit down-converting isn't allowed, so this is a result of the spec putting arbitrary limits on array size (must be indexed by nonnegative 'int' values). In C or C++ there is no such limit and I think more modern compilers (llvm) will give a warning on loss of precision, so either you have that warning ignored/disabled/unavailable or you have 32-bit code with a billion-element array. I guess what I'm thinking with this is, with a sane language/compiler, the only way to trigger this should be to fill half of your theoretically-logically-addressable memory with an array with single-byte elements.


You can also subtract two pointers (this is defined only if the two pointers are within a common memory area). The result is a ptrdiff_t, which is a signed integer type wide enough to hold the difference.


I don't understand why I need JavaScript enabled merely to read a blog post on Blogspot.


To be fair, there's a classic version (but the link to it discards the article id, for some unfathomable reason).

This works: http://googleresearch.blogspot.com/2006/06/extra-extra-read-...


The general lesson that I take away from this bug is humility: It is hard to write even the smallest piece of code correctly, and our whole world runs on big, complex pieces of code.

It surely does, if everyone has to write their own binary search implementation when most standard libraries have one, and several of them have one flexible enough (either due to the language's features or the binary search routine itself) to search not just through a container but through a solution space in an optimization problem. So I'm surprised the author hasn't concluded with the somewhat more practical lesson: it is hard to write even the smallest piece of code correctly, so don't do it; use whatever's in your stack unless you have a good reason not to.


Why are overflows silent?

Shouldn't use cases where you want the overflow behavior be more explicit?


So C, C++, and Java don't have good support for arithmetic. What else is new?


+1; this has nothing to do with binary search and everything to do with the fact that almost any application written in those languages will have bugs that do not show until numbers nearing INT_MAX are used. Another given is that all applications that use recursion are bound to run into stack overflow errors/bugs when used on big data.



An old story, but still well worth re-reading.



