Hacker News new | past | comments | ask | show | jobs | submit login

It's not binary searches and merge sorts that are broken, it's the INT data type that is "broken" (and I put "broken" in scare quotes because they aren't really broken, they just don't do what you want in most cases). People tacitly assume that INTs model the integers, but they don't. They model the integers modulo 2^N for some value of N. If you code as if INTs were integers and you hit the 2^N limit you will lose.

The algorithm as given would work perfectly well on a straightforward translation into a language with a real integer data type like Python or Lisp.

And you can lose in the other direction too: if you're doing crypto, for example, the algorithms are often designed to use arithmetic operations modulo 2^N, so INTs (and NOT integers) can be just what you need.

But this has nothing do do with binary search or mergesort.




I can make a similar argument about floating-point numbers not modelling reals, and the hysteria over the lack of reflexivity of equality in IEEE 754 is a little overblown.


Why trifle over tiny little epsilons, or NaN, which isn't even a number? ;-)


Don't mean to give Python too much attention here, but I discovered the decimal module a couple months ago and it's fantastic. :)


While it's not a big deal: A common implementation strategy in language run-times is to use machine words when the integer value fits within a particular range. This may mean that the shown implementation will incur a performance penalty well before it's actually necessary to pay the price for using big ints.

But this has nothing do do with binary search or mergesort.

Regardless of our opinions of overflow, it exists in many languages. And I cannot think of anything that would better qualify this issue for "has something to do with binary search" status than that most programmers have seen, used, or written an implementation with this bug.


If Perlis was right when he said that Lispers know the value of everything and the cost of nothing, this bug proves the counterpoint: of Fortran-derived languages knowing the cost of everything but the value of, um, most things but not some others.

Not quite as eloquent, is it?


A more correct title is "most binary search implementations have a bug" but that isn't catchy enough.

When I first saw an article about it, it was actually in the context of how hard it is to write correct code. That should be the take away.


That's my reading, too. The author is really making a point about the fundamental challenge of programming purely logical models in the physical world.

It's a little disappointing to me that the HN community is more interested in the concrete example than the meta topic itself.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: