Hacker News new | past | comments | ask | show | jobs | submit login

I remember reading this (back in 2006?) and looking at the versions of Binary Search and Merge Sort that I had recently written for the Data Structures & Algorithms class that I teach. I found that my versions did not have this bug, despite the fact that I had not given any thought to it.

The reason my code did not have the bug is that I represented the range to be processed, not as low & high subscripts, but rather using a pair of iterators (this was in C++). And you cannot add two iterators. So instead of mid = (low + high)/2, you get something like size = high - low and then mid = low + size/2, where size is an integer, and low, high, mid are iterators.

There is a lesson to be learned here, certainly. I am not exactly sure what it is in its full generality, but I think we can conclude that the endpoints of a range, regardless of how they are represented, are not things that should be added together.




Yes, I've usually represented the range with (low, size) instead of (low, high) (and then the high-low part never comes up).

So when this article first came out I was all "Aren't you bagging on Bentley too much? He published pseudocode, and of course you have to take care about things like overflow when converting it to C." But then I saw Bentley saying somewhere that no, he really hadn't considered this problem. (IIRC)


Bentley didn't just publish pseudo-code; he also published C, and the published C contains the bug.


Ah well, at least it fit the theme here when I misremembered.


regardless of how they are represented, are not things that should be added together.

Maybe something about torsors?

http://math.ucr.edu/home/baez/torsors.html


Apparently.

(Nice article. Thanks for the link.)


How does that avoid the problem? Assuming you used a signed integer for size and the list is large enough, there is a sign roll over on size. If size becomes less than -2 then low + size/2 is less than low (on the first pass of a binary search you now have an invalid iterator for mid). Assuming you used unsigned integers size can still not be the correct value if a rollover has occurred though this time mid will always be valid.


If you have a byte array on a 32bit machine, assuming your program's text (i.e. instructions) takes up some space, the biggest possible array will be less than 2^32 Bytes = 4GiB, so the midpoint will be too. There can be no overflow.

If you subtracted two 64bit pointers and stored the result in an int32 though, then you would have problems.


> Assuming you used a signed integer for size ....

By "integer", I did not mean "int". size was a std::size_t.


Perhaps even more generally, the range ADT should include a method to locate the midpoint of a range, in addition to the endpoints.




Consider applying for YC's W25 batch! Applications are open till Nov 12.

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: