The following C++ program can show you how an overflow can happen with a 32-bit unsigned integer:
#include <iostream>
using namespace std;
int main ()
{
unsigned int low = 33,
high = 4294967290,
mid;
cout << "The value of low is " << low << endl;
cout << "The value of high is " << high << endl;
mid = (low + high) / 2;
cout << "The value of mid is " << mid << endl;
return 0;
}
If you run it on a Mac:
$ g++ try.cpp && ./a.out
The value of low is 33
The value of high is 4294967290
The value of mid is 13
The value of mid
might be expected to be 2147483661
, but low + high
overflowed because a 32-bit unsigned integer cannot contain the proper value, and give back 27
, and so mid
becomes 13
.
When the calculation of mid
is changed to
mid = low + (high - low) / 2;
Then it will show
The value of mid is 2147483661
The simple answer is, the addition l + u
can overflow, and has undefined behavior in some languages, as described in a blog post by Joshua Bloch, about a bug in the Java library for the implementation of binary search.
Some readers may not understand what it is about:
l + (u - l) / 2
Note that in some code, the variable names are different, and it is
low + (high - low) / 2
The answer is: let's say if you have two numbers: 200 and 210, and now you want the "middle number". And let's say if you add any two numbers and the result is greater than 255, then it can overflow and the behavior is undefined, then what can you do? A simple way is just to add the difference between them, but just half of it, to the smaller value: look at what the difference is between 200 and 210. It is 10. (You can consider it the "difference" or "length", between them). So you just need to add 10 / 2 = 5
to 200, and get 205. You don't need to add 200 and 210 together first -- and that's how we can reach the calculation: (u - l)
is the difference. (u - l) / 2
is half of it. Add that to l
and we have l + (u - l) / 2
.
It is like, if we are looking at two trees, one is 200 feet tall and one is 210 feet tall, what is the "midpoint" or the "mean"? We don't have to add them together first. We can just tell the difference is 10 feet, and we can add half of that, which is 5, to 200, and we know it is 205 feet.
To put this into history perspectives, Robert Sedgewick mentioned that the first binary search was stated in 1946, and it wasn't correct until 1964. Jon Bentley described in his book Programming Pearls in 1988 that more that 90% of the professional programmers could not write it correctly given a couple of hours. But even Jon Bentley himself had that overflow bug for 20 years. A study that was published in 1988 showed that accurate code for binary search was only found in 5 out of 20 textbooks. In 2006, Joshua Bloch wrote that blog post about the bug about calculating the mid
value. So it took 60 years for this code to be correct. But now, next time in the job interview, remember to write it correctly within that 5 minutes.