What is the difference between [start/2 + mid/2] and [(start + mid)/2] in binary search?

H

4

7

In the binary search algorithm, we set the mid as:

mid = (start + end)/2 which is same as

mid = start/2 + end/2 and also equal to

mid = start + (end - start)/2

but all of the three give different results being the same arithmetic expression. How do these vary during the computation process?

This was the code to find the last occurrence of an element in a vector array using binary search:

int lastOccurrence(vector<int> arr, int size, int key){
    int start = 0, end = size - 1;
    int last = -1;
    // int mid = start/2 + end/2;
    int mid;
    while(start <= end){
        // mid = start + (end - start)/2;
        mid = (start + end)/2;
        if(key == arr[mid]){
            last = mid;
            start = mid + 1;
        }
        else if(key > arr[mid]){
            start = mid + 1;
        }
        else{
            end = mid - 1;
        }
        cout << "start: " << start << "\tend: " << end << "\tmid: " << mid << endl;
    }
    return last;
}

The values being passed to the function are:

int main(){
    vector<int> arr = {1,2,3,4,4,4,4,5,6,7,11};
    int size = arr.size();
    int key = 4;
    cout << "First occurrence of " << key << " is at index " << firstOccurrence(arr, size, key) << endl;

    cout << "Last occurrence of " << key << " is at index " << lastOccurrence(arr, size, key) << endl;

    return 0;
}

If the mid element equals the required "key" element then the mid index is stored in a variable and start is updated to mid + 1 so that it can search the right part of the array for any other occurrence of the "key". If the "key" is found to be less than the mid element it implies that the element is not present beyond the mid element and the end is updated to mid - 1 to search in the left part of the array, and similarly to search the right part if "key" is found to be greater than the mid element.

It gave different results when mid = start/2 + end/2 was used and mid = (start + end)/2 was used. How is this affected during the computation process?

Hebe answered 10/7, 2023 at 11:25 Comment(6)

What is the input to the function? Have you tried to step through the code in a debugger while monitoring variables and their values, to see what really happens? And you do remember that an absolute requirement for binary search to work is that the data you search through is ordered? – Nailbrush 10/7, 2023 at 11:30

Not going to re-calculate each of these, but it boils down to your values being integers. Integer division will cut off any decimal places. – Commutate 10/7, 2023 at 11:31

(1+3)/2 is not the same as 1/2+3/2. – Tapdance 10/7, 2023 at 11:31

mid = start + (end - start)/2 is the same as 2*mid = 2*start + (end - start) is the same as 2*mid = 2*start + end - start is the same as 2*mid = start + end is the same as mid = (start + end)/2. Please show your values. (this all assuming start+end doesn't overflow) – Contretemps 10/7, 2023 at 11:32

they are not mathematically equivalent – Hinshaw 10/7, 2023 at 11:36

Each operation you describe involves operands of type int, so the result is an int. So start/2 rounds towards zero (e.g. 3/2 gives 1, not 1.5), as does end/2. As an example, 1/2 + 3/2 computes 1/2 and 3/2 separately, producing 0 and 1 respectively, and their sum is 1. Whereas (1+3)/2 computes 1+3 first, giving 4, and divides that by 2, producing 2. – Astri 10/7, 2023 at 11:39

S

4

You need to consider that integer arithmetic cuts off any fractional parts, hence depending on the last bit of start and stop you get different results.

Suppose they are

 start = M*2 + a;
 end = N*2 + b;

Where M and N are integers and a and b are either 1 or 0, then you get

mid_0 = (start + end)/2 = M+N + (a+b) / 2
mid_1 = start/2 + end/2 = M+N
mid = start + (end - start)/2 = M*2 + a + (N-M) + (b-a)/2 = M+N + a + (b-a)/2

Only the second expression does not depend on whether start or end are even or odd. I didn't actually bother to work out (via a 2x2 table) whether a + (b-a)/2 yields the same result as (a+b)/2. However, when dealing with integer arithmetic you better do not rely on intuition, because it's too easy to be off by one (or more). Moreover, I did not yet consider integer overflow. When start+end overflows then (start/2) + (end/2) does not.

Spoon answered 10/7, 2023 at 11:39 Comment(3)

It means that while doing it on paper, all of the three expressions turn out to be the same expression as start/2 + end/2 but affect the solution for intergers and float during programming. – Hebe 10/7, 2023 at 12:23

@Hebe no, also on paper 5 / 2 + 5/2 is 2 + 2 = 4 is not the same as (5+5)/10. Its integer arithmetics. You can do it on paper. Its just not the "normal" arithmetics you are used to (where (a+b)/2 == a/2 + b/2) – Spoon 10/7, 2023 at 12:28

It is not mathematics with real numbers or even natural numbers x ∉ ℝ ∧ x ∉ ℕ, replace x with any of your variables start, end, mid. So the normal maths does not work. Your x are int, so you are dealing with bits and bytes and unsigned bit, register overflow, carry bit, when really dealing with integer mathematics at the core. – Luxuriate 16/7, 2023 at 15:0

D

6

For starters the function is invalid. When size is equal to 1 then you have due to this statement

int start = 0, end = size - 1;

that end is equal to 0.

In this case the while loop

while(start < end){

will be skipped. And the function will return the value of last equal to -1

int last = -1;

// ...

return last;

though arr[0] can be equal to key.

As for your question then when start and end are both odd values then the value of mid will be one less for the expression

start/2 + end/2

then for other two expressions.

As for this expression (start + end)/2 then it is unsafe due to a possible overflow of the sum start + end.

Pay attention to that in C++20 there is function std::midpoint declared in header <numeric> that can be and should be used instead of manually written expressions.

As for the function as whole then there is already standard algorithm std::upper_bound declared in header <algorithm> that can be adapted for using instead of the function.

Derick answered 10/7, 2023 at 11:57 Comment(0)

S

4

You need to consider that integer arithmetic cuts off any fractional parts, hence depending on the last bit of start and stop you get different results.

Suppose they are

 start = M*2 + a;
 end = N*2 + b;

Where M and N are integers and a and b are either 1 or 0, then you get

mid_0 = (start + end)/2 = M+N + (a+b) / 2
mid_1 = start/2 + end/2 = M+N
mid = start + (end - start)/2 = M*2 + a + (N-M) + (b-a)/2 = M+N + a + (b-a)/2

Only the second expression does not depend on whether start or end are even or odd. I didn't actually bother to work out (via a 2x2 table) whether a + (b-a)/2 yields the same result as (a+b)/2. However, when dealing with integer arithmetic you better do not rely on intuition, because it's too easy to be off by one (or more). Moreover, I did not yet consider integer overflow. When start+end overflows then (start/2) + (end/2) does not.

Spoon answered 10/7, 2023 at 11:39 Comment(3)

It means that while doing it on paper, all of the three expressions turn out to be the same expression as start/2 + end/2 but affect the solution for intergers and float during programming. – Hebe 10/7, 2023 at 12:23

@Hebe no, also on paper 5 / 2 + 5/2 is 2 + 2 = 4 is not the same as (5+5)/10. Its integer arithmetics. You can do it on paper. Its just not the "normal" arithmetics you are used to (where (a+b)/2 == a/2 + b/2) – Spoon 10/7, 2023 at 12:28

It is not mathematics with real numbers or even natural numbers x ∉ ℝ ∧ x ∉ ℕ, replace x with any of your variables start, end, mid. So the normal maths does not work. Your x are int, so you are dealing with bits and bytes and unsigned bit, register overflow, carry bit, when really dealing with integer mathematics at the core. – Luxuriate 16/7, 2023 at 15:0

D

3

All of these are attempts to prevent overflow when calculating the midpoint:

(a + b) / 2

The best way (supposedly) is this:

a + b = (a ^ b) + (a & b) << 1
(a + b) / 2 = (a ^ b) / 2 + (a & b)

The identity comes from Don Knuth's book, The Art of Computer Programming, Vol. 4.

This is because Don's formula is supposedly bulletproof. It should work equally well on negative and positive indices. Note that shifting right (including arithmetically) is not always the same as dividing by 2.

Donau answered 10/7, 2023 at 12:50 Comment(11)

Casting to unsigned int works, too, if the code is using signed integers as array indices: (a + b) / 2 = (int)(((uint)a + (uint)b) >> 2) . – Incontrovertible 27/7, 2023 at 15:6

@Incontrovertible AFAIK, ((uint)a + (uint)b) can overflow and why shift by 2 places? Typo and you didn't understand the point. – Donau 27/7, 2023 at 18:26

The shift was a typo but the formula cannot overflow because positive signed int a + b is at most 2^32 - 2, which is valid if you cast both a and b to unsigned first. Unsigned shift doesn't do sign extension, so ((uint)a + (uint)b) >> 1 is at most 2^31 - 1. This formula was used in C#'s source code multiple times, so there is a chance that it's either faster or at least more readable than the supposedly best way. – Incontrovertible 28/7, 2023 at 7:11

@Incontrovertible not true, there is no sign bit in unsigned integers. – Donau 28/7, 2023 at 7:19

"Unsigned shift doesn't do sign extension (because there is no sign bit in unsigned integers)". Better? It's what I wrote. – Incontrovertible 28/7, 2023 at 7:29

@Incontrovertible you just dont understand. If there was a sign bit and it were clear, then what you wrote would be true, but as it is overflow can happen. – Donau 28/7, 2023 at 7:36

Show me in which case it overflows. source.dot.net/#System.Private.CoreLib/src/libraries/… – Incontrovertible 28/7, 2023 at 7:40

I am completely uninterested in c#. – Donau 28/7, 2023 at 7:42

Then show me in C++ where it overflows. – Incontrovertible 28/7, 2023 at 7:50

(~0u + 1u) >> 1; – Donau 28/7, 2023 at 9:11

~0u is not representable as a positive signed int. No addition of two valid signed array indices can represent a number that overflows as unsigned. If the index is/becomes invalid, the binary search is out of range and has to be stopped anyway. start/mid/end must all satisfy 0 <= n <= size. – Incontrovertible 28/7, 2023 at 9:37

A

2

As mentioned in the previous answer, (start + end)/2 handles residuals better. However, start/2 + end/2 is not susceptible to overflow while (start + end)/2 is.

So if you're handling arrays with potentially >2G elements, it is advised to either make start/end/mid 64b ints or prefer the start/2 + end/2 form.

Alonaalone answered 10/7, 2023 at 11:51 Comment(0)

Recommended topics

Hot tags