Binary Search using start < end vs. using start <= end

D

2

9

In binary search, we usually have low and high variables and typically there is a while loop that tests if low <= high, as shown in this code (from Wikipedia):

int SortedArray[max] = {....}

int BinarySearch (int key)
{
    int start = 0;
    int end = max - 1;
    int mid;
    while (start <= end)
    {
        mid = (start + end) / 2;
        if (key == a[mid])
            return mid;
        else if (key < a[mid])
            end = mid - 1;
        else start = mid + 1;
    }
    return -1;
}

When learning binary search, I was always taught the start <= end approach, but when seeing other implementations, I've seen a lot of people do while(start < end).

Is there an advantage to one versus the other? In my own native implementations, I do the <= approach but when I switch it out for <, the search fails.

Is there a rule of thumb for using one versus the other?

Dene answered 28/5, 2017 at 19:51 Comment(3)

can we see your implementation and/or the example? – Esse 28/5, 2017 at 20:2

Post your code and we'll tell you why it fails. In a good implementation it's, at most, a tiny implementation detail and will make no practical difference. – Vazquez 28/5, 2017 at 20:3

I think this question is A) very clear, we don't need any examples to understand it, B) very useful because the Binary Search while(start < end) vs. while(start <= end) loop is very confusing – Chindwin 7/11, 2019 at 9:0

R

13

even if your question is probably not super clear, I could infer you are talking about this kind of implementation of the binary search (here in C, from Wikipedia):

int SortedArray[max] = {....}

int BinarySearch (int key)
{
    int start = 0;
    int end = max - 1;
    int mid;
    while (start <= end)
    {
        mid = (start + end) / 2;
        if (key == a[mid])
            return mid;
        else if (key < a[mid])
            end = mid - 1;
        else start = mid + 1;
    }
    return -1;
}

If you replace start <= end by start < end, there will be cases where your algorithm will not give the good answer.

Let's think about two cases.

1 - You would like to search 1 in the list [1]. In that case, start = 0, end = 0 and the algorithm would return -1 if you change the loop condition.

2 - You would like to search 2 in the list [1, 2]. In that case, start = 0, end = 1. The algorithm will set mid = (0+1)/2=0 in C. Thus arr[mid] < key. It will make start = 1, end = 1. Again, if you stop the loop here, the algorithm will return -1 instead of 1.

And there are probably many other examples.

Have a nice day

Rotorua answered 28/5, 2017 at 20:8 Comment(3)

Thanks, this is all I needed! – Dene 28/5, 2017 at 20:11

Super, but, could you edit the question in order to make it clearer. You can use the code I posted (by referring wikipedia). I'll remove mine and It will help the next people to have a quicker understanding of the question ;) – Rotorua 28/5, 2017 at 20:12

This is misleading / incorrect. If you change only start <= end to start < end, then yes, the algorithm will fail, but if you make appropriate code changes (see my answer) it will work just fine. – Terrence 28/5, 2017 at 23:10

T

16

For low <= high, high is considered inclusive (high is part of the range we consider).

For low < high, high is considered exclusive (high is not part of the range we consider).

Both can be correct, but there will be minor differences in the rest of the code, specifically how high is initialised (high = length-1 versus high = length) and how it's updated (high = mid-1 versus high = mid).

Which one is better?

The main difference is that mid = (low + high) / 2 will be slightly different for each case.

More specifically, high will be 1 bigger in the exclusive case, thus, when high-low is even in the inclusive case, mid will stay the same, but when high-low is odd in the inclusive case, mid will be 1 element bigger in the exclusive case (this is because of rounding).

Let's consider an example:

length = 6
low = 0
highInclusive = 5, highExclusive = 6
midInclusive = 5/2 = 2, midExclusive = 6/2 = 3

As you can see, when there is no single middle element, one will pick the element to the left and the other will pick the element to the right.

While this will sometimes make the one faster and sometimes make the other faster, the average running time will be pretty much identical.

From a readability perspective, it might be slightly better (in my opinion) to use the exclusive one in languages with 0-based arrays and either one in languages with 1-based arrays, in order to minimise the number of -1's in the code. An argument could also be made to just stick to a single version in all languages, as to not require that people understand both versions or get confused between the two.

Terrence answered 28/5, 2017 at 23:7 Comment(2)

Thank you. In some implementations, I saw they have while (start + 1 < end), not sure why they added +1 to start please? – Dreary 5/1, 2022 at 5:10

@Dreary I wouldn't be able to answer that off the top of my head without code, but if start + 1 < end implementations work, then something similar to what I said in the answer presumably applies: there would be other minor differences to make it work and there may be minor differences in terms of readability and/or performance. Although if it's start + 1 <= end instead, that would be equivalent to start < end (the difference between those is mostly just personal preference - I'd usually prefer the latter, although a +1 can sometimes provide some insight into how exactly the code works). – Terrence 5/1, 2022 at 9:46

R

13