Why does my binary search need an extra comparison? log2(N)+1

Asked 13/8, 2015 at 18:48 Answered 18/10, 2024 at 17:23

I want to find the index of the first integer in an array of integers which is <= key. I can do it with binary search in log2(N)+1 compares. Shouldn't it be possible with only log2(N) compares?

// Returns the index of the first integer in keys <= key. size must be a power of 2.
unsigned find(int key, int* keys, unsigned size) {    
    int* origKeys = keys;

    for (int i = 0; i < log2(size); i++) {
        size /= 2;

        if (keys[size] < key)
            keys += size;
    }

    unsigned i = unsigned(keys - origKeys);
    // Not only is this step messy, because it doesn't fit the loop, but
    // now we have log2(n) + 1 comparisons.
    if (*keys < key)
        i++;

    return i;
}

Dg answered 13/8, 2015 at 18:48 Comment(8)

Does your log2 function return the ceiling or floor of the logarithm? – Twowheeler 13/8, 2015 at 18:50

size is a power of 2 per the comment, so it doesn't matter – Dg 13/8, 2015 at 18:57

For a unsorted array, the worst case complexity is going to be O(N), no matter what. If you don't have any element <= key, you still have to scan the whole array => O(N) time complexity. – Mansfield 13/8, 2015 at 18:59

@TamasIonut I don't think that's true. Look at the code carefully. Assuming log2() does what it claims (computes a logarithm of base-2), the code is O(log(N)). – Tunicle 13/8, 2015 at 19:2

@TamasIonut keys is sorted, otherwise one has no business using binary search – Dg 13/8, 2015 at 19:9

the log2() is being assigned to an int, so the result will be floor() – Barone 13/8, 2015 at 19:9

Uh, if you have a runtime/complexity proportional to log(n)+1, the constant is ignored for asymptotic complexity like O(). – Unpin 13/8, 2015 at 19:14

@EOF I'm aware of that, but I'm not talking big O here. Also see templatetypedef's answer which suggests that log2(N+1) is what's needed. – Dg 13/8, 2015 at 19:21

Let's think about this from an information-theoretic perspective. If you have an array with n elements in it, there are n+1 possible spots where the new element can go: just before any element of the array, or after all the elements. Therefore, your algorithm needs to make enough comparisons to be able to uniquely identify which of the n+1 spots is the right one. If you don't make enough comparisons, the answer you give back won't always be right.

Every comparison you make can, in the best case, eliminate half of the possible positions. Therefore, in the theoretical limit, with k comparisons you can decide which of 2^k positions is right. Since there are n+1 possible positions, you need to make lg (n+1) comparisons in the worst case, rather than lg n. Since your n is a perfect power of two, this means that one extra comparison is required. On the other hand, if you have n be one less than a perfect power of two, then making ceil(lg n) comparisons should suffice.

Edited by Eloff, this code seems to give the right answer with log2(n+1) steps as you predicted:

// Returns the index of the first integer in keys <= key. size must be one less than a power of 2.
unsigned find(int key, int* keys, unsigned size) {    
    int* origKeys = keys;
    size++;

    while(size > 1) {
        size /= 2;

        if (keys[size-1] < key)
            keys += size;
    }

    return unsigned(keys - origKeys);        
}

Twowheeler answered 13/8, 2015 at 19:3 Comment(5)

That reasoning makes sense, and I can restrict size to be one less than a power of 2. It also suggests that a solution is possible with log2(n+1) comparisons. Adapting the code is proving difficult, but I'm trying. – Dg 13/8, 2015 at 19:20

If they key is guaranteed to be in the array, it would be perfect log2(N). If it can be anywhere, i can see why you need to check n+1 spots. But does this still hold for a "<= key"? This seems to imply you don't need to check "after" the highest element - that "+1" position. – Robustious 13/8, 2015 at 19:27

@Robustious If there are n elements in the array, the first element less than or equal to the key can be one of the n elements, or it could be none of the elements. Equivalently, you can think of the element going just after any of the elements, or coming before all of the elements. That's where the extra +1 term comes in in the log(n+1). – Twowheeler 13/8, 2015 at 19:29

Got it! My reasoning assumed the key is "in" the list but it can be "outside" on both ends. – Robustious 13/8, 2015 at 19:31

@Twowheeler The code above passes my tests, it seems to work. The key was limiting size to a power of 2 minus 1, as you predicted. – Dg 13/8, 2015 at 19:31

Yes, using a binary search and assuming element is in the set, log2(N) compares will suffice.

However, if your element is not in the set, it could take you log2(N) + 1, compares to realize that you have exhausted the set. The final comparison is to an empty set which let's you realize you are done.

Personally, I wondered if the comparison to an empty set counts as a comparison, but the text I'm reading counts it as a step the computer would take increasing the runtime by one step.

Charla answered 18/10, 2024 at 17:23 Comment(0)

Recommended topics

Hot tags