Please let me try to explain why the binary search works. Let's say we've the following sequence A=[-5, 10, 14, 33, 42, 42, 42]
and searched_value = 14
we would have:
Iteration 1, lo = 0, hi = 6, mid = 3 A[mid] > 14
Iteration 2, lo = 0, hi = 2, mid = 1 A[mid] < 14
Iteration 3, lo = 2, hi = 2, mid = 2 A[mid] = 14 (found!)
At each iteration of the algorithm we could conclude that lo
and hi
always contains the position of the searched value, and we'll prove it by induction on the number of iterations:
Inductive Hypothesis: if the searched value is present in the sequence it will always be contained between lo
and hi
.
Base Case: At iteration 1 lo = 0
and hi = n-1
contains all elements, in consequence if the searched value is present in the sequence it'll be contained between lo
and hi
, and the invariant is trivially correct.
Inductive Step: At any step if the searched value is contained between lo
and hi
, it'll continue to be contained between lo
and hi
at the next iteration. We've 3 possibilities (and here is the answer to the question):
- If
A[mid] = searched_value
: In this case the algorithm finishes reporting correctly the position of the searched value in the sequence and the invariant is correct because the searched value was between lo
and hi
.
- If
A[mid] < searched_value
: Knowing that it's a sorted sequence, all the elements between A[lo...mid] < searched_value
(including A[mid]
), thus we can assign lo=mid+1
(safely searching only in the upper half) and the invariant will still be correct at the next iteration.
- If
A[mid] > searched_value
: knowing that it's a sorted sequence, all the elements between A[mid...hi] > searched value
(including A[mid]
), thus we can assing hi=mid-1
(safely searching only in the lower half) and the invariant will still be correct at the next iteration.
Given that, at each iteration, the algorithm always searches on smaller segments
of the sequence, the termination condition is guaranteed because either there's only 1 element that is searched_value
or it is different and, at the next iteration, the algorithm is going to report that such element isn't present in the sequence.
In consequence the algorithm is proved correct (and the reason why we use mid+1
and mid-1
).