This question is about the efficiency of a linear search vs. the efficiency of a binary search for a pre-sorted array in contiguous storage...
I have an application written in fortran (77!). One frequent operation for my part of the code is to find the index in an array such that gx(i) <= xin < gx(i+1)
. I've currently implemented this as a binary search
-- sorry for the statement labels and goto
-- I've commented what the equivalent statments would be using fortran 90...
i=1
ih=nx/2
201 continue !do while (.true.)
if((xin.le.gx(i)).and.(xin.gt.gx(i+1)))then !found what we want
ilow=i+1; ihigh=i
s1=(gx(ihigh)-xin)/(gx(ihigh)-gx(ilow))
s2=1.0-s1
return
endif
if(i.ge.ih)then
goto 202 !exit
endif
if(xin.le.(gx(ih))then !xin is in second half of array
i=ih
ih=nx-(nx-ih)/2
else !xin is in first half of array
i=i+1
ih=i+(ih-i)/2
endif
goto 201 !enddo
However, today, I was reading on Wikipedia about binary search and I came across this:
Binary search can interact poorly with the memory hierarchy
(i.e. caching), because of its random-access nature. For
in-memory searching, if the span to be searched is small, a
linear search may have superior performance simply because
it exhibits better locality of reference.
I don't completely understand this statement -- my impression was that cache fetches were gathered in large(ish) chunks at a time, so if we start at the beginning of the array, I thought that most of the array would be in cache already (at least as much as it would be for a linear search), so I didn't think that would matter.
So my question is, is there any way to tell which algorithm will perform better (linear or binary search?) Is there an array size boundary? I'm currently using arrays of size around 100 elements...