binary search efficiency vs. linear search efficiency in fortran

This question is about the efficiency of a linear search vs. the efficiency of a binary search for a pre-sorted array in contiguous storage...

I have an application written in fortran (77!). One frequent operation for my part of the code is to find the index in an array such that gx(i) <= xin < gx(i+1). I've currently implemented this as a binary search -- sorry for the statement labels and goto -- I've commented what the equivalent statments would be using fortran 90...

        i=1
        ih=nx/2
201     continue  !do while (.true.)
           if((xin.le.gx(i)).and.(xin.gt.gx(i+1)))then  !found what we want
              ilow=i+1; ihigh=i
              s1=(gx(ihigh)-xin)/(gx(ihigh)-gx(ilow))
              s2=1.0-s1
              return
           endif
           if(i.ge.ih)then
              goto 202 !exit
           endif
           if(xin.le.(gx(ih))then !xin is in second half of array
              i=ih
              ih=nx-(nx-ih)/2
           else !xin is in first half of array
              i=i+1
              ih=i+(ih-i)/2
           endif
        goto 201  !enddo

However, today, I was reading on Wikipedia about binary search and I came across this:

Binary search can interact poorly with the memory hierarchy 
(i.e. caching), because of its random-access nature. For 
in-memory searching, if the span to be searched is small, a
linear search may have superior performance simply because 
it exhibits better locality of reference.

I don't completely understand this statement -- my impression was that cache fetches were gathered in large(ish) chunks at a time, so if we start at the beginning of the array, I thought that most of the array would be in cache already (at least as much as it would be for a linear search), so I didn't think that would matter.

So my question is, is there any way to tell which algorithm will perform better (linear or binary search?) Is there an array size boundary? I'm currently using arrays of size around 100 elements...

For small arrays, the problem is not cache. You are right: A small array is likely to be cached quickly.

The problem is that branch prediction is likely to fail for binary search because branches are taken or skipped at random in a data-dependent way. Branch prediction misses stall the CPU pipeline.

This effect can be severe. You can easily search 3 to 8 elements linearly in the same time it takes to do a single binary search branch (and you need to do multiple binary search branches). The exact break even point needs to be measured.

Stalling the CPU pipeline is extremely expensive. A Core i7 can retire up to 4 instructions per clock cycle (12 giga-instructions per second at 3 GHz!). But only, if you are not stalling.

There are branch-free algorithms doing binary search by using conditional-move CPU instructions. These algorithms basically unroll 32 search steps and use a CMOV in each step (32 steps are the theoretical maximum). They are branch-free but not stall free: Each next step depends 100% on the previous one so the CPU cannot charge ahead in the instruction stream. It has to wait all the time. So they don't solve this problem, only improve it slightly.

Recommended topics

Hot tags