Parallel Binary Search

I'm just starting to learn about parallel programming, and I'm looking at binary search.

This can't really be optimized by throwing more processors at it, right? I know it's supposedly divide-and-conquer, but really it's "decrease-and-conquer" (per Wikipedia).

Or could you possibly parallelize the comparisons? (If X is less than array[mid], search from low to mid - 1; if X is greater than array[mid], search from mid + 1 to high; otherwise return mid, the index of X.)

Or how about giving half of the array to one processor to binary search, and the other half to another? Wouldn't that be wasteful, though, since it's decrease-and-conquer rather than plain divide-and-conquer? Thoughts?

Howler answered 7/12, 2011 at 23:1 Comment(0)

You can easily use parallelism.

With k < n processors, split the array into k groups of n/k elements each and assign a processor to each group.

Run binary search on that group.

Now the time is log(n/k).

There is also a CREW (concurrent-read, exclusive-write) method that runs in log(n)/log(k+1).
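A minimal sketch of the chunk-splitting idea in Python (the function name and the use of `ThreadPoolExecutor` are my own choices, not from the answer; in CPython the GIL means this illustrates the work split rather than a real speed-up):

```python
# Sketch: split the sorted array into k chunks; each worker
# binary-searches only its own chunk, so each does log(n/k) work.
from bisect import bisect_left
from concurrent.futures import ThreadPoolExecutor

def parallel_search(arr, x, k=4):
    n = len(arr)
    if n == 0:
        return -1
    size = (n + k - 1) // k            # chunk size n/k, rounded up

    def search_chunk(start):
        chunk = arr[start:start + size]
        i = bisect_left(chunk, x)      # binary search within the chunk
        if i < len(chunk) and chunk[i] == x:
            return start + i           # translate back to a global index
        return None

    with ThreadPoolExecutor(max_workers=k) as pool:
        for hit in pool.map(search_chunk, range(0, n, size)):
            if hit is not None:
                return hit
    return -1
```

Only one chunk can contain x, so the first non-None result is the answer.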

Ramses answered 1/10, 2012 at 22:38 Comment(5)
Can you link to the crew method? – Acro
Please think it through: what would happen on the first iteration in the buckets where the key is NOT present? – Till
Here is a good paper with the CREW algorithm: cs.umd.edu/~gasarch/TOPICS/ramsey/parasearch.pdf – Ramses
If you have a processor for each cell, I think it will take O(1) to do a search. – Reasoned
And how long would it take to set up all those cells? O(n). You could create two subsections and run a binary search in parallel on each half of your data, but the work spent setting up those separate threads is going to be SIGNIFICANTLY more than the work you saved. Big O doesn't care, but real life certainly does. – Joeljoela

I don't have much experience in parallel programming, but I doubt this is a good candidate for parallel processing. Each step of the algorithm depends on performing one comparison, and then proceeding down a set "path" based on this comparison (you either found your value, or now have to keep searching in a set "direction" based on the comparison). Two separate threads performing the same comparison won't get you anywhere any faster, and separate threads will both need to rely on the same comparison to decide what to do next, so they can't really do any useful, divided work on their own.

As far as your idea of splitting the array goes, I think you are just negating the benefit of binary search. Your value (assuming it's in your array) will be in either the top or the bottom half, and the first comparison (at the midpoint) in a binary search tells you which half to look in. Taken to the extreme, consider breaking an array of N elements into N different one-element searches (a naive attempt to parallelize): you are now doing N comparisons when you don't need to. You lose the power of binary search, in which each comparison narrows the search down to the appropriate subset.

Hope that helps. Comments welcome.

Vassalize answered 7/12, 2011 at 23:30 Comment(0)

I would think it certainly qualifies for parallelisation. At least, across two threads. Have one thread do a depth-first search, and the other do a breadth-first search. The winner is the algorithm that performs the fastest, which may be different from data-set to data-set.

Compaction answered 11/1, 2013 at 11:33 Comment(3)
In case any reader thinks I'm being flippant, I'm actually not. For time-critical sections where an algorithm might not itself parallelise, using multiple algorithms with different performance characteristics may be a valid optimisation to keep the worst-case big-Oh in line. – Compaction
How can you apply DFS or BFS to binary search? – Dough
@Dough I guess you would structure the array as a binary search tree. – Frisco

Parallel implementation can speed up a binary search, but the improvement is not particularly significant. Worst case, the time required for a binary search is log_2(n) where n is the number of elements in the list. A simple parallel implementation breaks the master list into k sub-lists to be bin-searched by parallel threads. The resulting worst-case time for the binary search is log_2(n/k) realizing a theoretical decrease in the search time.

Example: a list of 1024 entries takes as many as 10 cycles to binary search using a single thread. Using 4 threads, each thread would only take 8 cycles to complete the search, and using 8 threads, each thread takes 7 cycles. Thus, an 8-threaded parallel binary search could be up to 30% faster than the single-threaded model.

However, this speed-up should not be confused with an improvement in efficiency: the 8-threaded model actually executes 8 * 7 = 56 comparisons to complete the search, compared to the 10 comparisons executed by the single-threaded binary search. It is up to the discretion of the programmer whether the marginal gain in speed of a parallel binary search is appropriate or advantageous for their application.
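The arithmetic above can be checked with a throwaway helper (my own, not from the answer):

```python
from math import ceil, log2

def worst_case_steps(n, threads=1):
    # each of `threads` workers binary-searches a chunk of n/threads items
    return ceil(log2(ceil(n / threads)))

print(worst_case_steps(1024))         # 10 comparisons, single thread
print(worst_case_steps(1024, 4))      # 8 comparisons per thread
print(worst_case_steps(1024, 8))      # 7 comparisons per thread
print(8 * worst_case_steps(1024, 8))  # 56 comparisons of total work
```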

Truck answered 11/1, 2013 at 11:2 Comment(2)
I should add that this is an example of 'decrease-and-conquer' as opposed to the generally more desirable 'divide-and-conquer' for a parallel implementation. In this case, each thread performs a smaller problem than the original search and is therefore faster, but the sum of those problems is greater than the original problem. If we could manage a 'divide-and-conquer' implementation here, the sum of the smaller problems would be the same as the original single-threaded implementation, except k times faster (where k is the number of parallel threads). – Truck
30% speed up is significant! – Dough

I am pretty sure binary search can be sped up by a factor of log(M), where M is the number of processors, i.e. to log(n)/log(M) time. Note that naively splitting into chunks only gives log(n/M) = log(n) - log(M), which is worse than log(n)/log(M) for constant M and large n. I do not have a proof of a tight lower bound, but if M = n, the execution time is O(1), which cannot be beaten. An algorithm sketch follows.

Parallel_Binary_Search(sorted_arraylist)

  1. Divide your sorted_arraylist into M chunks of size n/M.
  2. In parallel, compare the search key against the middle element of each chunk.
  3. If a comparator signals equality, return that address and terminate.
  4. Otherwise, identify the adjacent pair of comparators where the left one signaled (>) and the right one signaled (<).
  5. Form a new chunk starting at the element following the pivot that signaled (>) and ending at the element preceding the pivot that signaled (<).
  6. If that range is empty, return fail and terminate.
  7. Otherwise, Parallel_Binary_Search(chunk)
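A sequential simulation of this sketch (naming is my own; on real hardware the M comparisons in each round would execute simultaneously):

```python
def m_ary_search(arr, x, m=4):
    """Each round probes m evenly spaced pivots (the chunk midpoints)
    of the remaining range, shrinking it to roughly 1/(m+1) its size."""
    lo, hi = 0, len(arr)                  # half-open range [lo, hi)
    while lo < hi:
        width = hi - lo
        probes = sorted({lo + (i + 1) * width // (m + 1) for i in range(m)})
        for p in probes:
            if arr[p] == x:               # a comparator signaled equality
                return p
            if arr[p] < x:
                lo = max(lo, p + 1)       # x lies right of this pivot
            else:
                hi = min(hi, p)           # x lies left of this pivot
    return -1                             # empty range: fail
```

Each round costs one parallel comparison step and keeps only one of the m + 1 gaps between pivots, so the number of rounds is roughly log(n) / log(m + 1).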
Mickeymicki answered 3/7, 2013 at 17:0 Comment(0)

Yes: in the classical (multi-core) sense of parallelization, binary search and BSTs don't benefit much.

There are techniques like having multiple copies of the BST on L1 cache for each processor. Only one processor is active but the gains from having multiple L1 caches can be great (4 cycles for L1 vs 14 cycles for L2).

In real-world problems you are often searching for multiple keys at the same time.

Now, there is another kind of parallelization that can help: SIMD! Check out "Fast architecture sensitive tree search on modern CPUs and GPUs" by a team from Intel/UCSC/Oracle (SIGMOD 2010). It's very cool. BTW I'm basing my current research project on this very paper.

Till answered 13/6, 2014 at 20:2 Comment(0)

I think using multiple cores to do a k-way search instead of a binary search is the way to go in some cases. I have a common problem where I want to find the git commit responsible for a regression in a large repo with tons of commits. To determine the commit where the relevant change happened, I need to check out the repo at a commit, compile a large codebase, and then run a simulation using that code, which might take 30 minutes, say.

Here regular binary search is really slow compared to doing, say, 3-way or even 10-way search. I can spin up multiple virtual machines to do the checkout/simulation process, instead of waiting for a single one to run at a time. Here we're trading off compute cost for time.
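As a back-of-the-envelope check (the function and the numbers below are hypothetical): if k machines each test one commit per round, every round splits the suspect range into k + 1 pieces.

```python
from math import ceil, log

def rounds_needed(n_commits, k=1):
    # k tests per round cut the suspect range into k + 1 parts
    return ceil(log(n_commits, k + 1)) if n_commits > 1 else 0

# 1000 suspect commits at ~30 minutes per build-and-simulate round:
for k in (1, 3, 10):
    r = rounds_needed(1000, k)
    print(f"{k:2d} machine(s): {r} rounds, ~{30 * r} minutes")
```

With 10 machines the wall-clock time drops from roughly 300 minutes (10 binary-search rounds) to roughly 90 (3 rounds), at the cost of running ten simulations per round instead of one.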

Suppletion answered 30/7 at 22:52 Comment(0)
