Which is more efficient in Java: sorting and then binary searching a Collection, or a linear search?

Suppose I have a Collection of objects:

List<String> myList = populateMyArrayList();
// An ArrayList with 1000 elements

Which is the better approach:

1: Mergesort, then binary search

Collections.sort(myList);
int keyIndex = Collections.binarySearch(myList, key);

2: Sequential (linear) search

for(String s : myList){
   if(s.equals(key)){
      return s;
   }
}

Should the choice of search approach depend on the size of the collection to be searched? If yes, how do I decide?

EDIT1: Suppose I have to search the list a couple of times, and no new elements will be added to the list.

EDIT2: I could have gone for a HashSet, but I actually have a List<CustomObject>, and I may search the List multiple times based on different attributes of CustomObject. So I can't have an overridden equals method in my CustomObject.

Mozza answered 26/5, 2014 at 11:2 Comment(7)
The first method will run in O(n log n) time (due to sorting), while a linear search is O(n).Supersede
Linear search, of course: O(n) vs. O(n log n + log n) = O(n log n)Deprecate
Don't forget myList.contains(key).Hibernate
I feel like this question would be more suited for the Programmers SE.Fonz
How many times are you going to search in the list? How many times are you going to insert items? Unless you have a clear answer to these questions, any answer we give you is pretty much meaningless.Gunzburg
Also, maybe you should consider another data structure such as a HashSet, but without further information these are just suppositions.Supersede
@Gunzburg Edited the question; I will be searching the list a couple of times, and no new elements will be added.Mozza

It depends.

  • If you are searching for only one string, linear search is better, because it is O(n), while sorting alone is already O(n log n).
  • If you are searching for multiple strings, sorting first and then binary searching may be better: one sort plus one search costs O(n log n + log n), which is O(n log n). So if you are checking roughly n strings, this approach wins over n linear scans at O(n²).
  • If you only want to know whether your Collection contains an element (ignoring order), consider a HashSet, whose contains is O(1) expected.
  • If you need order and a fast contains method, use a LinkedHashSet (see the sketch after this list).
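A minimal sketch of the sorted and hash-based options, reusing myList, key, and populateMyArrayList from the question:

import java.util.*;

List<String> myList = populateMyArrayList();

// Sorted approach: pay O(n log n) once, then O(log n) per lookup
Collections.sort(myList);
boolean foundSorted = Collections.binarySearch(myList, key) >= 0;

// HashSet approach: pay O(n) once, then O(1) expected per lookup
Set<String> lookup = new HashSet<>(myList);
boolean foundHashed = lookup.contains(key);

// LinkedHashSet keeps insertion order and still offers O(1) expected contains
Set<String> ordered = new LinkedHashSet<>(myList);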

P.S. Premature optimization is the root of all evil.

Unbecoming answered 26/5, 2014 at 11:11 Comment(6)
How is that premature?Firefly
He didn't check whether a naive solution suffices for his needs.Unbecoming
It is worth mentioning that "big O" notation expresses the behavior of the algorithm as n varies, but it does not imply that one algorithm is "faster" than another for a given n value. An O(log n) algorithm could be slower than an O(n^2) algorithm for a given n (if n is big enough, in the end the algorithm with the better function will win, but such an n may be large enough to be meaningless). Not that this comment belongs only to your answer, but I had to post it somewhere.Gunzburg
@Gunzburg Big O assumes that n tends to infinity, so it does not say anything about small n.Unbecoming
That is my point... For a limited value (1000 in this case), determining which algorithm is "faster" is not so easy.Gunzburg
@Gunzburg I did not say anything about speed. The question was 'what is more efficient', and just by looking at the formulas it is linear search in the case of one search string. Furthermore, I tried to suggest that the OP should first check whether any approach suffices ("Premature optimization...").Unbecoming

If you do the search just one time:

  • Complexity for sort + binary search will be O(n * log n).
  • Complexity for linear search is O(n).

If you do the search for more than one time, let's say k times:

  • Complexity for sort + binary search will be O((n + k) * log n).
  • Complexity for linear search will be O(k * n).

So if you do the search just one time, you should go with linear search. If you do the search more than once, you should most probably sort first.

Also, maybe in this case you can consider using a hash table, which has amortized complexity of O(1) for an element search.
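For the EDIT2 case (a List<CustomObject> searched by different attributes), the hash table idea can be applied per attribute. A sketch, where CustomObject and its getName() and getId() getters are hypothetical stand-ins for the real attributes:

import java.util.*;

// Build one index per attribute you search on (each O(n) to build, once)
Map<String, CustomObject> byName = new HashMap<>();
Map<Integer, CustomObject> byId = new HashMap<>();
for (CustomObject c : myList) {
    byName.put(c.getName(), c); // assumes names are unique
    byId.put(c.getId(), c);     // assumes ids are unique
}

// Each lookup is then amortized O(1), and no equals override is needed
CustomObject foundByName = byName.get("someName");
CustomObject foundById = byId.get(42);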

Antimony answered 26/5, 2014 at 11:15 Comment(5)
Complexity for sort + binary search will be O(n log n + log n), not O(n * log n)Jeri
@Jeri In complexity theory O(n * log n + log n) is considered equal to O(n * log n), since only the term with the highest rank is considered important. Please check the other answers for this question also, and reconsider your negative vote.Antimony
Complexity for sort + binary search in case of k runs will be O(k * log n) only if k >> n, and it definitely remains O(n * log n) if k = 1. The proper complexity I believe is O((k + n) * log n), or at least a bit of explanation is required.Swagsman
@bosonix - My bad. I thought you had entered the complexity of mergesort as "n". Please be more clear the next time around.Jeri
@AndreiBozantan Why is it O(n * log n)? Is the binary search complexity negligible compared to the quicksort complexity (assuming that's the most common sort used in languages)?Tusker

If you only search the list once (or seldom), linear search is cheaper. If you search the list more often, the cost of sorting is repaid. Sorting costs O(n log n) average time, and each subsequent search only O(log n). If you search for almost every element, those searches total O(n log n) as well, and you come out "even" with the cost of sorting.
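As a rough, back-of-the-envelope illustration (assuming n = 1000, so log₂ 1000 ≈ 10, and counting comparisons only): k linear searches cost about 1000·k comparisons, while sorting once (≈ 10,000 comparisons) plus k binary searches (≈ 10·k) breaks even at roughly k ≈ 10 searches. Beyond that, the sorted approach pulls ahead.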

Riesling answered 26/5, 2014 at 11:11 Comment(0)
C
0

A binary search is O(log n) and is faster than a linear search, which is O(n). However, one must first sort the data, which is O(n log n) and takes longer.

So if the data is filled once and then often searched, sort it and use binary search. Even better: use a Set, and a HashSet would be even nicer.

Chenay answered 26/5, 2014 at 11:12 Comment(0)