Which is more efficient in Java: sorting and then binary searching a Collection, or a linear search?

Suppose I have a Collection of objects:

List<String> myList = populateMyArrayList();
// An ArrayList with 1000 elements

Which is the better approach:

1: Mergesort, then binary search

Collections.sort(myList);
int keyIndex = Collections.binarySearch(myList, key);

2: Sequential (linear) search

for(String s : myList){
   if(s.equals(key)){
      return s;
   }
}

Should the choice of search approach depend on the size of the collection to be searched? If yes, how do I decide?

EDIT1: Suppose I have to search the list a couple of times, and no new elements will be added to the list.

EDIT2: I could have gone for a HashSet, but I actually have a List<CustomObject>, and I may search the List multiple times based on different attributes of CustomObject. So I can't have an overridden equals method in my CustomObject.

Mozza answered 26/5, 2014 at 11:2 Comment(7)
The first method will run in O(n log n) time (due to sorting), while a linear search is O(n).Supersede
Linear search, of course: O(n) vs. O(n log n + log n) = O(n log n)Deprecate
Don't forget myList.contains(key).Hibernate
I feel like this question would be more suited for the Programmers SE.Fonz
How many times are you going to search in the list? How many times are you going to insert items? Unless you have a clear answer to these questions, any answer we give you is pretty much meaningless.Gunzburg
Also, maybe you should consider another data structure such as a HashSet, but without further information these are just suppositions.Supersede
@Gunzburg Edited the question; I will be searching the list a couple of times, and no new elements will be added.Mozza

It depends.

  • If you are searching for only one string, linear search is better, because it is O(n), while sorting alone is already O(n log n).
  • If you are searching for multiple strings, sorting first and then binary searching may be better: one sort plus one search costs O(n log n + log n), which is O(n log n). So if you are checking roughly n strings, this approach wins over n linear scans at O(n²).
  • If you only want to know whether your Collection contains an element (ignoring order), consider a HashSet, whose contains is O(1) expected.
  • If you need order and a fast contains method, use a LinkedHashSet (see the sketch after this list).
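A minimal sketch of the sorted and hash-based options, reusing myList, key, and populateMyArrayList from the question:

import java.util.*;

List<String> myList = populateMyArrayList();

// Sorted approach: pay O(n log n) once, then O(log n) per lookup
Collections.sort(myList);
boolean foundSorted = Collections.binarySearch(myList, key) >= 0;

// HashSet approach: pay O(n) once, then O(1) expected per lookup
Set<String> lookup = new HashSet<>(myList);
boolean foundHashed = lookup.contains(key);

// LinkedHashSet keeps insertion order and still offers O(1) expected contains
Set<String> ordered = new LinkedHashSet<>(myList);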

P.S. Premature optimization is the root of all evil.

Unbecoming answered 26/5, 2014 at 11:11 Comment(6)
How is that premature?Firefly
He didn't check whether a naive solution suffices for his needs.Unbecoming
It is worth mentioning that "big O" notation expresses the behavior of the algorithm as n varies, but it does not imply that one algorithm is "faster" than another for a given n value. An O(log n) algorithm could be slower than an O(n^2) algorithm for a given n (if n is big enough, in the end the algorithm with the better function will win, but such an n may be large enough to be meaningless). Not that this comment belongs only to your answer, but I had to post it somewhere.Gunzburg
@Gunzburg Big O assumes that n tends to infinity, so it does not say anything about small n.Unbecoming
That is my point... For a limited value (1000 in this case), determining which algorithm is "faster" is not so easy.Gunzburg
@Gunzburg I did not say anything about speed. The question was 'what is more efficient', and just by looking at the formulas it is linear search in the case of one search string. Furthermore, I tried to suggest that the OP should first check whether any approach suffices ("Premature optimization...").Unbecoming

If you do the search just one time:

  • Complexity for sort + binary search will be O(n * log n).
  • Complexity for linear search is O(n).

If you do the search for more than one time, let's say k times:

  • Complexity for sort + binary search will be O((n + k) * log n).
  • Complexity for linear search will be O(k * n).

So if you do the search just one time, you should go with linear search. If you do the search more than once, you should most probably sort first.

Also, maybe in this case you can consider using a hash table, which has amortized complexity of O(1) for an element search.
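For the EDIT2 case (a List<CustomObject> searched by different attributes), the hash table idea can be applied per attribute. A sketch, where CustomObject and its getName() and getId() getters are hypothetical stand-ins for the real attributes:

import java.util.*;

// Build one index per attribute you search on (each O(n) to build, once)
Map<String, CustomObject> byName = new HashMap<>();
Map<Integer, CustomObject> byId = new HashMap<>();
for (CustomObject c : myList) {
    byName.put(c.getName(), c); // assumes names are unique
    byId.put(c.getId(), c);     // assumes ids are unique
}

// Each lookup is then amortized O(1), and no equals override is needed
CustomObject foundByName = byName.get("someName");
CustomObject foundById = byId.get(42);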

Antimony answered 26/5, 2014 at 11:15 Comment(5)
Complexity for sort + binary search will be O(n log n + log n), not O(n * log n)Jeri
@Jeri In complexity theory O(n * log n + log n) is considered equal to O(n * log n), since only the term with the highest rank is considered important. Please check the other answers for this question also, and reconsider your negative vote.Antimony
Complexity for sort + binary search in case of k runs will be O(k * log n) only if k >> n, and it definitely remains O(n * log n) if k = 1. The proper complexity I believe is O((k + n) * log n), or at least a bit of explanation is required.Swagsman
@bosonix - My bad. I thought you had entered the complexity of mergesort as "n". Please be more clear the next time around.Jeri
@AndreiBozantan Why is it O(n * log n)? Is the binary search complexity negligible compared to the quicksort complexity (assuming that's the most common sort used in languages)?Tusker

If you only search the list once (or seldom), linear search is cheaper. If you search the list more often, the cost of sorting is repaid. Sorting costs O(n log n) average time, and each subsequent search only O(log n). If you search for almost every element, those searches total O(n log n) as well, and you come out "even" with the cost of sorting.
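As a rough, back-of-the-envelope illustration (assuming n = 1000, so log₂ 1000 ≈ 10, and counting comparisons only): k linear searches cost about 1000·k comparisons, while sorting once (≈ 10,000 comparisons) plus k binary searches (≈ 10·k) breaks even at roughly k ≈ 10 searches. Beyond that, the sorted approach pulls ahead.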

Riesling answered 26/5, 2014 at 11:11 Comment(0)
C
0

A binary search is O(log n) and is faster than a linear search, which is O(n). However, one must first sort the data, which is O(n log n) and takes longer.

So if the data is filled once and then often searched, sort it and use binary search. Even better: use a Set, and a HashSet would be even nicer.

Chenay answered 26/5, 2014 at 11:12 Comment(0)