Extracting a given number of the highest values in a List
Asked Answered
C

10

5

I'm seeking to display a fixed number of items on a web page according to their respective weight (represented by an Integer). The List where these items are found can be of virtually any size.

The first solution that comes to mind is to do a Collections.sort() and to get the items one by one by going through the List. Is there a more elegant solution though that could be used to prepare, say, the top eight items?

Coleville answered 12/4, 2010 at 20:31 Comment(2)
Are you stuck with using a list? If not, there are better data structures, such as priority queues, maps and sets.Corydon
Not really but Lists are more practical when returning results using Hibernate.Coleville
S
6

Just go for Collections.sort(..). It is efficient enough.

This algorithm offers guaranteed n log(n) performance.

You can try to implement something more efficient for your concrete case if you know some distinctive properties of your list, but that would not be justified. Furthermore, if your list comes from a database, for example, you can LIMIT it & order it there instead of in code.

Sustainer answered 12/4, 2010 at 20:34 Comment(3)
Noted. Using LIMIT is a good idea. There's nothing special about the List other than that it needs to be sorted according to one criteria at a time (popularity and date).Coleville
+1 for the database idea - thinking outside the parameters of the given problem. Reminds me of the post office solution in Programming Pearls (or maybe More Programming Pearls).Privity
O(n log(k)) for a heap can be much better than O(n log(n)) for a general sort.Hesperian
P
5

Your options:

  1. Do a linear search, maintaining the top N weights found along the way. This should be quicker than sorting a lengthly list if, for some reason, you can't reuse the sorting results between displaying the page (e.g. the list is changing quickly).

    UPDATE: I stand corrected on the linear search necessarily being better than sorting. See Wikipedia article "Selection_algorithm - Selecting k smallest or largest elements" for better selection algorithms.

  2. Manually maintain a List (the original one or a parallel one) sorted in weight order. You can use methods like Collections.binarySearch() to determine where to insert each new item.

  3. Maintain a List (the original one or a parallel one) sorted in weight order by calling Collections.sort() after each modification, batch modifications, or just before display (possibly maintaining a modification flag to avoid sorting an already sorted list).

  4. Use a data structure that maintains sorted weight-order for you: priority queue, tree set, etc. You could also create your own data structure.

  5. Manually maintain a second (possibly weight-ordered) data structure of the top N items. This data structure is updated anytime the original data structure is modified. You could create your own data structure to wrap the original list and this "top N cache" together.

Privity answered 12/4, 2010 at 21:30 Comment(2)
Thanks Bert F. You get an up vote for taking the trouble to write an exhaustive answer.Coleville
Thanks. Just doing what Joel says: "Want to know an easy way to earn reputation? Find a question somewhere with several good, but incomplete, answers. Steal all the answers and write one long, complete, detailed answer which is better than the incomplete ones. Sit back and earn points while people vote up your comprehensive answer." [joelonsoftware.com/items/2008/09/15.html]Privity
I
3

You could use a max-heap.

If your data originates from a database, put an index on that column and use ORDER BY and TOP or LIMIT to fetch only the records you need to display.

Inexperience answered 12/4, 2010 at 20:34 Comment(1)
Java's PriorityQueue uses a max-heap as the implementation.Triboluminescence
E
3

Or a priority queue.

Expressly answered 12/4, 2010 at 20:35 Comment(0)
D
3

using dollar:

List<Integer> topTen = $(list).sort().slice(10).toList();

without using dollar you should sort() it using Collections.sort(), then get the first n items using list.sublist(0, n).

Damselfly answered 12/4, 2010 at 20:42 Comment(1)
I like the slice() method. Interesting to see a jQuery equivalent for Java.Coleville
G
2

Since you say the list of items from which to extract these top N may be of any size, and so may be large I assume, I'd augment the simple sort() answers above (which are entirely appropriate for reasonably-sized input) by suggesting most of the work here is finding the top N -- then sorting those N is trivial. That is:

Queue<Integer> topN = new PriorityQueue<Integer>(n);
for (Integer item : input) {
  if (topN.size() < n) {
    topN.add(item);        
  } else if (item > topN.peek()) {
    topN.add(item);          
    topN.poll();
  }
}

List<Integer> result = new ArrayList<Integer>(n);
result.addAll(topN);
Collections.sort(result, Collections.reverseOrder());

The heap here (a min-heap) is at least bounded in size. There's no real need to make a heap out of all your items.

Guiana answered 12/4, 2010 at 21:25 Comment(0)
O
1

No, not really. At least not using Java's built-in methods.

There are clever ways to get the highest (or lowest) N number of items from a list quicker than an O(n*log(n)) operation, but that will require you to code this solution by hand. If the number of items stays relatively low (not more than a couple of hundred), sorting it using Collections.sort() and then grabbing the top N numbers is the way to go IMO.

Overlook answered 12/4, 2010 at 20:36 Comment(0)
S
1

Depends on how many. Lets define n as the total number of keys, and m as the number you wish to display.
Sorting the entire thing: O(nlogn)
Scanning the array each time for the next highest number: O(n*m)
So the question is - What's the relation between n to m?
If m < log n, scanning will be more efficient.
Otherwise, m >= log n, which means sorting will be better. (Since for the edge case of m = log n it doesn't actually matter, but sorting will also give you the benefit of, well, sorting the array, which is always nice.

Sg answered 12/4, 2010 at 22:7 Comment(0)
L
0

If the size of the list is N, and the number of items to be retrieved is K, you need to call Heapify on the list, which converts the list (which has to be indexable, e.g. an array) into a priority queue. (See heapify function in http://en.wikipedia.org/wiki/Heapsort)

Retrieving an item on the top of the heap (the max item) takes O (lg N) time. So your overall time would be:

O(N + k lg N)

which is better than O (N lg N) assuming k is much smaller than N.

Layout answered 13/4, 2010 at 0:24 Comment(0)
S
0

If keeping a sorted array or using a different data structure is not an option, you could try something like the following. The O time is similar to sorting the large array but in practice this should be more efficient.

small_array = big_array.slice( number_of_items_to_find );
small_array.sort();
least_found_value = small_array.get(0).value;

for ( item in big_array ) {  // needs to skip first few items
  if ( item.value > least_found_value ) {
    small_array.remove(0);
    small_array.insert_sorted(item);
    least_found_value = small_array.get(0).value;
  }
}

small_array could be an Object[] and the inner loop could be done with swapping instead of actually removing and inserting into an array.

Sphincter answered 13/4, 2010 at 1:36 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.