Efficient priority list

Asked 14/7, 2010 at 11:47 Answered 21/10, 2021 at 21:16

I am looking for an efficient data structure to represent a priority list. Specifically I need to assign a priority to a set of items and return only the top scoring items. I have looked into priority queues which operate on heaps, but they don't seem to really suit my needs. They will reorganize the heap structure as soon as I will poll the top rating item from the queue.

The simplest solution would of course be a linked list, which in the worst case would take quite long for the insertion operation.

Does anyone have a better solution?

Anaclinal answered 14/7, 2010 at 11:47 Comment(5)

How many items? Are they being persisted anywhere, if so how? – Cherriecherrita 14/7, 2010 at 11:52

Say more about how efficient you want insertion, retrieval (of priority items), and removal to be, relative to each other. – Coterie 14/7, 2010 at 11:54

I Would like to rate the items first and then retrive the first x top scoring items in the right order. So as there are many insertions the insertion should be rather efficient. The retrival could be less efficient. – Anaclinal 14/7, 2010 at 12:17

How does x compare to n? x <= 100? x close to n/2 what? – Diamante 14/7, 2010 at 15:14

Heaps are the standard way of doing this, but you seem to object to the fact that this reorders the heap contents upon removal of the top element. Why is that a problem? What is it you're really trying to do? – Favian 14/7, 2010 at 15:46

Heaps seem very suitable, and it seems like you are going about it wrongly.

Say you wanted the top x elements (how does this x compare to n, btw?)

What you are doing is putting all into a max-heap and getting the top x.

I suggest instead, you use a min-heap of exactly x elements.

First x elements you insert into heap.

Next incoming element, you compare against the min which can be done very quickly (O(1) time) in the heap. If smaller, you just ignore the incoming element.

If incoming element is larger than min, then you increase the min to the incoming element and sift it down in the heap. This should be logx time at worst.

Once done (in nlogx time), you can retrieve the elements from the heap in sorted order in O(xlogx) time.

Depending on how your data is (and how small x is), using this min-heap solution can be really fast.

If you really really want the inserts to be super-fast and don't care much about the retrieval then you can also do the following.

Insert the elements into a vector (array with amortized O(1) insert time) in the order they come.

The use the Selection algorithm to find the xth largest element (in O(n) time, but the constants might be big). Say that number is S.

Now walk the array comparing each element with S and select the ones as large as S.

If x is reasonably sized and comparable to n (like n/2 or something) this might work out fine, but if x is small compared to n, I would suggest go with the min-heap.

Diamante answered 14/7, 2010 at 15:14 Comment(1)

I didn't think of it this way. This looks very promising. – Anaclinal 14/7, 2010 at 18:46

If you need only the k top items and you never need to look a the others, you can use a simple linked list or array storing only the current top k items, plus a number (the worst score of the elements in the list).

In the Add() operation you simply compare the item with the worst value in the list and, if better, you swap the current worst with the added item. This takes O(k) time in the worst case for insertion because you need to find the element that has currently the worst score. The the average case, however, is O(1), since, as you add better elements to the list, the probability of having to do a swap tends to 0 (that is, you're not actually adding any items).

So if you generate elements at random, your performance is likely to be very good. Even if you generate ordered items (worst case), it might be fast enough for your value of k.

Mobcap answered 14/7, 2010 at 12:25 Comment(1)

Instead of a list, if you use min-heap (see my answer), the worst case time is O(logK). The rest is similar. In fact using min-heaps like is quite a standard method for this problem! (When x is small compared to n). – Diamante 14/7, 2010 at 17:18

Hmm. Skip lists? They should have O(log n) insertion (as heap-based queue) but getting top element should be O(1) [including removing it]. They could be even implemented using lock-free algorithm.

Mahdi answered 14/7, 2010 at 11:51 Comment(2)

Heaps are better than skip lists if you use them correctly. Use a min-heap of x elements, when you need the top x. You don't have to construct a tree/heap of all the n. Just x. – Diamante 14/7, 2010 at 17:19

Sorry - my fault I misread the text (I understood he wants fast poll even at cost of slow add). – Mahdi 14/7, 2010 at 21:11

The JDK has a built-in pqueue class (java.util.PriorityQueue) which is based on a heap algorithm.

Sorry, I only just saw the bit about heaps not fitting your needs. Can you explain why? You can write a custom comparator (or make your items comparable) and the PriorityQueue will order your items appropriately.

Minima answered 14/7, 2010 at 11:54 Comment(4)

As far as I understand him he finds getNext in O(log n) not acceptable. – Mahdi 14/7, 2010 at 11:57

The problem seems to be that ladi wants to be able to get the x first items without removing any of them. That's not typically an operation supported by priority lists. – Mesomorphic 14/7, 2010 at 11:59

I would like rate some items and only get the top n scoring items. So I was wandering if there are any datastructures that only hold the top scoring items but offer a list interface. That means I could go through the list of the top scoring items sequentially. I could of course use a priority queue based on a heap algorithm which has O(log n) insertion and O(n) retrival, get the top scoring elements and add them to a list. I was just curious if there exists something better than that. – Anaclinal 14/7, 2010 at 12:11

@ladi: Not sure what you mean by O(n) retrieval -- extracting the top item from a heap is O(log n). Only if you need to find a specific (non-minimal) element is retrieval O(n). If all you can do is compare 2 items and determine which is bigger, then nothing is asymptotically faster than a heap for the problem you're looking at. – Pagepageant 14/7, 2010 at 12:43

A balanced tree would always guarantee a logarithmic worst case. Although linear time is usually regarded as feasible, there is still a tremendous difference between logarithmic and linear:

for a billion elements, the difference is between 1 billion operations and a few dozens. If each operation takes 1 millisecond, that means going from 11 days to less than a second.

Every node has at most two children.
The heap tree is complete and left-adjusted. Complete means that if the heap has height H, every leaf node is either at level H or H-1. All the levels are left-adjusted, which means that no right sub-tree has a height greater than its left sibling. So, if a leaf is at the same height as an internal node, the leaf can’t be on the left of that node.
Every node holds the highest priority in the subtree rooted at that node.

Binary search trees are the most common kind of trees, but we can use d'ary trees. we can use any value greater than 2, and use the same array representation for the heap.

But the improvement we get with trees comes with a price. First, as with any data structure that uses pointers (lists, graphs, trees, and so on) we have a memory overhead in comparison to arrays. While with the latter we just need to reserve space for the data (plus maybe, depending on the implementation details, some constant space for pointers and the node structure itself), every tree node requires extra space for the pointers to its children and possibly to its parent.

Reference

Indult answered 21/10, 2021 at 21:16 Comment(0)

Recommended topics

Hot tags