C++: Why is std::multimap slower than std::priority_queue?
Asked Answered
C

4

6

I implemented an algorithm where I make use of a priority queue. I was motivated by this question: Transform a std::multimap into std::priority_queue

I am going to store up to 10 million elements with their specific priority value.

I then want to iterate until the queue is empty. Every time an element is retrieved it is also deleted from the queue.

After this I recalculate the element's priority value, because it can change as a result of previous iterations.

If the value increased, I insert the element again into the queue. This happens more often as the algorithm progresses (in the first 25% it does not happen, in the next 50% it does happen, and in the last 25% it happens multiple times).

After receiving the next element and not reinserting it, I am going to process it. For this I do not need the element's priority value but its technical ID.

This was the reason I intuitively had chosen a std::multimap to achieve this, using .begin() to get the first element, .insert() to insert one, and .erase() to remove one. Also, I did not intuitively choose std::priority_queue directly because of other questions on this topic answering that std::priority_queue is most likely used for single values only, not for mapped values.

After reading the link above I reimplemented it using a priority queue, analogous to the other question from the link. My runtimes seem to be not that unequal (about an hour on 10 million elements). Still, I am now wondering why std::priority_queue is faster at all.

I actually would have expected std::multimap to be faster because of the many reinsertions. Maybe the problem is that there are too many reorganizations of the multimap?

Cabinet answered 23/1, 2017 at 13:38 Comment(2)
I have trouble understanding your algorithm. Priority queue and multimap are quite different semantically. One is a sorted associative container which can hold multiple items for the same key. It has to provide a rather fast find via key. The other is essentially an (int, T) pair container designed to pick the extreme int. It has fewer constraints. I would expect it to be faster. Is your key in the multimap the priority from the queue, like in the linked question? Do I get it right?Transect
yes, in the multimap the key is the priority value of the specific element, and a priority value can (and does) occur multiple times. If a priority value exists multiple times it doesn't matter which element of the 'group' is picked first.Cabinet
B
7

To summarize: your runtime profile involves both removing and inserting elements from your abstract priority queue, with you trying to use both a std::priority_queue and a std::multimap as the actual implementation.

Both the insertion into a priority queue and into a multimap have roughly equivalent complexity: logarithmic.

However, there's a big difference with removing the next element from a multimap versus a priority queue. With a priority queue this is going to be a constant-complexity operation. The underlying container is a vector, and you're removing the last element from the vector, which is going to be mostly a nothing-burger.

But with a multimap you're removing the element from one of the extreme ends of the multimap.

The typical underlying implementation of a multimap is a balanced red/black tree. Repeated element removals from one of the extreme ends of a multimap has a good chance of skewing the tree, requiring frequent rebalancing of the entire tree. This is going to be an expensive operation.

This is likely to be the reason why you're seeing a noticeable performance difference.

Buyse answered 23/1, 2017 at 13:52 Comment(13)
"you're removing the last element from the vector" - why would you be removing the last element from the vector? There is no easy way to do it in the priority_queue. Do you mean the top element? It's usually the 1st (0th) in the vector. (I might have misunderstood something)Transect
@Transect the top element is swapped to the end of the container (while maintaining the heap property), then popped off the back. Maintaining the heap property doesn't have constant complexity, though. I know of no heap structure that has constant delete.Fugacious
@user2079303 That is my point. There is no "removing of last element" involved. Both removals have O(log size) complexity. In priority queue, you remove 1st / top element and then put the last one at the top, and push it down. Moreover, the erase for multimap is said to have constant amortized time. Thus, I think this analysis is wrong.Transect
this would mean that after swapping the top element, the swapped element needs to be bubbled to the end of the vector again? so deleting is constant but needs n swapping operations? does not sound that cheap, does it?Cabinet
priority queue stores the elements in the queue sorted, with the highest priority at the end of the vector. Hence, removing the next element in the priority queue involves removing the last element from the underlying vector. priority queue is very smart.Buyse
@Sam Um. No. Why would pop have O (log(size)) complexity then? Look it even says that pop: Effectively calls std::pop_heap(c.begin(), c.end(), comp); c.pop_back();, and pop_heap: "This has the effect of removing the first (largest) element from the heap defined by the range [first, last)." It's the 1st element not the last. It also doesn't have the priorities sorted. Only the top element is guaranteed to be 1st.Transect
@SamVarshavchik such imagined implementation of priority queue implemented on vector would have O(n) insertion because a new highest or lowest (depending on order) element would shift all other elements. I'm pretty sure the insertion must be at most O(log N).Fugacious
I agree with user (last comment): it would be O(N) to get the next minimum element and put it on top. Whether he is also right that this usually just needs O(log N) I cannot confirm. I also doubt that the rest of the array (except the top element) is not sorted at all. I chose this answer as the answer because it was the first one of all, but all answers are pretty good. Maybe a further question: does a specific insertion order into priority_queue give better performance? Should I spend extra effort on the initial insertion of all elements into the queue?Cabinet
@Cabinet no, getting the next minimum (assuming this means top priority) from a sorted vector is O(1). The complexity problem is with insertion. The sorted position of a new element could be anywhere in the vector, and insert into arbitrary position of a vector is O(N).Fugacious
@Cabinet No one said the rest of items are in random order (i.e. "not sorted at all"), the order of elements in vector representation of heap is well-defined, but it is not sorted. The relation < must be preserved but not between each consecutive elements. It is less ordered, that is where the performance comes from. This answer makes at least one incorrect statement.Transect
@user2079303: that's what I saidCabinet
@luk32: "The relation < must be preserved but not between each consecutive elements." I do not understand this, could you give a little sample?Cabinet
@Cabinet E.g: en.wikipedia.org/wiki/Heap_(data_structure) "[...] there is no implied ordering between siblings or cousins and no implied sequence for an in-order traversal (as there would be in, e.g., a binary search tree). The heap relation mentioned above applies only between nodes and their parents, grandparents, etc"Transect
T
2

I think the main difference comes from two facts:

  1. Priority queue has a weaker constraint on the order of elements. It doesn't have to keep the whole range of keys/priorities sorted; a multimap has to provide that. A priority queue only has to guarantee that the 1st / top element is the largest.

So, while the theoretical time complexities for the operations on both are the same, O(log(size)), I would argue that erasing from a multimap and rebalancing the RB-tree performs more operations; it simply has to move more elements around. (NOTE: an RB-tree is not mandatory, but it is very often chosen as the underlying structure for multimap.)

  2. The underlying container of a priority queue is contiguous in memory (it's a vector by default).

I suspect the rebalancing is also slower, because the RB-tree relies on nodes (vs. the contiguous memory of a vector), which makes it prone to cache misses, although one has to remember that operations on a heap are not done in an iterative manner either — they hop through the vector. I guess to be really sure one would have to profile it.

The above points are true for both insertions and erasures. I would say the difference is in the constant factors lost in the big-O notation. This is intuitive thinking.

Transect answered 23/1, 2017 at 14:45 Comment(0)
F
2

The abstract, high level explanation for map being slower is that it does more. It keeps the entire structure sorted at all times. This feature comes at a cost. You are not paying that cost if you use a data structure that does not keep all elements sorted.


Algorithmic explanation:

To meet the complexity requirements, a map must be implemented as a node based structure, while priority queue can be implemented as a dynamic array. The implementation of std::map is a balanced (typically red-black) tree, while std::priority_queue is a heap with std::vector as the default underlying container.

Heap insertion is usually quite fast. The average complexity of insertion into a heap is O(1), compared to O(log n) for a balanced tree (the worst case is the same, though). Creating a priority queue of n elements has a worst-case complexity of O(n), while creating a balanced tree is O(n log n). See a more in-depth comparison: Heap vs Binary Search Tree (BST)


Additional, implementation detail:

Arrays usually use the CPU cache much more efficiently than node-based structures such as trees or lists. This is because adjacent elements of an array are adjacent in memory (high memory locality) and therefore may fit within a single cache line. Nodes of a linked structure, however, exist in arbitrary locations (low memory locality) in memory, and usually only one or very few of them fit within a single cache line. Modern CPUs are very fast at calculations, but memory speed is the bottleneck. This is why array-based algorithms and data structures tend to be significantly faster than node-based ones.

Fugacious answered 23/1, 2017 at 14:48 Comment(0)
Q
1

While I agree with both @eerorika and @luk32, it is worth mentioning that in the real world, when using the default STL allocator, memory management cost easily outweighs a few data-structure maintenance operations such as updating pointers to perform a tree rotation. Depending on the implementation, the memory allocation itself could involve tree maintenance operations and potentially trigger a system call, where it would become even more costly.

In a multimap, there is a memory allocation and deallocation associated with each insert() and erase() respectively, which often contributes to slowness by a higher order of magnitude than the extra steps in the algorithm.

A priority queue, however, by default uses a vector, which only triggers a memory allocation (a much more expensive one, though, since it involves moving all stored objects to the new memory location) once its capacity is exhausted. In your case, pretty much all allocation happens only in the first iteration for the priority queue, whereas the multimap keeps paying memory management costs with each insert and erase.

The downside around memory management for the map could be mitigated by using a memory-pool-based custom allocator. This also gives you a cache hit rate comparable to a priority queue. It might even outperform the priority queue when your objects are expensive to move or copy.

Quondam answered 21/1, 2021 at 22:16 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.