Is std::unordered_set contiguous (like std::vector)?

I'm storing pointers in std::unordered_set. I do this because I don't want any duplicates: I delete the pointers in the collection, so if there were a duplicate, I would attempt to delete an already-deleted pointer. I loop heavily through these sets, and since I know std::vector is the fastest container for looping (contiguous memory), I wondered whether std::unordered_set stores its elements contiguously too.

If it doesn't, would using a std::vector and checking if the pointer has already been deleted be faster?
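For concreteness, here is a sketch of that vector-based alternative (Widget and delete_all are hypothetical names): collect the pointers with duplicates allowed, then sort and deduplicate once just before deleting, so each pointer is deleted exactly once.

    // Hedged sketch: deduplicate a vector of raw pointers before deleting.
    #include <algorithm>
    #include <functional>
    #include <vector>

    struct Widget {};  // hypothetical element type

    void delete_all(std::vector<Widget*>& ptrs) {
        // std::less guarantees a total order over pointers
        std::sort(ptrs.begin(), ptrs.end(), std::less<Widget*>());
        ptrs.erase(std::unique(ptrs.begin(), ptrs.end()), ptrs.end());
        for (Widget* p : ptrs)
            delete p;  // each pointer is now deleted exactly once
        ptrs.clear();
    }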

Nasturtium answered 17/1, 2013 at 17:25 Comment(8)
If you want to know, you should benchmark it.Vannesavanness
Is it your bottleneck, really? Did you profile?Linell
You manually delete the pointers? I.e. you store raw pointers inside that container? If the container is your bottleneck you are accessing it often, and if you juggle those raw pointers that much you will run into memory management issues that are much worse than the performance penalties of unordered_set vs. vector in your scenario. Try to use smart pointers. Maybe unique_ptr; you would have to write pure evil code to get duplicates of those ;)Profligate
@LightnessRacesinOrbit I tried profiling, but it wasn't clear what the bottleneck was. The profiler gave me hard-to-read symbols and assembly code. Do you have a profiler to suggest?Nasturtium
You're literally the template source for std::unordered_set<> away from answering this yourself, and once cracked open you'll find no chance of a contiguous implementation (and what would it matter if it were? the most expensive part of a decent hash-table implementation is the hash function itself, degenerate linked lists via unbounded collisions notwithstanding).Azedarach
A reasonable unordered_set implementation will use an 'empty' (identity) hash function for the pointer type. This is also very easy to implement.Expert
@9dan, an unordered_set implementation will use whatever hash function the user tells it to use, empty or not!Deventer
@WhozCraig: Why would the hash function be a bottleneck when looping through the collection (which shouldn't even need hashing)? @Vee: Are you 100% sure you need to use raw pointers instead of unique_ptr or shared_ptr? If you aren't, I would really suggest using smart pointers; see the sketch after these comments.Natalianatalie
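
A sketch of the smart-pointer suggestion (Widget is again a hypothetical type): owning the elements through std::unique_ptr makes a double delete impossible, so the container choice becomes a pure performance question.

    // Hedged sketch: the set owns its elements via unique_ptr.
    #include <memory>
    #include <unordered_set>

    struct Widget {};  // hypothetical element type

    int main() {
        std::unordered_set<std::unique_ptr<Widget>> widgets;
        widgets.insert(std::unique_ptr<Widget>(new Widget));
        // clearing or destroying the set deletes every Widget exactly once
    }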

Is std::unordered_set contiguous?

The exact implementation of containers is not detailed by the standard... however, the standard does prescribe a number of behaviors which constrain the actual representation.

For example, std::unordered_set is required to be memory-stable: a reference to, or the address of, an element remains valid even when other elements are added or removed.

The only way to achieve this is to allocate the elements more or less independently. It cannot be achieved with a contiguous memory allocation: such an allocation would necessarily be bounded, and thus could be outgrown with no possibility of re-allocating the elements into a bigger chunk (moving them would invalidate existing references).
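
A minimal sketch of that guarantee, using int elements for illustration: the address of an unordered_set element survives further insertions, while a vector may reallocate and leave the old address dangling.

    #include <cstdio>
    #include <unordered_set>
    #include <vector>

    int main() {
        std::unordered_set<int> s{1};
        const int* p = &*s.find(1);   // address of the stored element
        for (int i = 2; i < 10000; ++i)
            s.insert(i);              // may rehash, but elements never move
        std::printf("%d\n", *p);      // still valid: prints 1

        std::vector<int> v{1};
        const int* q = &v[0];
        for (int i = 2; i < 10000; ++i)
            v.push_back(i);           // may reallocate: q is now dangling
        // dereferencing q here would be undefined behavior
    }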

Jannajannel answered 17/1, 2013 at 18:5 Comment(8)
Existing iterators are invalidated when inserting or removing other elements in an unordered_set if it causes a rehash, no? Are you sure your statement about "memory stable" is correct?Jehiah
@AndrewTomazosFathomlingCorps, but rehashing specifically doesn't invalidate pointers or references. Iterators don't necessarily refer directly to memory, pointers do. Elements do not move in memory on a rehash, the connections between them (which iterators traverse) might get rejiggled, but the elements do not move.Deventer
@JonathanWakely: In that case the buckets must just be the head pointers of linked lists (a linked hash). I think this means that every insert causes a dynamic memory allocation. I think it would be more efficient to have the bucket's first element placed in aligned_storage. Of course this would mean you have to move-construct on rehash rather than just copy a pointer, but this is the same "problem" as std::vector.Jehiah
@AndrewTomazosFathomlingCorps: I believe this requirement of stability was to mimic more closely what happens in a std::set; I fully agree, however, that it's annoying and severely constrains the possible designs. That being said, thanks to allocators you still have the possibility of lessening the impact of the allocation: you could provide a bump allocator, for example.Jannajannel
@MatthieuM.: What's a "bump allocator"? The elements have to be allocated and deallocated in relatively random order, so the problem of doing so is the same as for any dynamic allocation scheme.Jehiah
Also, if we are doing the full singly-linked list from the bucket head anyway, then why not make it doubly-linked (tail points back to the bucket head)? That way iterators would not have to be invalidated on a rehash.Jehiah
@AndrewTomazosFathomlingCorps: 1) A "bump" allocator does not concern itself (usually) with freeing. It just appends new objects as it goes (see the sketch after this thread). It's extremely efficient for short-lived collections or collections with little to no removal. 2) Indeed, this could be possible; I suspect the list is already doubly-linked (since bidirectional iteration is supported). Functionally though it's a bit dubious, because by incrementing the iterator you could pass over elements you already visited, which is in violation of the ForwardIterator contract: on iteration each element should be visited once.Jannajannel
@MatthieuM.: (re 2) True, however random reordering of iterators would still be better than UB. There are a number of algorithms that keep looping over a collection and modifying it until some equilibrium is reached (e.g. a closure operation); these would still be correct even with random reordering on modification. (Aside: rehashing can be blocked to keep iterators valid https://mcmap.net/q/464925/-simultaneously-iterating-over-and-modifying-an-unordered_set/1131467)Jehiah
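
For illustration, a minimal sketch of the bump-allocator idea from this thread (BumpAllocator and its members are illustrative names, and this is not a production allocator): allocation is a pointer bump inside a caller-provided arena, and deallocate is a no-op.

    #include <cstddef>
    #include <memory>   // std::align
    #include <new>      // std::bad_alloc

    // Hedged sketch of a bump allocator over a fixed arena.
    template <typename T>
    struct BumpAllocator {
        using value_type = T;

        char*  arena_end;  // one past the end of the arena
        char** next;       // shared bump pointer into the arena

        BumpAllocator(char* end, char** bump) : arena_end(end), next(bump) {}

        template <typename U>
        BumpAllocator(const BumpAllocator<U>& o)
            : arena_end(o.arena_end), next(o.next) {}

        T* allocate(std::size_t n) {
            void* p = *next;
            std::size_t space = static_cast<std::size_t>(arena_end - *next);
            // align the bump pointer for T, then claim n * sizeof(T) bytes
            if (!std::align(alignof(T), n * sizeof(T), p, space))
                throw std::bad_alloc();
            *next = static_cast<char*>(p) + n * sizeof(T);
            return static_cast<T*>(p);
        }
        void deallocate(T*, std::size_t) {}  // freeing is a no-op

        template <typename U>
        bool operator==(const BumpAllocator<U>& o) const { return next == o.next; }
        template <typename U>
        bool operator!=(const BumpAllocator<U>& o) const { return next != o.next; }
    };

It could be plugged in as std::unordered_set<T, std::hash<T>, std::equal_to<T>, BumpAllocator<T>>; every node insertion then costs only a pointer bump, at the price of never reclaiming individual nodes.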

No, it is not contiguous in memory, but it is still really fast for lookups, thanks to hashing.

Edit: fast for random access; if you mainly do loops, you should consider another container, I think.

Edit2: And you should profile to know whether it's worth thinking about another container. (Maybe you should optimize somewhere else... maybe).

Sheasheaf answered 17/1, 2013 at 17:27 Comment(0)

The fact that the following member functions are offered by std::unordered_map suggests that it is based on a hash table, probably separate chaining with linked lists.

bucket_count, hash_function, load_factor, max_load_factor, rehash

Whether the elements are contiguous or not depends on the allocator. The default allocator for unordered_map and list does not allocate the elements in contiguous memory; the memory for each element is allocated at the time of its insertion.

However, you can provide a custom allocator (such as a pool allocator) which allocates the elements from a pre-allocated memory pool. Even then, the logically adjacent elements in the data structure may not be physically adjacent in memory.

So, if looping through all the elements is the most frequent operation, then unordered_map may not be the best solution. Running the dominant use cases through a profiler for all competing solutions would reveal which one is best.

In addition, unordered_map is not the best choice for looping for another reason. Note the word "unordered" in the name: it conveys that, unlike list, vector, or map, there is no defined order of the elements. For example, the member function rehash may change the relative order of the elements. In fact, rehashes are performed automatically by the container whenever its load factor would otherwise exceed the max_load_factor during an operation.
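
A small demonstration of that interface, shown here on std::unordered_set (which exposes the same members as unordered_map):

    #include <cstdio>
    #include <unordered_set>

    int main() {
        std::unordered_set<int> s;
        for (int i = 0; i < 100; ++i)
            s.insert(i);  // may trigger automatic rehashes as the load factor grows

        std::printf("buckets: %zu, load factor: %.2f, max: %.2f\n",
                    s.bucket_count(), s.load_factor(), s.max_load_factor());

        s.rehash(1024);   // grow the bucket array; iteration order may change
        std::printf("buckets after rehash: %zu\n", s.bucket_count());
    }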

Nanette answered 17/1, 2013 at 18:35 Comment(0)

std::unordered_set is specified to be a hash-based container, so we can assume it carries a small performance penalty compared with std::vector.

But I think you should check actual profiling results to see whether the unordered_set access is a real hotspot.

If the STL implementation you are using is a reasonable one, it should provide a vector-like specialization for pointer or int type keys. If that is the case, an unordered_set specialized for the pointer type will behave much like an automatically growing/shrinking vector, and the performance difference will be unnoticeable.
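
For illustration, a trivial identity-style hash along the lines this answer describes (Widget and PtrIdentityHash are illustrative names; whether your standard library already does something equivalent for pointers is implementation-specific):

    #include <cstddef>
    #include <cstdint>
    #include <unordered_set>

    struct Widget {};  // hypothetical element type

    // Hash a pointer by its address bits, so hashing costs almost nothing.
    struct PtrIdentityHash {
        std::size_t operator()(const Widget* p) const noexcept {
            return reinterpret_cast<std::uintptr_t>(p);
        }
    };

    using WidgetSet = std::unordered_set<Widget*, PtrIdentityHash>;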

Expert answered 17/1, 2013 at 17:50 Comment(0)
