Can an unordered_set use a different allocator for the nodes and the bucket list?
Asked Answered
T

2

13

I'd like to use a std::pmr::unordered_map with a std::pmr::monotonic_buffer_resource. The two fit well together, because the set's nodes are stable, so I don't create a lot of holes in the buffer resource by reallocation:

 std::pmr::monotonic_buffer_resource res;
 std::pmr::unordered_set<T> set(&res);

That is, except for the bucket list, which needs to be reallocated when the set rehashes as it exceeds the max_load_factor(). Assuming I can't reserve() my way out of this, and I actually care about the holes in the buffer resource left by old bucket lists since grown, what are my options?

If I know that unordered_set is implemented as std::vector<std::forward_list>, as in (some versions of) MSVC, then I should be able to use a scoped_allocator to give different allocators for the vector and the forward_list. But a) I can't rely on unordered_set being a vector<forward_list> in portable code and b) scoped_allocator is an Allocator whereas monotonic_buffer_resource is a memory_resource, an impedance mismatch that will make for very complicated initialization.

Or I could write a switch_memory_resource that delegates to other memory_resources based on the size of the request. I could then use a monotonic_buffer_resource for requests that match the size of the nodes (which, however, I cannot, portably, know, either) and default_memory_resource() for everything else. I could probably make an educated guess that the nodes are at most sizeof(struct {void* next; size_t hash; T value;}), add some error margin by multiplying that by two and use that as the cut-off between the two memory_resources, but I wonder whether there's a cleaner way?

Tresa answered 17/11, 2020 at 14:22 Comment(0)
P
8

The small number of concrete resource types that I proposed a number of years ago and that were adopted into C++17 was a minimalist set of useful allocators. As evidenced by your question, they do not provide optimal behavior for every circumstance. There are not many tuning dials and I have some regrets about missing functionality, but they are still useful for most cases.

For your specific situation, you say "Assuming I can't reserve() my way out of this, and I actually care about the holes in the buffer resource left by old bucket lists since grown." I'm not sure any general allocator can help you. The geometric growth of the bucket list will leave holes in any allocation strategy. The question is whether those holes can be re-used and/or minimized. As you point out, only a very-carefully customized allocator for the very specific situation will minimize these holes. But maybe your assumptions are too strong.

Consider a std::pmr::vector<int>. This is the worst-case scenario for a monotonic_buffer_resource because every reallocation results in leaked memory. And yet, even this case has a worst-case memory waste of only 50%; i.e., it will never use more than twice as much memory as it would with a resource that perfectly reuses memory blocks. Granted, 50% can be pretty bad, but in your scenario, we are talking much, much less. For a reasonably large set, the bucket list is small compared to the buckets and the data itself, and you can use reserve to minimize reallocation. So my first piece of advice is to go ahead and use the monotonic_buffer_resource without alteration, and measure to see if you have unacceptable memory use. A second experiment would be to use an unsynchronized_pool_resource backed by an (upstream) monotonic_buffer_resource.

If you decide you want to create a custom resource for this purpose, which might be fruitful and might even be fun, your approach of choosing some lower threshold for passing to the monotonic allocator would probably work and would not actually be a lot of effort. You could also consider making it adaptive: Keep a list of the last, say, 4, allocation sizes. If any size gets more than two hits, then assume it is your node size and allocate those nodes from the monotonic resource while other requests get passed directly to the upstream resource. Be careful, however, that you use such a custom allocator only when you know what's going on. If you have a std::pmr::unordered_set<std::pmr::string>, then both approaches may result in many of the strings getting allocated from the upstream resource and thus losing the benefit of the monotonic buffer. As you can see, being too stingy with memory for your bucket list can backfire in a generic context. You're likely to discover that the unmodified monotonic_buffer_resource was a better bet.

Good luck, and please report your findings back here. Also, if you have an idea for a general-purpose resource that could address your problem (or any other common allocation problem), I'd love to hear about it. There's certainly room in the standard for a few more useful resource types.

Publicity answered 18/11, 2020 at 17:39 Comment(2)
Excellent point about pmr::unordered_set<pmr::string>. Which just goes to prove that any switching on the request size is going to cause failures. I think what I was hoping for was some form of scoped-allocator support, so you'd know that at the top level, requests would only be for the bucket list, on the next for the nodes and on the third and following for the value_types. Something like monotonic_buffer_resource mbr; scoped_memory_resource{default_memory_resource(), &mbr, &mbr}; where the last &mbr is superfluous, like in scoped_allocator_adapter.Tresa
I understand how fine control over the allocation in deeper scopes would be useful. One of the balancing acts of any generic library is giving the programmer enough latitude to do what they want while preserving usability for the largest possible audience. In your case, you have dug into the implementation of your specific standard library, but your solution would not work for other implementations. One team is looking into getting more predictable allocator performance by giving the allocator a hint about how memory use will grow; the vector pattern would be one of those hints.Publicity
E
0

The definition of std::pmr::unordered_set

using unordered_set = std::unordered_set<Key, Hash, Pred,
                                         std::pmr::polymorphic_allocator<Key>>

Which indicates that it is only used for the nodes.

But a short test shows it is used for both, which makes me sad. So the only way will be to test on the size of the allocation.

this part indicates that allocator only considers Key

std::pmr::polymorphic_allocator<**Key**>

In the old allocators there was a rebind functions to convert to another allocator type that I didn't see in the pmr (it is in std::allocator_traits, see comments).

I was starting to write something like this

  void* do_allocate(std::size_t bytes, std::size_t alignment) override {
    if (bytes == first_size)
      return first_alloc->allocate(bytes, alignment);
    return second_alloc->allocate(bytes, alignment);
  }

When I saw that there already exists an pre-pmr scoped_allocator_adaptor I don't know if its compatible with pmr, that will have to be investigated further.

Evante answered 17/11, 2020 at 20:27 Comment(8)
Could you point out where the definition indicates that it is used only for the nodes? If it does, it's either a bug in the definition or a bug in the cppreference.com write-up.Publicity
from en.cppreference.com/w/cpp/container/unordered_set/unordered_set : "alloc - allocator to use for all memory allocations of this container"Barbey
@bolov. Right, but Surt said "only used for the nodes", which seems to contradict "all". I'm trying to understand where the disconnect lies.Publicity
@PabloHalpern I am contradicting SurtBarbey
@PabloHalpern, the point where the allocator mentions Key and nothing else.Evante
@Surt. Oh, you mean std::allocator<Key>. That is the unfortunate legacy of the way allocators were specified in C++98. In reality, the allocator is never used to allocate the key because the key is part of a larger node. The traditional way that allocators were specified in C++98 was that they were instantiated on the value_type even if never used for that. Not an error in the description, but an unfortunate source of confusion.Publicity
@PabloHalpern exactly, so the type there means nothing, which is slightly confusing.Evante
@Surt, the reason you don't see rebind in std::pmr::polymorphic_allocator is because it exists generically in std::allocator_traits. Instead of saying allocator_type::rebind<Node>::other, a library should always say std::allocator_traits<allocator_type>::rebind_alloc<Node>. See sections [allocator.requirements] and [allocator.traits] in the standard.Publicity

© 2022 - 2024 — McMap. All rights reserved.