I'll assume you build up your map once and then access it repeatedly, since you mention using a sorted array which is effective in that scenario.
Since your key density is 0.1 (50k values in 500k domain) you could represent the key set with a 500k-entry bitmap which would add only 1.25 bytes per entry or about a 10% overhead (500k bits versus 50k * 12 bytes of payload).
The problem is to access a given entry by key is then a linear operation. So you have another array key_to_slot
which allows you to map from an key i
to a slot in the values array. This array is sparse, however, so it only has entries for say every 64th possible key in the domain.
This adds another 0.625 bytes per entry, so total overhead is still < 2 bytes per entry.
So something like this:
template <typename Value>
struct bitmap_map {
using Key = int;
// key_bitset[i] is true iff key i is in the map
bitset key_bitset;
std::array<Value, 50000> values;
std::array<int, 500000 / 64> key_to_slot;
bool contains(Key i) {
return key_bitset[i];
}
Value at(Key i) {
int base_idx = key_to_slot(i / 64);
int adjusted_index = base_idx + key_bitset.aligned_popcnt(i)
return values[adjusted_index];
}
};
bitset::aligned_popcnt(i)
is the number of set bits between index i / 64 * 64
(i.e., i
aligned down to a 64 boundary) and i
, and is basically 1 memory access and one popcnt
instruction.
If you want to dynamically mutate the map in-between lookups things get more complicated since additions or removals to the map would require linear work (shifting up to all values and updating the key_to_slot
array. One way around that is to have each 64 slice of the domain just have a separately allocated array, pointed to by key_to_slot
, so the amount of shifting required is small and fixed and key_to_slot
never needs linear fixups. This bears some resemblance to std::deque
.
std::map
? – Maliamalicestd::unordered_map
for this case. I'm more curious about whether there was any tuning of the bucket size. – Fencestd::map
should consume less memory thanstd::unordered_map
but I've not done any tests myself. I would think complexity would be similar to a sorted array but without the need to manually implement it. Though the array should be faster. – Maliamaliceint -> std::pair<int, int>
mapping (for a total of 10M=200*50,000 elements, especially when using an std::unordered_map), while the title suggests that the key is an integer. – Gonsalez