Usecases for std::unordered_multiset
Asked Answered
A

1

8

I'd like to know why one would ever use std::unordered_multiset. My guess is it has something to do with invalidations or non-invalidations of iterators after insert/erase, but maybe its something deeper? Very similar question is here: Use cases of std::multimap, but it's more of a discussion about maps.

Acreage answered 18/10, 2015 at 1:3 Comment(0)
S
8

With regards to your question, the most important features of unordered_multiset containers are that:

  • They are associative containers, so they allow for fast data lookup and retrieval as compared to (unsorted) vectors.
  • They are typically much faster than multisets, both for insertion and for lookup, and sometimes even for deletion (see e.g. this benchmark).

Thus, a typical use-case for unordered_multiset would be when you need fast lookup and don't care that the data are unordered:

  • If you don't do any data lookup at all, a vector probably is a better solution;
  • If you need the data to be sorted, a multiset should probably be used.

Note that there are other situations where unordered containers cannot or should not be used.

  • Firstly, if hashing is very costly, an unordered container may actually become slower than an ordered one would be.
  • Secondly, the worst-case insertion complexity for an unordered container is linear, rather than logarithmic as is the case for an ordered container. In practice, this means that insertion is almost always very fast, except from time to time when the container size exceeds some capacity threshold and that the whole container needs to be rehashed. As a consequence, it may be impossible to use an unordered container if you have severe timing requirements on insertion time or if rehashing is very slow.
  • Thirdly, there may be cases when memory consumption is unacceptably high for the unordered_multiset (but it can sometimes be solved by fine-tuning the container parameters).

Regarding the uses cases where ordered containers should be prefered to unordered ones, you may want to read this answer. For general guidelines regarding container selection, you may want to read How can I efficiently select a Standard Library container in C++11?.

EDIT

Considering that an unordered multiset and a vector can often do very similar things, isn't it better to always use a vector? Don't vectors automatically outperform unordered multisets?

Reproduced below are the results of a very simple benchmark (full code is provided at the end of this post):

  • We create a container which may be either an unordered multiset, a raw vector, or a sorted vector;
  • We alternately insert some random element and then count some random key;
  • The insert+counting operation is repeated 100 000 times;
  • The total duration needed for these 100 000 insert+counting operations is measured and displayed.

Here are the results obtained for containers of integers :

Environment
Compiler
Windows 7
VS Express 2013
CygWin 64 bits
gcc 4.9.3
unordered_multiset<int> 0.75 s 0.8 s
vector<int>, unsorted 7.9 s 11.0 s
vector<int>, sorted 1.0 s 0.6 s

In the example above, the unordered multiset is slightly better for the Windows benchmark, whereas the sorted vector is slightly better for the CygWin benchmark. For a multi-target development, there is no obvious choice here.

Below are the results of a similar test with containers of strings:

Environment
Compiler
Windows 7
VS Express 2013
CygWin 64 bits
gcc 4.9.3
unordered_multiset<string> 1 s 1 s
vector<string>, unsorted 30 s 65 s
vector<string>, sorted 130 s 32 s

In this example, the unordered multisets outperforms by far the vectors.

The exact figures don't matter much here, as they are specific to the specific conditions (hardware, OS, compiler, compiler options and so on) where these benchmarks were performed. What matters is that vectors sometimes outperform unordered multisets, but sometimes they don't. The only way to be sure whether unordered multisets or vectors should be used for a given application is to benchmark as realistically as feasible.

Below is included the code for the integer container benchmark. As it was developed on the fly, all corrections and improvements are welcome!

#include "stdafx.h"
#include <iostream>
#include <array>
#include <unordered_set>
#include <vector>
#include <algorithm>
#include <chrono>
#include <cstdlib>
#include <unordered_map>
#include <string>

using namespace std;

const unsigned N = 100000;      // Number of test iterations (= insertions + lookups)
typedef string Element;         // Type of data stored into the container to be tested
array<Element, N> testData;     // Pseudo-random input sequence
array<Element, N> lookupKeys;   // Pseudo-random lookup keys

// Test action for an unordered_multiset (insert some random data then count some random key)
struct unordered_multiset_action
{
    typedef unordered_multiset<Element> Container;
    int operator()(Container& container, unsigned k)
    {
        container.insert(testData[k]);
        return container.count(lookupKeys[k]);
    }
};

// Test action for an unsorted vector (insert some random data then count some random key)
struct unsorted_vector_action
{
    typedef vector<Element> Container;
    int operator()(Container& container, unsigned k)
    {
        container.push_back(testData[k]);               
        return count(testData.cbegin(), testData.cend(), lookupKeys[k]);
    }
};

// Test action for a sorted vector (insert some random data then count some random key)
struct sorted_vector_action
{
    typedef vector<Element> Container;
    int operator()(Container& container, unsigned k)
    {
        container.insert(upper_bound(container.begin(), container.end(), testData[k]), testData[k]);
        auto range = equal_range(container.cbegin(), container.cend(), lookupKeys[k]);
        return range.second - range.first;
    }
};

// Builds an empty container to be tested
// Then invokes N times the test action (insert some random key then count some other random key)
template<class Action>
long long container_test(Action action)
{
    using Container = typename Action::Container;
    Container container;
    long long keyCount = 0;
    for (unsigned k = 0; k<N; ++k)
        keyCount += action(container, k);
    return keyCount;
}

int main(int nargs, char *args[])
{
    using namespace chrono;

    // Parse user input to select which container should be tested
    enum SelectedContainer { UNORDERED_MULTISET, UNSORTED_VECTOR, SORTED_VECTOR };
    unordered_map<string, SelectedContainer> decoder{ { "unordered_multiset", UNORDERED_MULTISET },
                                                      { "unsorted_vector",    UNSORTED_VECTOR },
                                                      { "sorted_vector",      SORTED_VECTOR } };
    if ( nargs < 2 )
    {
        cerr << "Please provde an argument among those keywords: unordered_multiset, unsorted_vector, sorted_vector" << endl;
        return (-1);
    }
    auto it = decoder.find(args[1]);
    if (it == decoder.cend())
    {
        cerr << "Please enter one of the following arguments: unordered_multiset, unsorted_vector, sorted_vector" << endl;
        return (-1);
    }
    SelectedContainer selectedContainer = it->second;

    // Generate pseudo-random input data and input keys (no seeding for reproducibility)
    generate(testData.begin(),   testData.end(), []()   { return rand() % 256; });
    generate(lookupKeys.begin(), lookupKeys.end(), []() { return rand() % 256; });

    // Run test on container to be tested and measure elapsed time
    auto startTime = high_resolution_clock::now();
    long long keyCount;
    switch (selectedContainer)
    {
    case UNORDERED_MULTISET: 
        keyCount = container_test(unordered_multiset_action());
        break;
    case UNSORTED_VECTOR:
        keyCount = container_test(unsorted_vector_action());
        break;
    case SORTED_VECTOR:
        keyCount = container_test(sorted_vector_action());
        break;
    };
    
    auto endTime = high_resolution_clock::now();

    // Summarize test results
    duration<float> elaspedTime = endTime - startTime;
    cout << "Performed " << N << " insertions interleaved with " << N << " data lookups" << endl;
    cout << "Total key count = " << keyCount << endl;
    cout << "Elapsed time: " << duration_cast<milliseconds>(elaspedTime).count() << " milliseconds" << endl;
}
Salaam answered 26/10, 2015 at 12:49 Comment(3)
Great analysis, but it does seem a vector will always be a better option than an unordered_multiset. No?Acreage
@Johann I've edited the post to include a benchmark for a possible use-case. As you may see, vectors are in some conditions outperformed by unordered multisets, whereas in other conditions it is the other way around. Which one is better is very hard to know in advance, so benchmarking is always the safest approach.Salaam
Multiset supports insertion for duplicates, unlike set. Now my question is if we only need to maintain the count of duplicates isn't that the classical usecase of map where a key is mapped to a value, key being item and value being it's count in this case. Where should one use Multiset instead of Map or vice-versa ?Pietra

© 2022 - 2024 — McMap. All rights reserved.