How can I efficiently select a Standard Library container in C++11?
Asked Answered
P

4

152

There's a well known image (cheat sheet) called "C++ Container choice". It's a flow chart to choose the best container for the wanted usage.

Does anybody know if there's already a C++11 version of it?

This is the previous one: eC++ Container choice

Pieria answered 22/5, 2012 at 9:26 Comment(10)
Never seen this before. thanks!Gilolo
@WeaselFox: It's already a part of the C++-Faq here on SO.Frumentaceous
C++11 only introduced one new true container type: the unordered_X containers. Including them would only muddle the table considerably, as there are a number of considerations when deciding whether a hash-table is appropriate.Armendariz
@NicolBolas: Also forward_list and array. And the unordered containers wouldn't really muddle it; they'd just add a new cluster linked from "Need to find element by key".Jueta
@MikeSeymour: The unordered containers would be a very large cluster; that's why it would muddle the table. Deciding whether to use a hashtable or not is not a simple couple of decisions.Armendariz
@NicolBolas But that's already true for the older containers. I often use an std::vector for ordered containers, keeping the data sorted myself.Spoke
James is right, there are more cases to use vector than what the table shows. The advantage of data locality outperforms in many cases the lack of efficiency in some operations (more soon C++11). I don't find e chart so useful even for c++03Lactoscope
This is cute, but I think reading any common textbook on data structures will leave you in a state whereby you could not only reinvent this flowchart in a few minutes, but also know a lot more useful stuff that this flowchart glosses over.Crayon
Is there any updated version of that flowchart anyway? It is easy and very good planner.Matron
"Order is important?" -> Sort your vector. "Store key seperate to value?" -> Use vector< tuple<key, value>> and ordered by key.Weis
D
106

Not that I know of, however it can be done textually I guess. Also, the chart is slightly off, because list is not such a good container in general, and neither is forward_list. Both lists are very specialized containers for niche applications.

To build such a chart, you just need two simple guidelines:

  • Choose for semantics first
  • When several choices are available, go for the simplest

Worrying about performance is usually useless at first. The big O considerations only really kick in when you start handling a few thousands (or more) of items.

There are two big categories of containers:

  • Associative containers: they have a find operation
  • Simple Sequence containers

and then you can build several adapters on top of them: stack, queue, priority_queue. I will leave the adapters out here, they are sufficiently specialized to be recognizable.


Question 1: Associative ?

  • If you need to easily search by one key, then you need an associative container
  • If you need to have the elements sorted, then you need an ordered associative container
  • Otherwise, jump to the question 2.

Question 1.1: Ordered ?

  • If you do not need a specific order, use an unordered_ container, otherwise use its traditional ordered counterpart.

Question 1.2: Separate Key ?

  • If the key is separate from the value, use a map, otherwise use a set

Question 1.3: Duplicates ?

  • If you want to keep duplicates, use a multi, otherwise do not.

Example:

Suppose that I have several persons with a unique ID associated to them, and I would like to retrieve a person data from its ID as simply as possible.

  1. I want a find function, thus an associative container

1.1. I couldn't care less about order, thus an unordered_ container

1.2. My key (ID) is separate from the value it is associated with, thus a map

1.3. The ID is unique, thus no duplicate should creep in.

The final answer is: std::unordered_map<ID, PersonData>.


Question 2: Memory stable ?

  • If the elements should be stable in memory (ie, they should not move around when the container itself is modified), then use some list
  • Otherwise, jump to question 3.

Question 2.1: Which ?

  • Settle for a list; a forward_list is only useful for lesser memory footprint.

Question 3: Dynamically sized ?

  • If the container has a known size (at compilation time), and this size will not be altered during the course of the program, and the elements are default constructible or you can provide a full initialization list (using the { ... } syntax), then use an array. It replaces the traditional C-array, but with convenient functions.
  • Otherwise, jump to question 4.

Question 4: Double-ended ?

  • If you wish to be able to remove items from both the front and back, then use a deque, otherwise use a vector.

You will note that, by default, unless you need an associative container, your choice will be a vector. It turns out it is also Sutter and Stroustrup's recommendation.

Downatheel answered 22/5, 2012 at 11:26 Comment(5)
+1, but with some notes: 1) array doesn't require a default constructible type; 2) picking the multis is not so much about duplicates being allowed but more about whether keeping them matters (you can put duplicates in non-multi containers, it just happens that only one is kept).Ricks
The example is a bit off. 1) we can "find" (not the member function, the "<algorithm>" one) on a non associative container, 1.1) if we need to find "efficiently", and unordered_ will be O(1) and not O(log n).Pieria
@BlakBat: map.find(key) is much more palatable than std::find(map.begin(), map.end(), [&](decltype(map.front()) p) { return p.first < key; })); though, therefore it is important, semantically, that find is a member function rather than the one from <algorithm>. As for O(1) vs O(log n), it does not affect semantics; I'll remove the "efficiently" from the example and replace it with "easily".Downatheel
"If the elements should be stable in memory ... then use some list" ... hmmm, I thought deque had this property too?Bismuthinite
@MartinBa: Yes and no. In a deque the elements are stable only if you push/pop at either end; if you start inserting/erasing in the middle then up to N/2 elements are shuffled to fill the gap created.Downatheel
A
57

I like Matthieu's answer, but I'm going to restate the flowchart as this:

When to NOT use std::vector

By default, if you need a container of stuff, use std::vector. Thus, every other container is only justified by providing some functionality alternative to std::vector.

Constructors

std::vector requires that its contents are move-constructible, since it needs to be able to shuffle the items around. This is not a terrible burden to place on the contents (note that default constructors are not required, thanks to emplace and so forth). However, most of the other containers don't require any particular constructor (again, thanks to emplace). So if you have an object where you absolutely cannot implement a move constructor, then you will have to pick something else.

A std::deque would be the general replacement, having many of the properties of std::vector, but you can only insert at either ends of the deque. Inserts in the middle require moving. A std::list places no requirement on its contents.

Needs Bools

std::vector<bool> is... not. Well, it is standard. But it's not a vector in the usual sense, as operations that std::vector normally allows are forbidden. And it most certainly does not contain bools.

Therefore, if you need real vector behavior from a container of bools, you're not going to get it from std::vector<bool>. So you'll have to make due with a std::deque<bool>.

Searching

If you need to find elements in a container, and the search tag can't just be an index, then you may need to abandon std::vector in favor of set and map. Note the key word "may"; a sorted std::vector is sometimes a reasonable alternative. Or Boost.Container's flat_set/map, which implements a sorted std::vector.

There are now four variations of these, each with their own needs.

  • Use a map when the search tag is not the same thing as the item you're looking for itself. Otherwise use a set.
  • Use unordered when you have a lot of items in the container and search performance absolutely needs to be O(1), rather than O(logn).
  • Use multi if you need multiple items to have the same search tag.

Ordering

If you need a container of items to always be sorted based on a particular comparison operation, you can use a set. Or a multi_set if you need multiple items to have the same value.

Or you can use a sorted std::vector, but you'll have to keep it sorted.

Stability

When iterators and references are invalidated is sometimes a concern. If you need a list of items, such that you have iterators/pointers to those items in various other places, then std::vector's approach to invalidation may not be appropriate. Any insertion operation may cause invalidation, depending on the current size and capacity.

std::list offers a firm guarantee: an iterator and its associated references/pointers are only invalidated when the item itself is removed from the container. std::forward_list is there if memory is a serious concern.

If that's too strong a guarantee, std::deque offers a weaker but useful guarantee. Invalidation results from insertions in the middle, but insertions at the head or tail causes only invalidation of iterators, not pointers/references to items in the container.

Insertion Performance

std::vector only provides cheap insertion at the end (and even then, it becomes expensive if you blow capacity).

std::list is expensive in terms of performance (each newly inserted item costs a memory allocation), but it is consistent. It also offers the occasionally indispensable ability to shuffle items around for virtually no performance cost, as well as to trade items with other std::list containers of the same type at no loss of performance. If you need to shuffle things around a lot, use std::list.

std::deque provides constant-time insertion/removal at the head and tail, but insertion in the middle can be fairly expensive. So if you need to add/remove things from the front as well as the back, std::deque might be what you need.

It should be noted that, thanks to move semantics, std::vector insertion performance may not be as bad as it used to be. Some implementations implemented a form of move semantic-based item copying (the so-called "swaptimization"), but now that moving is part of the language, it's mandated by the standard.

No Dynamic Allocations

std::array is a fine container if you want the fewest possible dynamic allocations. It's just a wrapper around a C-array; this means that its size must be known at compile-time. If you can live with that, then use std::array.

That being said, using std::vector and reserveing a size would work just as well for a bounded std::vector. This way, the actual size can vary, and you only get one memory allocation (unless you blow the capacity).

Armendariz answered 23/5, 2012 at 7:35 Comment(8)
Well, I like your answer a lot too :) WRT keeping a vector sorted, apart from std::sort, there is also std::inplace_merge which is interesting to easily place new elements (rather than a std::lower_bound + std::vector::insert call). Nice to learn about flat_set and flat_map!Downatheel
You also cannot use a vector with 16-byte aligned types. Also also a good replacement for vector<bool> is vector<char>.Personalize
@Inverse: "You also cannot use a vector with 16-byte aligned types." Says who? If std::allocator<T> doesn't support that alignment (and I don't know why it wouldn't), then you can always use your own custom allocator.Armendariz
@NicolBolas: because std::vector::resize takes its type by value, see https://mcmap.net/q/48636/-error-c2719-39-_val-39-formal-parameter-with-__declspec-align-39-16-39-won-39-t-be-alignedPersonalize
@Inverse: C++11's std::vector::resize has an overload that does not take a value (it just takes the new size; any new elements will be default-constructed in-place). Also, why are compilers unable to properly align value parameters, even when they're declared to have that alignment?Armendariz
@NicolBolas: Beats me! I'm glad c++11 fixed the interface thoughPersonalize
bitset for bool if you know the size in advance en.cppreference.com/w/cpp/utility/bitsetTravistravus
@Travistravus bitset does nothing to address this post's warnings about vector<bool> given that it has the same issues: it does not really contain a set of bool instances, rather packed bits accessed via proxy class instances returned by value. In the context of this question, its only strength over vector<bool> is that it isn't in possession of a hopelessly misleading name and endless scorn from everyone involved in the language. Btw, rather than giving up and using a deque to get a real container of bools, we can quite easily create a thin wrapper struct around a bool instanceFrustrated
C
31

Here is the C++11 version of the above flowchart. [originally posted without attribution to its original author, Mikael Persson]

Corie answered 22/6, 2014 at 1:47 Comment(1)
@NO_NAME Wow, I'm glad somebody bothered to cite a source.Frustrated
M
1

Here's a quick spin, although it probably needs work

Should the container let you manage the order of the elements?
Yes:
  Will the container contain always exactly the same number of elements? 
  Yes:
    Does the container need a fast move operator?
    Yes: std::vector
    No: std::array
  No:
    Do you absolutely need stable iterators? (be certain!)
    Yes: boost::stable_vector (as a last case fallback, std::list)
    No: 
      Do inserts happen only at the ends?
      Yes: std::deque
      No: std::vector
No: 
  Are keys associated with Values?
  Yes:
    Do the keys need to be sorted?
    Yes: 
      Are there more than one value per key?
      Yes: boost::flat_map (as a last case fallback, std::map)
      No: boost::flat_multimap (as a last case fallback, std::map)
    No:
      Are there more than one value per key?
      Yes: std::unordered_multimap
      No: std::unordered_map
  No:
    Are elements read then removed in a certain order?
    Yes:
      Order is:
      Ordered by element: std::priority_queue
      First in First out: std::queue
      First in Last out: std::stack
      Other: Custom based on std::vector????? 
    No:
      Should the elements be sorted by value?
      Yes: boost::flat_set
      No: std::vector

You may notice that this differs wildly from the C++03 version, primarily due to the fact that I really do not like linked nodes. The linked node containers can usually be beat in performance by a non-linked container, except in a few rare situations. If you don't know what those situations are, and have access to boost, don't use linked node containers. (std::list, std::slist, std::map, std::multimap, std::set, std::multiset). This list focuses mostly on small and middle sided containers, because (A) that's 99.99% of what we deal with in code, and (B) Large numbers of elements need custom algorithms, not different containers.

Misdoubt answered 16/5, 2014 at 16:55 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.