- Language: C++
One thing I can do is allocate a vector of size n, store all the data in it, and then sort it with sort(begin(), end()). Alternatively, I can keep inserting the data into a map or set, which keep themselves ordered, so I don't have to sort afterwards. But in that case each insertion may be more costly because of the rebalancing (I guess).
So which is the optimal choice for minimal time over a wide range of n (the number of objects)?
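Roughly, the two options I am comparing look like this (int is only a placeholder for the real objects):

```cpp
#include <algorithm>
#include <set>
#include <vector>

int main() {
    std::vector<int> input{5, 1, 4, 2, 3};  // stand-in for the incoming data

    // Option 1: collect everything in a vector, then sort once at the end.
    std::vector<int> v(input.begin(), input.end());
    std::sort(v.begin(), v.end());

    // Option 2: keep inserting into an ordered container; no sort needed,
    // but every insertion may rebalance the tree.
    std::multiset<int> s;
    for (int x : input) s.insert(x);
}
```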
It depends on the situation.
map and set are usually implemented as red-black trees; they have to do extra work to stay balanced, and they don't support random access. So if you only want to sort once, you shouldn't use them.
However, if you keep inserting elements into the container and need the order maintained, map and set take O(log N) per insertion, while inserting into an already-sorted vector takes O(N). The latter is much slower, so if you insert and delete frequently, you should use map or set.
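To make the difference concrete, here is a minimal sketch of a single ordered insertion in each container (int as a placeholder element type):

```cpp
#include <algorithm>
#include <set>
#include <vector>

// Keep the vector sorted on every insertion: finding the position is
// O(log n), but vector::insert shifts all later elements, so it is O(n).
void insert_sorted(std::vector<int>& v, int value) {
    auto pos = std::lower_bound(v.begin(), v.end(), value);
    v.insert(pos, value);
}

// The tree-based container does the same job in O(log n) per insertion.
void insert_ordered(std::set<int>& s, int value) {
    s.insert(value);
}
```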
The difference between the two is noticeable!
Using a set, you get O(log N) complexity for each element you insert, so inserting N elements ends up as O(N log N), the same total cost as sorting the data once at the end.
Appending everything to a vector costs O(1) amortized per element, and sorting it is O(N log N) in the worst case since C++11 (before that, std::sort was only guaranteed O(N log N) on average).
Once sorted, you can use binary_search to get the same lookup complexity as with a set.
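A rough sketch of that whole pipeline (placeholder data):

```cpp
#include <algorithm>
#include <vector>

int main() {
    std::vector<int> v;
    for (int x : {7, 3, 9, 1, 5})
        v.push_back(x);                 // O(1) amortized per element

    std::sort(v.begin(), v.end());      // O(N log N), worst case since C++11

    // Same asymptotic lookup cost as a set: O(log N).
    bool found = std::binary_search(v.begin(), v.end(), 9);
    (void)found;
}
```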
The API for using a vector as a set isn't the friendliest, although it does give nice performance benefits. This is of course only useful when you can bulk-insert the data, or when lookups far outnumber modifications of the content. Sorting algorithms also cope well with a partially sorted vector, which helps when you have to extend it later on. Finally, note that you don't get the same iterator-invalidation guarantees as with a set.
So, why are vectors better? Cache locality! A vector keeps all its data in a single memory block, so the processor can prefetch it, whereas a set's nodes are scattered around memory, requiring a pointer chase to find the next address. This makes a vector a better set implementation than std::set for large data, when you can live with the limitations.
To give you an idea, in the codebase I'm working on we have several set and map implementations based on vectors, each with its own restrictions to operate within (for example: no erase, or no operator[]).
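To make that concrete, here is an illustrative, deliberately restricted vector-backed set (a sketch only, not the implementation from that codebase):

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// A minimal vector-backed "set": fast sorted lookups and iteration,
// but intentionally no erase() and no operator[].
template <class T>
class flat_set_lite {
    std::vector<T> data_;  // kept sorted and unique at all times
public:
    void reserve(std::size_t n) { data_.reserve(n); }

    void insert(const T& value) {
        auto pos = std::lower_bound(data_.begin(), data_.end(), value);
        if (pos == data_.end() || *pos != value)  // keep elements unique
            data_.insert(pos, value);             // O(n): shifts the tail
    }

    bool contains(const T& value) const {
        return std::binary_search(data_.begin(), data_.end(), value);
    }

    std::size_t size() const { return data_.size(); }
};
```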
Rules of thumb
As always it depends, both on the average size of your container and on what you do with it.
When it comes to iteration, nothing beats contiguous memory. So the less your data changes and the more you iterate over it, the better the vector-based solution will perform. Even look-ups should be fast if done with binary search, thanks to random-access iterators. (Note: std::binary_search() only tests whether a value exists; to access the value you need std::lower_bound() or std::upper_bound().)
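A small sketch of that difference:

```cpp
#include <algorithm>
#include <vector>

int main() {
    std::vector<int> v{1, 3, 5, 7};  // already sorted

    // binary_search only answers "is it there?".
    bool present = std::binary_search(v.begin(), v.end(), 5);

    // lower_bound returns an iterator, so the stored value can be accessed.
    auto it = std::lower_bound(v.begin(), v.end(), 5);
    if (it != v.end() && *it == 5) {
        int value = *it;  // access the element itself
        (void)value;
    }
    (void)present;
}
```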
However, the more often you change your data, the better the associative containers in the standard library will perform. On insertion or removal they need to do some work to figure out how to rebalance their internal tree structure, but that is just swapping a few pointers. A vector, on the other hand, needs to move (or copy) on average half of its data in this situation.
If in doubt, profile!
Flat associative containers
If you tend towards the vector-based solution you could give flat associative containers a try. They offer a more natural and convenient interface than a raw vector. The most notable implementations are:
- Boost offers boost::container::flat_set and its siblings. Internally they use std::vector (or a similar data structure) and wrap it with most of the interface known from std::set etc. For obvious reasons they do not support splicing; instead, they offer members known from std::vector such as reserve(), capacity(), shrink_to_fit() and max_size(). And there are functions to extract the internal sequence or to adopt both ordered and unordered sequences (the former allows populating a vector, sorting it once, adopting it, and then accessing it through a convenient interface; see the sketch after this list).
- Similarly, C++23 brings us std::flat_set and the like, which are very similar to Boost's flat containers. A major difference is that they lack reserve(), which seems like a big oversight to me. But one can probably work around this by extracting the underlying sequence, calling reserve() on it, and then adopting it again.
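A minimal sketch of that populate-sort-once-adopt pattern with Boost's flat container (this assumes Boost.Container is available; adjust the include to your setup):

```cpp
#include <algorithm>
#include <vector>
#include <boost/container/flat_set.hpp>

int main() {
    // Populate a plain vector first and sort it once...
    std::vector<int> raw{4, 1, 3, 2};
    std::sort(raw.begin(), raw.end());

    // ...then build the flat container from the already ordered (and unique)
    // range, so it does not have to sort again on construction.
    boost::container::flat_set<int> s(boost::container::ordered_unique_range,
                                      raw.begin(), raw.end());

    s.reserve(16);  // vector-like member mentioned above
    s.insert(5);    // still behaves like an ordered set
}
```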
Contiguous pointers
One last contestant I'd like to throw into the ring is std::vector<std::unique_ptr<T>> (or, equivalently, boost::container::stable_vector). This has less cache locality than std::vector<T>, which makes iteration not quite as fast (the pointers are contiguous and the pointees should get prefetched, so it is still better than std::set). But if T is rather big, then insertion and deletion become faster, as only pointers need to be copied instead of having to move entire instances around.
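A minimal sketch, with a made-up Record type standing in for a rather big T:

```cpp
#include <algorithm>
#include <memory>
#include <vector>

// Placeholder for a large, expensive-to-move type.
struct Record {
    int key = 0;
    char payload[256] = {};
};

int main() {
    std::vector<std::unique_ptr<Record>> v;
    for (int k : {3, 1, 2}) {
        auto r = std::make_unique<Record>();
        r->key = k;
        v.push_back(std::move(r));  // only a pointer moves into the vector
    }

    // Sorting compares through the pointers; the big Record objects never move.
    std::sort(v.begin(), v.end(),
              [](const auto& a, const auto& b) { return a->key < b->key; });
}
```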
In std::set or std::map there is a balanced tree, which is sorted all the time: each insertion/deletion costs O(log n), which means n insertions cost O(n log n) in total. In a vector, each insertion costs O(1) (on average, because the storage sometimes has to be reallocated and copied), while each deletion costs O(n), and sorting it costs O(n log n) each time. If you need it sorted all the time, I'd say you should clearly use a tree. Otherwise, it may be just the same as using a vector and sorting it at the end. – Coagulant