C++: Scott Meyers "Effective STL": item 31: know your sorting options: help to understand
Asked Answered
P

5

9

Good day!

In his "Effective STL" Scott Meyers wrote

A third is to use the information in an ordered container of iterators to iteratively splice the list's elements into the positions you'd like them to be in. As you can see, there are lots of options. (Item 31, second part)

Can someone explain me this way?


More text (to understand the context):

The algorithms sort, stable_sort, partial_sort, and nth_element require random access iterators, so they may be applied only to vectors, strings, deques, and arrays. It makes no sense to sort elements in standard associative containers, because such containers use their comparison functions to remain sorted at all times. The only container where we might like to use sort, stable_sort, partial_sort, or nth_element, but can't, is list, and list compensates somewhat by offering its sort member function. (Interestingly, list::sort performs a stable sort.) If you want to sort a list, then, you can, but if you want to use partial_sort, or nth_element on the objects in a list, you have to do it indirectly. One indirect approach is to copy the elements into a container with random access iterators, then apply the desired algorithm to that. Another is to create a container of list::iterators, use the algorithm on that container, then access the list elements via the iterators. A third is to use the information in an ordered container of iterators to iteratively splice the list's elements into the positions you'd like them to be in. As you can see, there are lots of options.

Partan answered 9/2, 2012 at 23:47 Comment(6)
Am I the only one realizing that this would have terrible cache behavior ? A list is already generally not recommend because of it is node based (ie, elements are scattered to the wind), and here we introduce another node based container to refer to the list items (via a pointer-like object). Pfiuuu.Bice
@MatthieuM. "list elements are scattered to the wind". To avoid lots of memory cashing and associated page faults you may use placement new form to create list nodes in continuous memory block.Partan
@MatthieuM. What do we introduce. ClarifyPartan
you introduce an ordered container of iterators seem to make people lean toward a set which is itself node based.Bice
@MatthieuM. got your idea, thanks. +1 Is set always node-based (on all widely used implementations)??Partan
Yes. It is mandatory because both set, list and map have a requirement that iterators and references must not be invalidated on insert/erase (apart from erasing the element actually pointed to). This can be compounded, a bit, by providing an allocator that packs the nodes together, but it still means a lot of pointers and forth-and-back traversals.Bice
N
3

I'm not sure what the confusion is but I suspect that it is what "splicing" refers to: the std::list<T> has an splice() member function (well, actually several overloads) which transfer nodes between lists. That is, you create a std::vector<std::list<T>::const_iterator> and apply the sorting algorithm (e.g. std::partial_sort()) to this. Then you create a new std::list<T> and use the splice() member with the iterators from the sorted vector to put the nodes into their correct order without moving the objects themselves.

This would look something like this:

std::vector<std::list<T>::const_iterator> tmp;
for (auto it(list.begin()), end(list.end()); it != end; ++it) {
    tmp.push_back(it);
}
some_sort_of(tmp);
std::list<T> result;
for (auto it(tmp.begin()), end(tmp.end()); it != end; ++it) {
    result.splice(result.end(), list, it);
}
Nabala answered 9/2, 2012 at 23:59 Comment(1)
Thanks! Finally someone using a sane structure for sorting (eg, a vector instead of a set...). At least here we have an efficient memory allocation pattern (and could even use reserve) even though the memory access is still likely to be chaotic.Bice
O
2

Let's say you wanted to do a partial_sort on a list. You could store the iterators to the list in a set, by providing a comparison function that can sort using the iterators, like this:

struct iterator_less
{   
    bool operator() (std::list<int>::iterator lhs,
                 std::list<int>::iterator rhs) const
    {
        return (*lhs < *rhs);
    }
};
typedef std::multiset<
    std::list<int>::iterator, iterator_less
> iterator_set;

The you could let set perform the sort, but since it contains iterators to list, you could you list::splice to splice them into a partial_sorted list:

std::list<int> unsorted, partialSorted;
unsorted.push_back(11);
unsorted.push_back(2);
unsorted.push_back(2);
unsorted.push_back(99);
unsorted.push_back(2);
unsorted.push_back(4);
unsorted.push_back(5);
unsorted.push_back(7);
unsorted.push_back(34);

    // First copy the iterators into the set
iterator_set itSet;
for( auto it = unsorted.begin(); it!=unsorted.end();++it)
{
    itSet.insert(it);
}
    // now if you want a partial_sort with the first 3 elements, iterate through the
    // set grabbing the first item in the set and then removing it.
int count = 3;
while(count--)
{
    iterator_set::iterator setTop = itSet.begin();
    partialSorted.splice(
        partialSorted.begin(),
        unsorted,
        *setTop);
    itSet.erase(setTop);
}

partialSorted.splice(
        partialSorted.end(),
        unsorted,
        unsorted.begin(),
        unsorted.end());
Outguess answered 10/2, 2012 at 0:53 Comment(6)
+1, this is the correct implementation of what Scott Meyers had in mind. However, you might want to choose cleaner names, everything stats with li. My eyes hurt after reading this.Perceive
Here's a cleaned-up version which avoids using a second list (items are spliced directly from one position in the list to another). Feel free to use (or not use).Perceive
@Outguess thanks, very good answer. It is simply to understand but Dietmar's implementation is more effective, so I've chosen his answer.Partan
@AndréCaron Thanks, your solution is indeed cleaner (and a nice general purpose implementation).Outguess
@DaddyM Yup Dietmer's solution is cleaner, but it doesn't use a "sorted container".Outguess
@Outguess Ok. But he successfully simulates "ordered container" with vector - some_kind_of_sort() pair. There is no doubt that your solution using "natively" ordered container, but sorted vector is also an ordered container.Partan
O
1

An ordered container would be either std::set or std::map. If you're willing to make a comparator that takes iterators you would use std::set<std::list<mydata>::iterator,comparator>, otherwise you could use std::map<mydata,std::list<mydata>::iterator>. You go through your list from begin() to end() and insert the iterators into the set or map; now you can use it to access the items in the list in sorted order by iterating the set or map, because it's automatically sorted.

Ordain answered 9/2, 2012 at 23:57 Comment(1)
Note that using std::set<> will remove duplicates in the original list. You're better off with std::multiset<>.Perceive
R
1

Ordered containers are std::set and std::multiset. std::set implements a BST. So what it says is that you should crate an std::set<std::list::iterators> and then use the inherent BST structure to do the sorting. Here is a link on BST to get you started.

Radtke answered 10/2, 2012 at 2:22 Comment(0)
W
0

Edit Ah. Just noticed "ordered container of iterators". That would imply creating an index into another container.

Boost Multi Index has many example of such things (where a single collections is indexed by several different ordering predicates and the indices are nothing more than collections of 'pointers' (usually iterators) into the base container.


"A third is to use the information in an ordered container of iterators to iteratively splice the list's elements into the positions you'd like them to be in"

One thing I think would match that description is when doing std::sort_heap of a list/vector which has had std::make_heap/push_heap/pop_heap operating on it.

  • make_heap : convert a sequence to a heap
  • sort_heap : sort a heap
  • push_heap : insert an element in a heap
  • pop_heap : remove the top element from a heap

Heaps are organizations of elements within sequences, which make it (relatively) efficient to keep the collection in a known ordering under insert/removal. The order is implicit (like a recursive defined binary tree stored in a contiguous array) and can be transformed into the corresponding properly sorted sequence by doing the (highly efficient) sort_heap algorithm on it.

Walkthrough answered 9/2, 2012 at 23:50 Comment(1)
Boost Multi Index [...] indices are nothing more than collections of 'pointers' This is not correct, the internal implementation is more efficient than that. See for instance this explanation on the internal structure of multi_index_containers.Vexed

© 2022 - 2024 — McMap. All rights reserved.