Is list better than vector when we need to store "the last n items"?

Asked 18/5, 2017 at 5:1 Answered 19/5, 2017 at 9:7

Solved c++c++11 fifo buffering data-stream

There are a lot of questions which suggest that one should always use a vector, but it seems to me that a list would be better for the scenario, where we need to store "the last n items"

For example, say we need to store the last 5 items seen: Iteration 0:

3,24,51,62,37,

Then at each iteration, the item at index 0 is removed, and the new item is added at the end:

Iteration 1:

24,51,62,37,8

Iteration 2:

51,62,37,8,12

It seems that for this use case, for a vector the complexity will be O(n), since we would have to copy n items, but in a list, it should be O(1), since we are always just chopping off the head, and adding to the tail each iteration.

Is my understanding correct? Is this the actual behaviour of an std::list ?

Christachristabel answered 18/5, 2017 at 5:1 Comment(2)

Check this out... it explains when to choose which one: https://mcmap.net/q/99207/-vector-vs-list-in-stl – Stratosphere 18/5, 2017 at 5:18

This talk by Bjarne Stroustrup is useful: youtube.com/watch?v=YQs6IC-vgmo – Baylor 18/5, 2017 at 5:29

Neither. Your collection has a fixed size and std::array is sufficient.

The data structure you implement is called a ring buffer. To implement it you create an array and keep track of the offset of the current first element.

When you add an element that would push an item out of the buffer - i.e. when you remove the first element - you increment the offset.

To fetch elements in the buffer you add the index and the offset and take the modulo of this and the length of the buffer.

Harrelson answered 18/5, 2017 at 8:0 Comment(10)

+1 A circular buffer is a completely ideal implementation for a queue that has fixed maximum size (unfortunately there is no standard library container for this). – Amr 18/5, 2017 at 8:52

Although I can implement one myself, I think this answer would be more useful to others if there was a simple implementation included. – Christachristabel 18/5, 2017 at 9:13

@KaizerSozay you can always suggest an edit with your implementation if you want – Jason 18/5, 2017 at 15:23

@NieDzejkob yeah I haven't done it yet, as I am testing using a deque, so it will be a while before I get back to this, but I Imagine this is the correct answer. Although you have to implement it yourself, I imagine performance wise you can't go wrong here. – Christachristabel 18/5, 2017 at 15:31

@KaizerSozay: Use boost::circular_buffer. – Airs 18/5, 2017 at 17:0

Btw, in typical implementations, does the index ever get reset to 0? And if not, doesn't that risk overflow in long running programs? – Protohistory 18/5, 2017 at 18:29

The index is reset in the last line, but it could maybe use more emphasis as it's important: "add the index and the offset and take the modulo of this and the length of the buffer". – Outgeneral 18/5, 2017 at 18:51

@DmitryNarkevich As user3067860 already mentioned, the index will be reset. That is in fact the whole point of a circular buffer. The memory is static, so you move your view of the memory to make it look like it was an actual circle. – Aminaamine 19/5, 2017 at 6:45

@DmitryNarkevich The index is passed into the class when you do a lookup - so there is no context in which it makes sense to reset it. The offset should be reset - but because you do a modulo whenever you make a lookup failure to reset shoul not cause a buffer overflow - if you don't reset you could hit undefined behavior when the datatype overflows. – Harrelson 19/5, 2017 at 15:56

@Outgeneral Oh I see, I understood that part as the modulo only being taken while doing lookups and not being stored in the index again. Thanks – Protohistory 19/5, 2017 at 16:56

std::deque is a far better option. Or if you had benchmarked std::deque and found its performance to be inadequate for your specific use, you could implement a circular buffer in a fixed size array, storing the index of the start of the buffer. When replacing an element in the buffer, you would overwrite the element at the start index, and then set the start index to its previous value plus one modulo the size of the buffer.

List traversal is very slow, as list elements can be scattered throughout memory, and vector shifting is actually surprisingly fast, as memory moves on a single block of memory are quite fast even if it is a large block.

The talk Taming The Performance Beast from the Meeting C++ 2015 conference might be of interest to you.

Byng answered 18/5, 2017 at 5:8 Comment(23)

Why is list traversal slow ? If each node has a pointer to the next node, shouldn't it be fast regardless of where in memory it is ? – Christachristabel 18/5, 2017 at 5:17

Locality of reference. Arrays (and therefore vectors) have it; linked lists don't. When you access an element in an array, adjacent elements get pulled into the CPU cache along with it, so they can be accessed more quickly. Linked lists are more likely to cause cache misses. – Teratology 18/5, 2017 at 5:19

@Teratology then do deques have this ? – Christachristabel 18/5, 2017 at 5:27

@KaizerSozay : No, std::deque does not store it's contents in a contiguous block like std::vector. Frankly, I can't remember the last time I used anything other than std::vector for simple collections. – Boyse 18/5, 2017 at 5:30

Yes, std::deque uses arrays — though more than one. Quoting cppreference: "As opposed to std::vector, the elements of a deque are not stored contiguously: typical implementations use a sequence of individually allocated fixed-size arrays". That's still a lot better than having every individual item allocated separately in a different place. – Teratology 18/5, 2017 at 5:30

I am confused then. For keeping track of the last n items (and iterating through them), would I be better off with a vector, list or deque. It seems one has to weigh the costs of vector shifting, vs list traversal vs deque traversal which seems to be implementation specific, which further complicates this because I am developing cross platform... – Christachristabel 18/5, 2017 at 5:33

@KaizerSozay You would do well to make some performance tests with the different structures. Personally I think the vector would win for something like this. – Baylor 18/5, 2017 at 5:40

@Baylor Yes, I should performance test with different structures, but it seems like a pain, because I would have to run each of the tests on different devices too (cross platform). I would have thought that having a standard would help avoid this problem.. – Christachristabel 18/5, 2017 at 5:43

vector spends time shifting elements up as you remove them. list is cache unfriendly, resulting in more accesses form slower memory. deque uses multiple non-contiguous blocks of contiguous memory, so most of the time it has good cache usage and does no shifting, but it will miss more often and it can really hurt when it allocates a new block. Over all, vector should win on a modern processor unless N is large resulting in a grievous amount of shifting. – Scraggy 18/5, 2017 at 5:44

@KaizerSozay Inmy experience its hard to beat a vector. A deque can give the best of both worlds between a vector and a list but they cal also give the worst of both worlds. So it's highly dependent on your use pattern. But like I said, my money in on the pure vector. – Baylor 18/5, 2017 at 5:45

@Scraggy In this case we are inserting elements into the middle of the deque so not only is it going to be shifting but it is going to be shifting over block boundaries. So I suspect, for this use, the deque may prove to be less efficient than a vector. – Baylor 18/5, 2017 at 5:47

Holy carp. I missed the middle insertion. Yeah. deque fails and so does circular buffer. Vector it is. – Scraggy 18/5, 2017 at 5:49

Wait, no. @Baylor There's no middle insert in OP's question. The way I'm reading it it's always adding to the end. – Scraggy 18/5, 2017 at 5:51

@Scraggy well there is an insertion at position 4. I assume that's non literal and the actual position could be much larger. But if it is literal then a vector is still better because the number of data is tiny – Baylor 18/5, 2017 at 5:52

@Scraggy Oh I see, position 4 is the other end!! – Baylor 18/5, 2017 at 5:54

@Scraggy Then its even stevens which would be more efficient depending on the actual size of the structure. As you get larger the deque would seem more attractive – Baylor 18/5, 2017 at 5:55

five elements, vector wins hands down. 5000... Not so sure. A circular buffer should always win. – Scraggy 18/5, 2017 at 5:56

I think there is some confusion with my wording - there is no middle insertion. What I meant is, that I am always looking at only n elements. So I always delete the first element, when i add my n+1th element to make sure I am always looking at only the last n elements. – Christachristabel 18/5, 2017 at 6:18

It's very difficult to make any solid claims, without benchmarking on the systems you will be running on using the volume of data you would actually be using. That said, a circular buffer will almost certainly be fastest, but requires you to implement it yourself. List will certainly be slowest. If you don't want to implement a circular buffer and don't want to benchmark the different options, then I would take a guess and use a vector if your buffer size is less than a few kilobytes, or a deque if larger than that. – Byng 18/5, 2017 at 6:32

@DavidScarlett: No need to implement a circular buffer yourself, Boost has it. – Wastebasket 18/5, 2017 at 7:13

True, but not everyone uses Boost. – Byng 18/5, 2017 at 7:34

@Wastebasket Yeah I prefer just sticking to the STL. – Christachristabel 18/5, 2017 at 9:11

A std::deque is implemented (at least when using the clang or gcc toolchains) as a circular buffer backed by a vector - so using a deque is the best of all worlds in this case - cache friendly and O(1). – Ramses 23/5, 2017 at 21:27

If you can use Boost, try boost::circular_buffer:

It's a kind of sequence similar to std::list or std::deque. It supports random access iterators, constant time insert and erase operations at the beginning or the end of the buffer and interoperability with std algorithms.

It provides fixed capacity storage: when the buffer is filled, new data is written starting at the beginning of the buffer and overwriting the old

// Create a circular buffer with a capacity for 5 integers.
boost::circular_buffer<int> cb(5);

// Insert elements into the buffer.
cb.push_back(3);
cb.push_back(24);
cb.push_back(51);
cb.push_back(62);
cb.push_back(37);

int a = cb[0];  // a == 3
int b = cb[1];  // b == 24
int c = cb[2];  // c == 51

// The buffer is full now, so pushing subsequent
// elements will overwrite the front-most elements.
cb.push_back(8);   // overwrite 3 with 8
cb.push_back(12);  // overwrite 24 with 12

// The buffer now contains 51, 62, 37, 8, 12.

// Elements can be popped from either the front or the back.
cb.pop_back();  // 12 is removed
cb.pop_front(); // 51 is removed

The circular_buffer stores its elements in a contiguous region of memory, which then enables fast constant-time insertion, removal and random access of elements.

PS ... or implement the circular buffer directly as suggested by Taemyr.

Overload Journal #50 - Aug 2002 has a nice introduction (by Pete Goodliffe) to writing robust STL-like circular buffer.

Amr answered 18/5, 2017 at 8:37 Comment(1)

which then enables fast constant-time insertion, removal of elements Only at either end, otherwise it's linear. – Adenoid 19/5, 2017 at 9:19

The problem is that O(n) only talks about the asymptotic behaviour as n tends to infinity. If n is small then the constant factors involved become significant. The result is that for "last 5 integer items" I would be stunned if vector didn't beat list. I would even expect std::vector to beat std::deque.

For "last 500 integer items" I would still expect std::vector to be faster than std::list - but std::deque would now probably win. For "last 5 million slow-to-copy items", std:vector would be slowest of all.

A ring buffer based on std::array or std::vector would probably be faster still though.

As (almost) always with performance issues:

encapsulate with a fixed interface
write the simplest code that can implement that interface
if profiling shows you have a problem, optimize (which will make the code more complicated).

In practise, just using a std::deque, or a pre-built ring-buffer if you have one, will be good enough. (But it's not worth going to the trouble of writing a ring buffer unless profiling says you need to.)

Tuna answered 19/5, 2017 at 9:7 Comment(0)

If you need to store last N-elements then logically you are doing some kind of queue or a circular buffer, std::stack and std::deque are implementations of LIFO and FIFO queues.

You can use boost::circular_buffer or implement simple circular buffer manually:

template<int Capcity>
class cbuffer
{
public:
    cbuffer() : sz(0), p(0){}
    void push_back(int n)
    {
        buf[p++] = n;
        if (sz < Capcity)
            sz++;
        if (p >= Capcity)
            p = 0;
    }
    int size() const
    {
        return sz;
    }
    int operator[](int n) const
    {
        assert(n < sz);
        n = p - sz + n;
        if (n < 0)
            n += Capcity;
        return buf[n];
    }
    int buf[Capcity];
    int sz, p;
};

Sample use for circular buffer of 5 int elements:

int main()
{
    cbuffer<5> buf;

    // insert random 100 numbers
    for (int i = 0; i < 100; ++i)
        buf.push_back(rand());

    // output to cout contents of the circular buffer
    for (int i = 0; i < buf.size(); ++i)
        cout << buf[i] << ' ';
}

As a note, keep in mind that when you have only 5 elements the best solution is the one that's fast to implement and works correctly.

Bagel answered 18/5, 2017 at 5:59 Comment(0)

Here is a minimal circular buffer. I'm primarily posting that here to get a metric ton of comments and ideas of improvement.

Minimal Implementation

#include <iterator>

template<typename Container>
class CircularBuffer
{
public:
    using iterator   = typename Container::iterator;
    using value_type = typename Container::value_type;
private:
    Container _container;
    iterator  _pos;
public:
    CircularBuffer() : _pos(std::begin(_container)) {}
public:
    value_type& operator*() const { return *_pos; }
    CircularBuffer& operator++() { ++_pos ; if (_pos == std::end(_container)) _pos = std::begin(_container); return *this; }
    CircularBuffer& operator--() { if (_pos == std::begin(_container)) _pos = std::end(_container); --_pos; return *this; }
};

Usage

#include <iostream>
#include <array>

int main()
{
    CircularBuffer<std::array<int,5>> buf;

    *buf = 1; ++buf;
    *buf = 2; ++buf;
    *buf = 3; ++buf;
    *buf = 4; ++buf;
    *buf = 5; ++buf;
    std::cout << *buf << " "; ++buf;
    std::cout << *buf << " "; ++buf;
    std::cout << *buf << " "; ++buf;
    std::cout << *buf << " "; ++buf;
    std::cout << *buf << " "; ++buf;
    std::cout << *buf << " "; ++buf;
    std::cout << *buf << " "; ++buf;
    std::cout << *buf << " "; --buf;
    std::cout << *buf << " "; --buf;
    std::cout << *buf << " "; --buf;
    std::cout << *buf << " "; --buf;
    std::cout << *buf << " "; --buf;
    std::cout << *buf << " "; --buf;

    std::cout << std::endl;
}

Compile with

g++ -std=c++17 -O2 -Wall -Wextra -pedantic -Werror

Demo

On Coliru: try it online

Cachet answered 18/5, 2017 at 13:17 Comment(1)

If you're really wanting those comments and ideas, you might want to ask for critique over at Code Review. Be sure to read A guide to Code Review for Stack Overflow users first, as some things are done differently over there! – Annamariaannamarie 18/5, 2017 at 16:37

Yes. Time complexity of the std::vector for removing elements from the end is linear. std::deque might be a good choice for what you are doing as it offers constant time insertion and removal at the beginning as well as at the end of the list and also better performance than std::list

Source:

http://www.sgi.com/tech/stl/Vector.html

http://www.sgi.com/tech/stl/Deque.html

Hydrogenate answered 18/5, 2017 at 5:32 Comment(6)

Since the vector size is fixed at 5, any operation is O(5) = constant time. – Wastebasket 18/5, 2017 at 7:15

Yes. But the OP just used n = 5 as an example, that need not mean he/she will always use 5 elements. – Hydrogenate 18/5, 2017 at 8:22

Doesn't matter, O(6) is also constant time. The point is that the number of items is bounded. – Wastebasket 18/5, 2017 at 9:25

@MSalters: that's not how big-O notation works. Your line of thought reduces to "Since no integer N is infinitely large, therefore O(N) = O(1)." Codelyzer's point is that removing from the front of a vector (and moving the other N-1 elements forward) is O(N) in the length of the vector — which is absolutely 100% true. – Text 19/5, 2017 at 15:27

You might counter: "Ah, but my vector has only bounded length — five!" Sure. But someone else might have a vector of length six; or length 100; or length 10000; and the amount of time it takes to pop_front on those vectors goes up with the length of the vector — as the vector gets longer, the number of operations involved in pop_front goes up linearly. This is what we mean by "O(N)". – Text 19/5, 2017 at 15:30

@Quuxplusone: I know how big-O notation works (and Big-Omega etc). An O(N) algorithm has a running time less than c*N, as N goes to infinity. Mathematically, this is known as a "limit". The fundamental point remains that bounded numbers do not go to infinity. – Wastebasket 19/5, 2017 at 16:4

Here are the beginnings of a ring buffer based dequeue template class that I wrote a while ago, mostly to experiment with using std::allocator (so it does not require T to be default constructible). Note it currently doesn't have iterators, or insert/remove, copy/move constructors, etc.

#ifndef RING_DEQUEUE_H
#define RING_DEQUEUE_H

#include <memory>
#include <type_traits>
#include <limits>

template <typename T, size_t N>
class ring_dequeue {
private:
    static_assert(N <= std::numeric_limits<size_t>::max() / 2 &&
                  N <= std::numeric_limits<size_t>::max() / sizeof(T),
                  "size of ring_dequeue is too large");

    using alloc_traits = std::allocator_traits<std::allocator<T>>;

public:
    using value_type = T;
    using reference = T&;
    using const_reference = const T&;
    using difference_type = ssize_t;
    using size_type = size_t;

    ring_dequeue() = default;

    // Disable copy and move constructors for now - if iterators are
    // implemented later, then those could be delegated to the InputIterator
    // constructor below (using the std::move_iterator adaptor for the move
    // constructor case).
    ring_dequeue(const ring_dequeue&) = delete;
    ring_dequeue(ring_dequeue&&) = delete;
    ring_dequeue& operator=(const ring_dequeue&) = delete;
    ring_dequeue& operator=(ring_dequeue&&) = delete;

    template <typename InputIterator>
    ring_dequeue(InputIterator begin, InputIterator end) {
        while (m_tailIndex < N && begin != end) {
            alloc_traits::construct(m_alloc, reinterpret_cast<T*>(m_buf) + m_tailIndex,
                                    *begin);
            ++m_tailIndex;
            ++begin;
        }
        if (begin != end)
            throw std::logic_error("Input range too long");
    }

    ring_dequeue(std::initializer_list<T> il) :
        ring_dequeue(il.begin(), il.end()) { }

    ~ring_dequeue() noexcept(std::is_nothrow_destructible<T>::value) {
        while (m_headIndex < m_tailIndex) {
            alloc_traits::destroy(m_alloc, elemPtr(m_headIndex));
            m_headIndex++;
        }
    }

    size_t size() const {
        return m_tailIndex - m_headIndex;
    }
    size_t max_size() const {
        return N;
    }

    bool empty() const {
        return m_headIndex == m_tailIndex;
    }
    bool full() const {
        return m_headIndex + N == m_tailIndex;
    }

    template <typename... Args>
    void emplace_front(Args&&... args) {
        if (full())
            throw std::logic_error("ring_dequeue full");
        bool wasAtZero = (m_headIndex == 0);
        auto newHeadIndex = wasAtZero ? (N - 1) : (m_headIndex - 1);
        alloc_traits::construct(m_alloc, elemPtr(newHeadIndex),
                                std::forward<Args>(args)...);
        m_headIndex = newHeadIndex;
        if (wasAtZero)
            m_tailIndex += N;
    }
    void push_front(const T& x) {
        emplace_front(x);
    }
    void push_front(T&& x) {
        emplace_front(std::move(x));
    }

    template <typename... Args>
    void emplace_back(Args&&... args) {
        if (full())
            throw std::logic_error("ring_dequeue full");
        alloc_traits::construct(m_alloc, elemPtr(m_tailIndex),
                                std::forward<Args>(args)...);
        ++m_tailIndex;
    }
    void push_back(const T& x) {
        emplace_back(x);
    }
    void push_back(T&& x) {
        emplace_back(std::move(x));
    }

    T& front() {
        if (empty())
            throw std::logic_error("ring_dequeue empty");
        return *elemPtr(m_headIndex);
    }
    const T& front() const {
        if (empty())
            throw std::logic_error("ring_dequeue empty");
        return *elemPtr(m_headIndex);
    }
    void remove_front() {
        if (empty())
            throw std::logic_error("ring_dequeue empty");
        alloc_traits::destroy(m_alloc, elemPtr(m_headIndex));
        ++m_headIndex;
        if (m_headIndex == N) {
            m_headIndex = 0;
            m_tailIndex -= N;
        }
    }
    T pop_front() {
        T result = std::move(front());
        remove_front();
        return result;
    }

    T& back() {
        if (empty())
            throw std::logic_error("ring_dequeue empty");
        return *elemPtr(m_tailIndex - 1);
    }
    const T& back() const {
        if (empty())
            throw std::logic_error("ring_dequeue empty");
        return *elemPtr(m_tailIndex - 1);
    }
    void remove_back() {
        if (empty())
            throw std::logic_error("ring_dequeue empty");
        alloc_traits::destroy(m_alloc, elemPtr(m_tailIndex - 1));
        --m_tailIndex;
    }
    T pop_back() {
        T result = std::move(back());
        remove_back();
        return result;
    }

private:
    alignas(T) char m_buf[N * sizeof(T)];
    size_t m_headIndex = 0;
    size_t m_tailIndex = 0;
    std::allocator<T> m_alloc;

    const T* elemPtr(size_t index) const {
        if (index >= N)
            index -= N;
        return reinterpret_cast<const T*>(m_buf) + index;
    }
    T* elemPtr(size_t index) {
        if (index >= N)
            index -= N;
        return reinterpret_cast<T*>(m_buf) + index;
    }
};

#endif

Soothe answered 18/5, 2017 at 16:31 Comment(1)

Anybody, feel free to incorporate this into your answers - this was mostly posted as a response to Taemyr's answer and @KaizerSozay's comment on it suggesting to add a sample implementation. – Soothe 18/5, 2017 at 16:51

Briefly say the std::vector is better for a non-change size of memory.In your case,if you move all data forward or append new data in a vector,that must be a waste.As @David said the std::deque is a good option,since you would pop_head and push_back eg. two way list.

from the cplus cplus reference about the list

Compared to other base standard sequence containers (array, vector and deque), lists perform generally better in inserting, extracting and moving elements in any position within the container for which an iterator has already been obtained, and therefore also in algorithms that make intensive use of these, like sorting algorithms.

The main drawback of lists and forward_lists compared to these other sequence containers is that they lack direct access to the elements by their position; For example, to access the sixth element in a list, one has to iterate from a known position (like the beginning or the end) to that position, which takes linear time in the distance between these. They also consume some extra memory to keep the linking information associated to each element (which may be an important factor for large lists of small-sized elements).

about deque

For operations that involve frequent insertion or removals of elements at positions other than the beginning or the end, deques perform worse and have less consistent iterators and references than lists and forward lists.

vetor

Therefore, compared to arrays, vectors consume more memory in exchange for the ability to manage storage and grow dynamically in an efficient way.

Compared to the other dynamic sequence containers (deques, lists and forward_lists), vectors are very efficient accessing its elements (just like arrays) and relatively efficient adding or removing elements from its end. For operations that involve inserting or removing elements at positions other than the end, they perform worse than the others, and have less consistent iterators and references than lists and forward_lists.

Egwan answered 18/5, 2017 at 5:19 Comment(4)

Actually std::list is a double-linked list, so it is equally good from the front and back. – Behrens 18/5, 2017 at 5:25

@ZanLynx Why have a deque then ? – Christachristabel 18/5, 2017 at 5:27

@KaizerSozay: The std::deque holds larger blocks and makes them look like a both a vector and a list. It is like a list of vectors. It's performance is closer to vector than a list. – Behrens 18/5, 2017 at 5:30

@KaizerSozay: deque has random access, list does not. – Wastebasket 18/5, 2017 at 7:15

-2

I think even use std::deque it also have overhead of copy items in certain condition because std::deque is a map of arrays essentially, so std::list is a good idea to eliminate the copy overhead.

To increase the performance of traverse for std::list, you can implement a memory pool so that the std::list will allocate memory from a trunk and it's spatial locality for caching.

Ropedancer answered 18/5, 2017 at 7:22 Comment(1)

Map? It's a list if anything. What would it map, i.e. what would be the key? – Wray 18/5, 2017 at 7:24

Hot tags

Godot Unity Godot Help Programming Godot 4.X GUI GDScript 3D 2D Physics CSharp Godot 3.X VR XR Projects C++

Recommended topics

Hot tags