What is the best way to concatenate two vectors?
Asked Answered
C

11

255

I'm using multitreading and want to merge the results. For example:

std::vector<int> A;
std::vector<int> B;
std::vector<int> AB;

I want AB to have to contents of A and the contents of B in that order. What's the most efficient way of doing something like this?

Curtain answered 5/7, 2010 at 4:36 Comment(2)
If looking for efficiency when you work with large size containers, it might be more efficient to use list, where you can splice one to the other with several pointer operations. But list has space overhead (consider using single linked list).Submit
Does this answer your question? Concatenating two std::vectorsScarbrough
I
408
AB.reserve( A.size() + B.size() ); // preallocate memory
AB.insert( AB.end(), A.begin(), A.end() );
AB.insert( AB.end(), B.begin(), B.end() );
Inlet answered 5/7, 2010 at 4:39 Comment(13)
what is the time complexity for such an operation? Is it constant time or O(n)?Granicus
it should copy each element, so it's O(n)Inlet
But amortized constant complexity when properly implemented. http://en.cppreference.com/w/cpp/container/vector/push_backEury
Not sure whether to ask a new question or not, but can this answer be improved when taking move semantics into account? Is there some way I can expect/instruct the compiler to do a single memory move instead of looping over all elements?Beaudoin
@Eury No. It is amortized constant time to push_back one element. To push back n elements is O(n)Cheese
@Konrad I didn't imply otherwise, but thanks for the clarification. Note the complexity of an insert operation is never given in terms of the number of elements being inserted - which will always give O(n) - but in terms of the number of elements already in the container, since that provides a measure of its scalability.Eury
@KirillV.Lyadvinsky Why isn't doing AB.insert() after declaring AB good enough. Why do we need to preallocate memory?Rugged
@Prakhar Agrawal AB.insert() will insert new elements one by one and this can lead to ineffective memory allocation (sequence of memory reallocations).Inlet
@KirillV.Lyadvinsky: That would imply that the vector implementation doesn't make use of the fact that the iterators are random access (and thus forward) iterators. Any decent vector implementation will reallocate only once (for every insert operation on forward iterators), unless the given iterator is an input iterator.Chinn
@Pixelchemist: Right but there are two insert operations here.Legaspi
@LightnessRacesinOrbit: Yes - as I said via "once (for every insert operation [...])". I don't get your point. I just think that the comment about -insert() inserting elements one by one leading to a sequence of memory reallocations- is somewhat misleading (at least in the common as well as the OP case). This is why i commented. I didn't imply that using reserve is the wrong thing to do. It is absolutely justified in order to avoid one reallocation.Chinn
@Pixelchemist: Right, and I didn't say that you implied that. I'm just observing that, in this particular example, the reserve does have merit. So you're both right to some degree.Legaspi
@BroesDeCat You could use std::move_iterator (see here) which is an iterator adapter that returns an rvalue reference when dereferencing it. That enables (but not guarantees) that the source elements are moved.Sashasashay
S
95

This is precisely what the member function std::vector::insert is for

std::vector<int> AB = A;
AB.insert(AB.end(), B.begin(), B.end());
Sanctity answered 5/7, 2010 at 4:40 Comment(8)
Except std::vector::insert is godawful slow when all you're trying to do is concatenate two vectors. This is where <algorithm> is fail and you really wish std::vector::extend was a real method.Riancho
@Nick: Slow compared to what?Civet
Maybe that it checks for enough space on each insert of element? Using reserve beforehand will speed it up.Gage
@GMan: It's slow compared to knowing how to preallocate the final size.Riancho
@Nick: I wouldn't be surprised if every modern stdlib implementation specialized insert on random-access iterators and reserved up-front.Civet
@Gman: That's a fair point since we know that the source is also a vector (where iterator distance has O(1) complexity). Still, the performance guarantees of insert are something to be mindful of when you can often do better by planning ahead.Riancho
@Gage checking for space is only a few instructions: load capacity, compare to size, conditional jump; all of which is negligible cost for most cases. Since size < capacity most of the time, branch prediction will likely cause the non-reallocating branch's instructions to be in the instruction pipeline, minimising branch-induced latency except for low iteration count. This assumes a good vector implementation, plus CPU instruction pipeline & [good] branch prediction, but those are pretty reliable assumptions for a modern toolchain and desktop machine. Don't know about smartphones though..Eury
If done repeatedly then NOT using reserve(...) will be faster.Noddle
K
28

Depends on whether you really need to physically concatenate the two vectors or you want to give the appearance of concatenation of the sake of iteration. The boost::join function

http://www.boost.org/doc/libs/1_43_0/libs/range/doc/html/range/reference/utilities/join.html

will give you this.

std::vector<int> v0;
v0.push_back(1);
v0.push_back(2);
v0.push_back(3);

std::vector<int> v1;
v1.push_back(4);
v1.push_back(5);
v1.push_back(6);
...

BOOST_FOREACH(const int & i, boost::join(v0, v1)){
    cout << i << endl;
}

should give you

1
2
3
4
5
6

Note boost::join does not copy the two vectors into a new container but generates a pair of iterators (range) that cover the span of both containers. There will be some performance overhead but maybe less that copying all the data to a new container first.

Kep answered 5/7, 2010 at 8:55 Comment(1)
Nice idea. After thinking for a while I realized this goal may also be accomplished without using boost libraries. I've posted an answer explaining how.Tumefy
T
23

EDIT: C++23 provides the append_range method for vectors:

AB.append_range(A);
AB.append_range(B);

and that's it!

Time complexity is O(|A| + |B|). For those cases where "to look like concatenated" suffices, an O(1) (constant time) alternative is described below.


In the direction of Bradgonesurfing's answer, many times one doesn't really need to concatenate two vectors (O(n)), but instead just work with them as if they were concatenated (O(1)). If this is your case, it can be done without Boost libraries.

The trick is to create a vector proxy: a wrapper class which manipulates references to both vectors, externally seen as a single, contiguous one.

USAGE

std::vector<int> A{ 1, 2, 3, 4, 5};
std::vector<int> B{ 10, 20, 30 };

vproxy<int> AB(A, B);  // ----> O(1). No copies performed.

for (size_t i = 0; i < AB.size(); ++i)
    std::cout << AB[i] << " ";  // 1 2 3 4 5 10 20 30

IMPLEMENTATION

template <class T>
class vproxy {
private:
    std::vector<T>& v1, v2;
public:
    vproxy(std::vector<T>& ref1, std::vector<T>& ref2) : v1(ref1), v2(ref2) {}
    const T& operator[](const size_t& i) const;
    const size_t size() const;
};

template <class T>
const T& vproxy<T>::operator[](const size_t& i) const{
    return (i < v1.size()) ? v1[i] : v2[i - v1.size()];
};

template <class T>
const size_t vproxy<T>::size() const { return v1.size() + v2.size(); };

MAIN BENEFIT

It's O(1) (constant time) to create it, and with minimal extra memory allocation.

SOME STUFF TO CONSIDER

  • You should only go for it if you really know what you're doing when dealing with references. This solution is intended for the specific purpose of the question made, for which it works pretty well. To employ it in any other context may lead to unexpected behavior if you are not sure on how references work.
  • In this example, AB does not provide a non-const access operator ([ ]). Feel free to include it, but keep in mind: since AB contains references, to assign it values will also affect the original elements within A and/or B. Whether or not this is a desirable feature, it's an application-specific question one should carefully consider.
  • Any changes directly made to either A or B (like assigning values, sorting, etc.) will also "modify" AB. This is not necessarily bad (actually, it can be very handy: AB does never need to be explicitly updated to keep itself synchronized to both A and B), but it's certainly a behavior one must be aware of. Important exception: to resize A and/or B to sth bigger may lead these to be reallocated in memory (for the need of contiguous space), and this would in turn invalidate AB.
  • Because every access to an element is preceded by a test (namely, "i < v1.size()"), vproxy access time, although constant, is also a bit slower than that of vectors.
  • This approach can be generalized to n vectors. I haven't tried, but it shouldn't be a big deal.
Tumefy answered 24/4, 2019 at 21:23 Comment(2)
exactly what i needed, thanks a lotCoffeepot
FWIW, as of March 2024 neither gcc nor clang implemented std::vector::append_range(). CORRECTION: It's libstdc++ that didn't implement it. Clang with libc++ works as of clang 17.Unformed
E
18

Based on Kiril V. Lyadvinsky answer, I made a new version. This snippet use template and overloading. With it, you can write vector3 = vector1 + vector2 and vector4 += vector3. Hope it can help.

template <typename T>
std::vector<T> operator+(const std::vector<T> &A, const std::vector<T> &B)
{
    std::vector<T> AB;
    AB.reserve(A.size() + B.size());                // preallocate memory
    AB.insert(AB.end(), A.begin(), A.end());        // add A;
    AB.insert(AB.end(), B.begin(), B.end());        // add B;
    return AB;
}

template <typename T>
std::vector<T> &operator+=(std::vector<T> &A, const std::vector<T> &B)
{
    A.reserve(A.size() + B.size());                // preallocate memory without erase original data
    A.insert(A.end(), B.begin(), B.end());         // add B;
    return A;                                        // here A could be named AB
}
Eicher answered 27/1, 2014 at 20:34 Comment(10)
Do you mean to add the elements of each vector to each other? Or you mean to append? This is clear now but for the next 5 years..? You shouldn't overload operator if meaning is ambiguous.Littoral
@Littoral I mean to concate. I wrote this answer 3 years ago. I still know what It means. No problem there. If C++ could provide its own overload it will be even better. (and yes :: is taken ;)Eicher
Definitely not clear in general that v1 + v2 doesn't represent addition.Certiorari
@Apollys wellEicher
Alternative would be to use @ like in F#Eicher
When appending a vector (operator+=) it is preferable not to call reserve before insert. See these notes on reserve. It makes sense for the other operator (operator+) though.Toilet
@Toilet "When inserting a range, the range version of insert() is generally preferable as it preserves the correct capacity growth behavior, unlike reserve() followed by a series of push_back()s." So do I need reserve or can I ignore it and let the STL manage the allocation?Eicher
@aloisdgmovingtocodidact.com At first glance, I would say that the hint says that insert() is preferable to reserve(). So that you should do without reserve(). However, the note compares this to the specific example with push_back()s, which you do not use. My first intention is to say that reserve() can be omitted. However, I would not stake my life on it. Would be worth a separate question.Toilet
@Toilet Indeed. Feel free to link the new question here :)Eicher
>> is a good candidate for "append" operator, as that is what bash does for files.Jaquelin
B
3

One more simple variant which was not yet mentioned:

copy(A.begin(),A.end(),std::back_inserter(AB));
copy(B.begin(),B.end(),std::back_inserter(AB));

And using merge algorithm:


#include <algorithm>
#include <vector>
#include <iterator>
#include <iostream>
#include <sstream>
#include <string>

template<template<typename, typename...> class Container, class T>
std::string toString(const Container<T>& v)
{
    std::stringstream ss;
    std::copy(v.begin(), v.end(), std::ostream_iterator<T>(ss, ""));
    return ss.str();
};


int main()
{
    std::vector<int> A(10);
    std::vector<int> B(5);  //zero filled
    std::vector<int> AB(15);

    std::for_each(A.begin(), A.end(),
            [](int& f)->void
            {
                f = rand() % 100;
            });

    std::cout << "before merge: " << toString(A) << "\n";
    std::cout << "before merge: " << toString(B) << "\n";
    merge(B.begin(),B.end(), begin(A), end(A), AB.begin(), [](int&,int&)->bool {});
    std::cout << "after merge:  " << toString(AB) << "\n";

    return 1;
}

Blandish answered 24/2, 2016 at 13:17 Comment(0)
N
0

All the solutions are correct, but I found it easier just write a function to implement this. like this:

template <class T1, class T2>
void ContainerInsert(T1 t1, T2 t2)
{
    t1->insert(t1->end(), t2->begin(), t2->end());
}

That way you can avoid the temporary placement like this:

ContainerInsert(vec, GetSomeVector());
Navarre answered 27/10, 2014 at 12:53 Comment(0)
R
0

For this use case, if you know beforehand the number of results each thread produces, you could preallocate AB and pass a std::span to each thread. This way the concatenation need not be done. Example:

std::vector<int> AB(total_number_of_results, 0);
std::size_t chunk_length = …;
std::size_t chunk2_start = chunk_length;
std::size_t chunk3_start = 2 * chunk_length; // If needed
…
// Pass these to the worker threads.
std::span<int> A(AB.data(), chunk_length);
std::span<int> B(AB.data() + chunk2_start, chunk_length);
…
Rightism answered 8/4, 2022 at 9:47 Comment(0)
P
0

My answer is based on Mr.Ronald Souza's original solution. In addition to his original solution, I've written a vector proxy that supports iterators too!

short description for people who are not aware of the context of the original solution: the joined_vector template class (i.e the vector proxy)takes two references of two vectors as constructor arguments, it then treats them as one contiguous vector. My implementation also supports a forward-iterator.

USAGE:

int main()
{
    std::vector<int> a1;
    std::vector<int> a2;

    joined_vector<std::vector<int>> jv(a1,a2);

    for (int i = 0; i < 5; i++)
        a1.push_back(i);
    for (int i = 5; i <=10; i++)
        a2.push_back(i);

    for (auto e : jv)
        std::cout << e<<"\n";
    for (int i = 0; i < jv.size(); i++)
        std::cout << jv[i] << "\n";
    return 0;
}

IMPLEMENTATION:

template<typename _vec>
class joined_vector
{
    _vec& m_vec1;
    _vec& m_vec2;

public:

    struct Iterator
    {
        typedef typename _vec::iterator::value_type type_value;
        typedef typename _vec::iterator::value_type* pointer;
        typedef typename _vec::iterator::value_type& reference;
        typedef std::forward_iterator_tag iterator_category;
        typedef std::ptrdiff_t difference_type;
        _vec* m_vec1;
        _vec* m_vec2;
        Iterator(pointer ptr) :m_ptr(ptr)
        {

        }
        Iterator operator++()
        {
            if (m_vec1->size() > 0 && m_ptr == &(*m_vec1)[m_vec1->size() - 1] && m_vec2->size() != 0)
                m_ptr = &(*m_vec2)[0];
            else
                ++m_ptr;
            return m_ptr;
        }
        Iterator operator++(int)
        {
            pointer curr = m_ptr;
            if (m_vec1->size() > 0 && m_ptr == &(*m_vec1)[m_vec1->size() - 1] && m_vec2->size() != 0)
                m_ptr = &(*m_vec2)[0];
            else
                ++m_ptr;
            return curr;
        }
        reference operator *()
        {
            return *m_ptr;
        }
        pointer operator ->()
        {
            return m_ptr;
        }

        friend bool operator == (Iterator& itr1, Iterator& itr2)
        {
            return itr1.m_ptr == itr2.m_ptr;
        }
        friend bool operator != (Iterator& itr1, Iterator& itr2)
        {
            return itr1.m_ptr != itr2.m_ptr;
        }
    private:
        pointer m_ptr;
    };

    joined_vector(_vec& vec1, _vec& vec2) :m_vec1(vec1), m_vec2(vec2)
    {

    }
    Iterator begin()
    {
        //checkes if m_vec1 is empty and gets the first elemet's address,
        //if it's empty then it get's the first address of the second vector m_vec2
        //if both of them are empty then nullptr is returned as the first pointer
        Iterator itr_beg((m_vec1.size() != 0) ? &m_vec1[0] : ((m_vec2.size() != 0) ? &m_vec2[0] : nullptr));
        itr_beg.m_vec1 = &m_vec1;
        itr_beg.m_vec2 = &m_vec2;
        return itr_beg;
    }
    Iterator end()
    {
        //check if m_vec2 is empty and get the last address of that vector
        //if the second vector is empty then the m_vec1's vector/the first vector's last element's address is taken
        //if both of them are empty then a null pointer is returned as the end pointer
        typename _vec::value_type* p = ((m_vec2.size() != 0) ? &m_vec2[m_vec2.size() - 1] : ((m_vec1.size()) != 0 ? &m_vec1[m_vec1.size() - 1] : nullptr));
        Iterator itr_beg(p != nullptr ? p + 1 : nullptr);
        itr_beg.m_vec1 = &m_vec1;
        itr_beg.m_vec2 = &m_vec2;
        return itr_beg;
    }
    typename _vec::value_type& operator [](int i)
    {
        if (i < m_vec1.size())
            return m_vec1[i];
        else
            return m_vec2[i - m_vec1.size()];
    }
    size_t size()
    {
        return m_vec1.size() + m_vec2.size();
    }

};
Photomural answered 13/4, 2022 at 17:54 Comment(0)
A
0

I know this is a very old question, but there is one solution missing that I know someone will need to hear:

The "best" way to concatenate is to... not concatenate.

That is to say, if your specific case allows it, then the best/fastest/most efficient way to read two vectors as one is to read them as two. Concatenation might make your code more readable, but it can add in a significant performance hit.

I had a case where I was accumulating 4 vectors into a single vector as a core part of the main simulation loop in my Agent Based Model. When I did some profiling, the insert function suggested here was taking up 20% of the run time. After trying every which way to speed up the concatenation process, I scrapped it altogether and just wrote 4 separate loops for each of the original vectors, and gained almost the full 20% time improvement.

I guess the moral of the story is: if you already have a bunch of things in a vector format, and it's not a huge deal to keep handling them as separate vectors, keep them that way. Concatenating just introduces memory management processes to your code.

Ambry answered 21/12, 2023 at 0:46 Comment(0)
D
-1

If your vectors are sorted*, check out set_union from <algorithm>.

set_union(A.begin(), A.end(), B.begin(), B.end(), AB.begin());

There's a more thorough example in the link.

Duke answered 5/7, 2010 at 4:39 Comment(1)
also, it does not do the same thing as a straight append - the elements in the output range are unique, which may not be what the OP wanted (they might not even be comparable). It's certainly not the most efficient way of doing it.Escutcheon

© 2022 - 2024 — McMap. All rights reserved.