Best way to extract a subvector from a vector?
Asked Answered
O

16

402

Suppose I have a std::vector (let's call it myVec) of size N. What's the simplest way to construct a new vector consisting of a copy of elements X through Y, where 0 <= X <= Y <= N-1? For example, myVec [100000] through myVec [100999] in a vector of size 150000.

If this cannot be done efficiently with a vector, is there another STL datatype that I should use instead?

Onshore answered 7/1, 2009 at 18:56 Comment(4)
you say you want to extract a subvector, but it seems to me that what you really want is a view / access to the subvector - difference being that a view would not copy - old school C++ would be to use start pointer and end pointer, given the fact that mem on a std::vector is contiguous, then it should be possible for you to iterate using pointers and thereby avoid copy, however if you do not mind copy, then just initialize a new vector with the scope of your previous vectorHeterotypic
There is .data()(cplusplus.com/reference/vector/vector/data) since c++11. However, using pointers is discouraged within stl containers , see #31664270Supraorbital
@Heterotypic maybe not interested to OP but I would need to know how to " initialize a new vector with the scope of your previous vector".Foregone
Note) There's also assign in std::vector, such as v.assign(s.begin()+M, s.begin()+N);Haemoid
S
482
vector<T>::const_iterator first = myVec.begin() + 100000;
vector<T>::const_iterator last = myVec.begin() + 101000;
vector<T> newVec(first, last);

It's an O(N) operation to construct the new vector, but there isn't really a better way.

Smoothbore answered 7/1, 2009 at 19:4 Comment(17)
+1, also it's O(Y-X), which is less than or equal to O(N) (and in his example much less)Decare
@Decare Well, then it's O(N) after all.Leatri
@GregRogers: It doesn't make sense to use the big-O notation where N is a specific number. Big-O communicates the rate of growth with respect to how N changes. Johann: It's best not to use one variable name in two ways. We'd normally say either O(Y-X), or we'd say O(Z) where Z=Y-X.Spathic
@GregRogers By using this way, we need to declare a new vector. Is there a way to change the original vector? something like myVec(first, last)? I know this is wrong, but I really need the solution as I want to use recursion in my codes, and need to repeatedly use the same vector (although changed). Thanks!Proverb
@Proverb Create a new vector and swap?Hokanson
Why not just vector<T> newVec(myVec.begin() + 100000, myVec.begin() + 101000);?Crosson
In addition, If newVec already exists and you want to replace it's contents use newVec.asign(first, last). It has the same semantics as the constructor in the answer.Acetone
what happens if iterator last is outofbounds?Tion
You can also use auto instead of vector<T>::const_iteratorMorton
Just for my own reference, the resulting vector is a shallow copy right? the sub-vector is not pointers to the original vectors underlying data structure right? I basically want to know that if I return back up a stackframe, that the internal pointers to the resulting sub-vector are still valid because they're separate from the original super-vector.Multilateral
Is line 3 a copy? Is it possible to construct a reference subvector that way?Salinger
As a side note std::vector range constructor constructs the container with the contents of the range [first, last).Tactic
Great proposition. Could we use auto cbegin instead of begin ? auto first = myVec.cbegin() + 100000; auto last = myVec.cbegin() + 101000;Nalepka
@Tactic that's the constructor this answer uses.Hali
@Nalepka Yes.Hali
@Hali yes it is the constructor used, I thought pointing out that last element is not included was relevant , hence the [x,y) convention, in fact the answer used vector.begin + ..1000 to get ...999 elementsTactic
@DrDeo, i tried out of bounds last in a toy program: godbolt.org/z/87MsPqnWT It doesn't do a bounds check. Compiler :clang 14.0.0, c++2aFlasher
D
111

Just use the vector constructor.

std::vector<int>   data();
// Load Z elements into data so that Z > Y > X

std::vector<int>   sub(&data[100000],&data[101000]);
Diaphoretic answered 7/1, 2009 at 19:14 Comment(12)
Ok, I didn't realize it was that simple to obtain an iterator from an arbitrary vector element.Brevity
Well, you actually get pointers to the elements, which in most cases are equivalent to iterators. However, I assume that they won't be checked like iterators when you have a checked-iterators implementation. Could be some magic I'm unaware of though that makes that work too...Oho
Taking the address of those vector elements is an unportable hack that will break if the vector storage is not in fact contiguous. Use begin() + 100000 etc.Strickland
My bad, apparently the standard guarantees that vector storage is contiguous. Nevertheless it's bad practice to work with addresses like this as it is certainly not guaranteed to work for all containers supporting random access, while begin() + 100000 is.Strickland
@j_random_hacker: Sorry have to disagree. The STL specification for std::vector was explicitly changed to support this type of procedure. Also a pointer is valid type of iterator. Look up iterator_traits<>Diaphoretic
Yep, pointers aren't necessarily valid std::vector<T>::iterator's but they are valid iterators for an array in continuous storage, which is how it is used.Smoothbore
@Uri: This will always work. But like all code it is limited by system constraints. If you try and do something that is not physically possible you will get an exception.Diaphoretic
How would you slice up to the last element using that ? If you do sub(&data[100000],&data[data.size()]); That would not always work and may throw an access violation ?Aara
@taktak004 Nope. Remember that operator[] returns a reference. It is only at the point where you read or write the reference that it would become an access violation. Since we do neither but instead get the address we have not invoked UB,.Diaphoretic
This won't work if you need to copy data till end of source vector.Pol
@bialix: It will. I assume you are talking about taking the address of one past then end being the problem. This is covered by the standard and is allowed. It is UB to dereference this addresses but not to take its address. Note the definition of X[Y] is defined as *(X + Y) in the standard. And the operators &* placed together cancel each other out (assuming they have not been overridden (which they can't be for pointers)). => &X[Y] => &*(X + Y) => (X + Y). Which is also why 10000[data] is a valid expression for array access.Diaphoretic
actually MSVC 2017 does raise an access violation for the case that @taktak004 considered (although only in Debug mode and with Debuglevel 2)Heat
O
64

This discussion is pretty old, but the simplest one isn't mentioned yet, with list-initialization:

 vector<int> subvector = {big_vector.begin() + 3, big_vector.end() - 2}; 

It requires c++11 or above.

Example usage:

#include <iostream>
#include <vector>
#include <algorithm>

using namespace std;

int main(){

    vector<int> big_vector = {5,12,4,6,7,8,9,9,31,1,1,5,76,78,8};
    vector<int> subvector = {big_vector.begin() + 3, big_vector.end() - 2};

    cout << "Big vector: ";
    for_each(big_vector.begin(), big_vector.end(),[](int number){cout << number << ";";});
    cout << endl << "Subvector: ";
    for_each(subvector.begin(), subvector.end(),[](int number){cout << number << ";";});
    cout << endl;
}

Result:

Big vector: 5;12;4;6;7;8;9;9;31;1;1;5;76;78;8;
Subvector: 6;7;8;9;9;31;1;1;5;76;
Omdurman answered 6/11, 2019 at 6:47 Comment(6)
it defines the subrange: Starts from the 3rd element, and goes until the 2nd from behindSupraorbital
Can you please clarify whether this way includes the last element?Reflective
It's inclusive, as the example shows: the last element is the one 2 positions before .end()Supraorbital
@DávidTóth It's exclusive. .end() returns the iterator to the "past the last" element.Haemoid
inclusive for the last element, but exclusive of the "past the element" element..Supraorbital
Not important but to avoid confusion: In vector<int> subvector = {iter1, iter2}, iter2 is exclusive. This does not change depending on your perspective. You said "the last element is the one 2 positions before .end()", but it's not. The last element(76) is the one 3 positions before .end(). Very tiny stuff but let's be careful :)Haemoid
B
34

These days, we use spans! So you would write:

#include <span>

...
auto start_pos = 100000;
auto length = 1000;
auto span_of_myvec = std::make_span(myvec);
auto my_subspan = span_of_myvec.subspan(start_pos, length);

to get a span of 1000 elements of the same type as myvec's. Or a more terse form:

auto my_subspan = std::make_span(myvec).subspan(1000000, 1000);

(but I don't like this as much, since the meaning of each numeric argument is not entirely clear; and it gets worse if the length and start_pos are of the same order of magnitude.)

Anyway, remember that this is not a copy, it's just a view of the data in the vector, so be careful. If you want an actual copy, you could do:

std::vector<T> new_vec(my_subspan.cbegin(), my_subspan.cend());

Notes:

Barboza answered 9/8, 2017 at 13:16 Comment(2)
would use cbegin and cend just for the principle ;) std::cbegin etc even.Dis
@JHBonarius: Seeing how this code isn't templated on the choice of container, I don't see there's a particular benefit; a matter of taste I suppose.Barboza
I
32

std::vector<T>(input_iterator, input_iterator), in your case foo = std::vector<T>(myVec.begin () + 100000, myVec.begin () + 150000);, see for example here

Infinity answered 7/1, 2009 at 19:4 Comment(2)
Since Andrew is trying to construct a new vector, I would recommend "std::vector foo(..." instead of copying with "foo = std::vector(..."Swirly
Yeah, of course, but whether you type std::vector<int> foo = std::vector(...) or std::vector<int> foo (...) should not matter.Infinity
C
12

If both are not going to be modified (no adding/deleting items - modifying existing ones is fine as long as you pay heed to threading issues), you can simply pass around data.begin() + 100000 and data.begin() + 101000, and pretend that they are the begin() and end() of a smaller vector.

Or, since vector storage is guaranteed to be contiguous, you can simply pass around a 1000 item array:

T *arrayOfT = &data[0] + 100000;
size_t arrayOfTLength = 1000;

Both these techniques take constant time, but require that the length of data doesn't increase, triggering a reallocation.

Camarena answered 7/1, 2009 at 19:26 Comment(1)
This is also good if you want the original vector and the subvector to be linked.Loup
C
6

You didn't mention what type std::vector<...> myVec is, but if it's a simple type or struct/class that doesn't include pointers, and you want the best efficiency, then you can do a direct memory copy (which I think will be faster than the other answers provided). Here is a general example for std::vector<type> myVec where type in this case is int:

typedef int type; //choose your custom type/struct/class
int iFirst = 100000; //first index to copy
int iLast = 101000; //last index + 1
int iLen = iLast - iFirst;
std::vector<type> newVec;
newVec.resize(iLen); //pre-allocate the space needed to write the data directly
memcpy(&newVec[0], &myVec[iFirst], iLen*sizeof(type)); //write directly to destination buffer from source buffer
Churchill answered 16/10, 2015 at 2:3 Comment(3)
I wonder if with -O3, @Anteru's "using constructor" std::vector(myVec.begin () + 100000, myVec.begin () + 150000); , wouldn't the longer-version of this produce into exactly the same assembly?Raeraeann
MSVC++ 2015, for example, compiles std::vector<>(iter, iter) to memmove(), if appropriate (if constructor is trivial, for a suitable definition of trivial).Superabundant
Don't call memcpy. Do a std::copy or a constructor which accepts a range (two iterators), and the compiler and the std.library will conspire to call memcpy when appropriate.Navelwort
A
5

You could just use insert

vector<type> myVec { n_elements };

vector<type> newVec;

newVec.insert(newVec.begin(), myVec.begin() + X, myVec.begin() + Y);
Avera answered 10/6, 2019 at 0:27 Comment(0)
Y
4

You can use STL copy with O(M) performance when M is the size of the subvector.

Yellowish answered 7/1, 2009 at 19:3 Comment(8)
Upvoted because it pointed me in the right direction but I can see why @LokiAstari suggests it's not the correct choice - since the STL::copy works with two std::vector<T> arrays of the same size and type. Here, the OP wants to copy a subsection into a new, smaller array as outlined here in the OP's post: "0 <= X <= Y <= N-1"Istle
@Andrew, see example using std::copy and std::back_inserterZenas
@LokiAstari why not?Zenas
@chrisg: The clue is in the name copy. To use std::copy you need to build the destination first (which means initializing all the members to the their default value). Then you can copy them over. So its two distinct operations. Why not simply use the vector constructor and initialize the members directly to their final values. See the nice answer that has +50 votes.Diaphoretic
@LokiAstari I was referring to an edit to this that did not survive peer review, which posed the example <br/> vector<T> newvec; std::copy(myvec.begin()+10000, myvec.begin() +10100, std::back_inserter(newvec)); <br/> in this case, you don't need to build the destination first, but sure, direct initialization is more... direct.Zenas
@chrisg: Its also two lines. Additionally you need to stick a third line in to make sure it is efficient. newvec.reserve(10100 - 10000);. ITs definitely an option and technically it will work. But out of the two which are you going to recommend?Diaphoretic
@LokiAstari ah, ok, I'm sold. The need for reserve on larger sets pushed me over the line. I also didn't like the "under the hood" feel of using naked pointers, dependency on an underlying detail (or "guarantee" that vector is contiguous), and unchecked array-like accessing, because they feel very C-like and un-C++-like, but then I decided its no worse than the iterator arithmetic used for std::copy approachZenas
@chrisg: The most up-voted answer uses standard iterators and the constructor.Diaphoretic
B
2

Suppose there are two vectors.

 vector<int> vect1{1, 2, 3, 4};
 vector<int> vect2;

Method 1. Using copy function. copy(first_iterator_index, last_iterator_index, back_inserter()) :- This function takes 3 arguments, firstly, the first iterator of old vector. Secondly, the last iterator of old vector and third is back_inserter function to insert values from back.

    // Copying vector by copy function
    copy(vect1.begin(), vect1.end(), back_inserter(vect2));

Method 2. By using Assign Function. assign(first_iterator_o, last_iterator_o). This method assigns the same values to new vector as old one. This takes 2 arguments, first iterator to old vector and last iterator to old vector.

    //Copying vector by assign function
    vect2.assign(vect1.begin(), vect1.end());
Brag answered 15/11, 2021 at 2:13 Comment(0)
A
1

The only way to project a collection that is not linear time is to do so lazily, where the resulting "vector" is actually a subtype which delegates to the original collection. For example, Scala's List#subseq method create a sub-sequence in constant time. However, this only works if the collection is immutable and if the underlying language sports garbage collection.

Aphyllous answered 7/1, 2009 at 19:6 Comment(1)
in c++ way to do that would be to have vector of shared_ptr to X instead of vector of X and then copy SPs, but unfortunately I dont think that is faster because atomic operation involved with cpying SP. Or the original vector could be a const shared_ptr of vector instead and you just take reference to subrange in it. ofc you dont need to make it a shared_ptr of vector but then you have lifetime problems... all this is off top of my head, could be wrong...Elohim
D
0

Maybe the array_view/span in the GSL library is a good option.

Here is also a single file implementation: array_view.

Delmerdelmor answered 27/4, 2017 at 7:36 Comment(1)
Kindly add answer here along with link. As external link might change in futureBelindabelisarius
P
0

Copy elements from one vector to another easily
In this example, I am using a vector of pairs to make it easy to understand
`

vector<pair<int, int> > v(n);

//we want half of elements in vector a and another half in vector b
vector<pair<lli, lli> > a(v.begin(),v.begin()+n/2);
vector<pair<lli, lli> > b(v.begin()+n/2, v.end());


//if v = [(1, 2), (2, 3), (3, 4), (4, 5), (5, 6)]
//then a = [(1, 2), (2, 3)]
//and b = [(3, 4), (4, 5), (5, 6)]

//if v = [(1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (6, 7)]
//then a = [(1, 2), (2, 3), (3, 4)]
//and b = [(4, 5), (5, 6), (6, 7)]

'
As you can see you can easily copy elements from one vector to another, if you want to copy elements from index 10 to 16 for example then we would use

vector<pair<int, int> > a(v.begin()+10, v.begin+16);

and if you want elements from index 10 to some index from end, then in that case

vector<pair<int, int> > a(v.begin()+10, v.end()-5);

hope this helps, just remember in the last case v.end()-5 > v.begin()+10

Paint answered 24/6, 2018 at 5:25 Comment(0)
D
0

Yet another option: Useful for instance when moving between a thrust::device_vector and a thrust::host_vector, where you cannot use the constructor.

std::vector<T> newVector;
newVector.reserve(1000);
std::copy_n(&vec[100000], 1000, std::back_inserter(newVector));

Should also be complexity O(N)

You could combine this with top anwer code

vector<T>::const_iterator first = myVec.begin() + 100000;
vector<T>::const_iterator last = myVec.begin() + 101000;
std::copy(first, last, std::back_inserter(newVector));
Dis answered 29/6, 2018 at 20:22 Comment(0)
S
0

vector::assign may be another solution

// note: size1 < src.size() && size2 < src.size()
std::vector<int> sub1(size1), sub2(size2);
sub1.assign(src.begin(), src.begin() + size1);
sub2.assign(src.begin(), src.begin() + size2);
Stench answered 15/9, 2022 at 10:6 Comment(0)
A
-2

Posting this late just for others..I bet the first coder is done by now. For simple datatypes no copy is needed, just revert to good old C code methods.

std::vector <int>   myVec;
int *p;
// Add some data here and set start, then
p=myVec.data()+start;

Then pass the pointer p and a len to anything needing a subvector.

notelen must be!! len < myVec.size()-start

Anarchist answered 18/11, 2013 at 20:6 Comment(1)
This doesn't perform a copy.Capeskin

© 2022 - 2024 — McMap. All rights reserved.