What is the performance penalty of using std::vector in C++?

Typically, I'm interested in knowing whether the Standard Template Library incurs performance/speed overhead in code for numerical/scientific computing.

For example, is declaring an array as

double matrix2d[10][10];

going to give me more performance than

std::vector<std::vector<double>> matrix2d(10, std::vector<double>(10, 0.0));

?

I would also appreciate some general ideas as to whether C has better performance than C++ for scientific computing. I have written my code in a very object-oriented style using the STL, and using C++11 a lot. I am beginning to consider whether I should start looking into pure C, if it's going to run faster.

Any thoughts on this are welcome.

Haggle answered 20/8, 2013 at 18:33 Comment(8)
If you know the bounds ahead of time, and they are not going to change, then yes, an array is going to be much more performant than std::vector. Vector is a container that is used for dynamic arrays. It uses a resize policy that sacrifices memory in order to save memory allocations by allocating increasingly large chunks as the vector grows.Mikaela
Array is better, but if you're not familiar with dynamic allocation, vector makes it easier to handle a variable number of data items.Brigitte
The vector of vectors is more like a dynamically allocated array of pointers, each pointer pointing to a dynamically allocated array. The data are not contiguous. This may have consequences, which you have to measure. Also, there is a small size overhead which could be important if your vectors are small. The real equivalent would be std::array<std::array<double,10>, 10>.Quezada
"better performance" for what? Just allocating items? Not much in it. If you're using C++11, then converting that to C will be...challenging. For many applications, std::vector is a drop-in relacement for C arrays. What precisely do you want to be performant?Byword
Write some sample code both ways and time it. No other way to answer your question because it depends on so many things, compilers, libraries, problem space, and not least the abilities of the programmer.Deputize
And as usual with silly ~performance~ questions, the correct answer is to use neither, but to use specialised matrix data type instead ("why" left as an exercise for the reader — hint: it'll be better at processing your data; it'll also use your CPU to the full extent). See Eigen or Armadillo or something like that.Gate
It should also be noted that vector has a small overhead on random access because it checks the bounds so that it can throw an exception.Bolden
@Bolden It shouldn't do bounds checking (assuming you're using operator[]). .at() will bounds-check. If your compiler is doing bounds checking on [] then it's time to find another compiler. (Also assuming a non-debug mode of compiling)Irenairene

Given the abstraction it provides, the C++ std::vector is as efficient as it gets: 3 pointers on the stack, plus dynamically allocated data that costs amortized constant time per element when grown one push_back at a time (because each resize expands the capacity geometrically, by a factor of 1.5 to 2, any given element is moved only a constant number of times on average).

The C equivalent using malloc() and realloc() would be at least as expensive, and more cumbersome (manual resizing etc.). Moreover, std::vector allows user-defined performance tuning through special allocators (pool-based, stack-allocated, etc.), which in C++11 are not as hard to use as they were in C++98.
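
To make that concrete, here is a rough sketch of the manual bookkeeping that std::vector::push_back automates (illustrative only; c_vec and c_push_back are names invented for this example, and realloc() error handling is omitted):

#include <cstdlib>   // std::realloc, std::free
#include <vector>

// C style: manual geometric growth with realloc() -- roughly what std::vector automates for you.
struct c_vec {
    double* data = nullptr;
    std::size_t size = 0, capacity = 0;
};

void c_push_back(c_vec& v, double value) {
    if (v.size == v.capacity) {
        v.capacity = v.capacity ? v.capacity * 2 : 1;   // grow geometrically
        v.data = static_cast<double*>(std::realloc(v.data, v.capacity * sizeof(double)));
    }
    v.data[v.size++] = value;
}

int main() {
    c_vec cv;
    c_push_back(cv, 3.14);   // manual bookkeeping, manual cleanup required
    std::free(cv.data);

    std::vector<double> v;
    v.push_back(3.14);       // same amortized cost, automatic cleanup via RAII
}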

If you don't need dynamic resizing, you can use a static array in both C and C++ (or std::array in C++).

In general, for high-performance computing, C++ has more potential for optimization, in particular through the use of function objects that can be inlined (in contrast to regular C function pointers). The canonical example is sorting:

int comp( const void* a, const void* b ) {
    return /* your comparison here */;
}

// given: int arr[LARGE_SIZE];

// C style sorting
qsort( arr, LARGE_SIZE, sizeof( int ), comp );
                                       ^^^^ <---- no inlining through the function pointer

// C++11 style sorting (use a hand-made function object for C++98)
std::sort(std::begin(arr), std::end(arr), [](int a, int b) {
    return comp(&a, &b) < 0;
           ^^^^ <----- C++11 lambdas can be fully inlined
});
Illegitimate answered 20/8, 2013 at 18:43 Comment(3)
+1 on C++11 stuff. The inlining is important, and std::array ftwByword
To be fair, std::vector leaves implementations with enough room to hang themselves with less-than-optimal solutions. As an example, insert(end(), b, e) isn't guaranteed to do the minimal number of resizes/reserves if b and e are random access iterators (only the amortized complexity is guaranteed).Drucilla
@Yakk sure, the old shoot yourself in the foot vs blow your leg off story. But the member range-insert of std::vector is guaranteed to be efficient, not just amortized as the non-member std::insert algorithmIllegitimate

The overhead of std::vector is:

  • 3 pointers on the stack
  • dynamic allocation (lazily, i.e. it doesn't allocate anything until required)

A stack-allocated array might be faster in some cases (for small amounts of data). For this you can use std::array<T, Length>.

If you need a 2-dimensional grid I would allocate the data in a single vector: std::vector<T>(width * height). Then you can write a number of helper functions to obtain elements by x and y coordinates. (Or you can write a wrapper class.)
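
A minimal sketch of such a wrapper (Grid, and its (x, y) indexing convention, are just illustrative choices):

#include <cstddef>
#include <vector>

// One contiguous allocation; 2D (x, y) indexing layered on top.
class Grid {
public:
    Grid(std::size_t width, std::size_t height)
        : width_(width), data_(width * height, 0.0) {}

    double& operator()(std::size_t x, std::size_t y)       { return data_[y * width_ + x]; }
    double  operator()(std::size_t x, std::size_t y) const { return data_[y * width_ + x]; }

private:
    std::size_t width_;
    std::vector<double> data_;
};

// Usage: Grid m(10, 10); m(3, 4) = 1.5;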

Supergalaxy answered 20/8, 2013 at 18:37 Comment(6)
Why necessarily on the stack? vector<vector<int>> is the simplest counterexampleHols
Why do you say "for small amounts of data"? So is the std::array performance on par with "double matrix2d[10][10]"? I actually operate on VERY large matrices.Haggle
If you have a large amount of data then you should not allocate it on the stack because that might cause stack overflow. Also the performance advantage of std::array reduces as it gets larger.Supergalaxy
@atmaere: std::array is a very thin wrapper around a raw array, so the performance should be identical in optimized builds.Julian
@MooingDuck Actually the generated assembly code is identical when compiling with -O1 or higher.Supergalaxy
@StackedCrooked: That's what I said :DJulian

If you have no reason to resize an array and know its size at compile time (as you do in your first example), the better choice among the STL templates is std::array. It provides all the benefits of a C-style array.

double matrix2d[10][10];

// would become

std::array<std::array<double, 10>, 10> matrix2d;
Corriveau answered 20/8, 2013 at 18:39 Comment(0)

If you know the sizes beforehand and performance is a bottleneck, use std::array from C++11. Its performance is exactly the same as that of C-style arrays, because internally it looks like this:

template<typename T, int N>
struct array {
  T _data[N];
};

This is the preferred way to use stack-allocated arrays in modern C++. Never use C-style arrays if you have a modern compiler.
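
One practical difference, sketched below (sum and make_zeros are names made up for this example): a C-style array decays to a pointer and loses its length when passed to a function, while std::array keeps its size and can be copied or returned by value.

#include <array>
#include <cstddef>

double sum(const std::array<double, 10>& a) {
    double s = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i)   // size travels with the type, no extra length parameter
        s += a[i];
    return s;
}

std::array<double, 10> make_zeros() {
    return std::array<double, 10>{};   // value-initialized to all zeros; returning by value is fine
}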

Hols answered 20/8, 2013 at 18:44 Comment(2)
@JustinMeiners: std::array is implemented with a T arr[N] member.Kamilah
Remember that this is only appropriate if your size is known to be small, like maybe 64kiB is a reasonable upper bound. In C++ the size is forced to be a compile-time constant, unlike C99 VLAs or alloca where it's easy to accidentally allow vulnerabilities like a stack clash (where a huge size can jump the stack pointer into some other memory region, past the guard pages).Hagen

People are going to say "It depends on what you're doing".

And they are right.

There's an example here where a conventionally-designed program using std::vector was performance tuned, through a series of six stages, and its execution time was reduced from 2700 microseconds per unit of work to 3.7, for a speedup factor of 730x.

The first thing done was to notice that a large percentage of time was going into growing arrays and removing elements from them. So a different array class was used, which reduced time by a large amount.

The second thing done was to notice that a large percentage of time was still going into array-related activities. So the arrays were eliminated altogether, and linked lists used instead, producing another large speedup.

Then other things were using a large percentage of the remaining time, such as creating and destroying objects with new and delete. Then those objects were recycled in free lists, producing another large speedup. After a couple more stages, a decision was made to stop trying, because it was getting harder to find things to improve, and the speedup was deemed sufficient.
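
The free-list recycling mentioned above could look roughly like this (an illustrative sketch, not the code from the linked example; Node and FreeList are invented names):

// Intrusive free list: released nodes are kept and reused instead of calling new/delete every time.
struct Node {
    double payload = 0.0;
    Node* next = nullptr;    // doubles as the free-list link while the node is unused
};

class FreeList {
public:
    Node* allocate() {
        if (head_ == nullptr)
            return new Node{};       // heap allocation only when the list is empty
        Node* n = head_;             // otherwise pop a recycled node: just a few instructions
        head_ = n->next;
        n->next = nullptr;
        return n;
    }
    void release(Node* n) {          // push the node back for later reuse
        n->next = head_;
        head_ = n;
    }
    ~FreeList() {
        while (head_ != nullptr) { Node* n = head_; head_ = n->next; delete n; }
    }
private:
    Node* head_ = nullptr;
};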

The point is, don't just sort of choose something that's highly recommended and then hope for the best. Rather get it built one way or another and then do performance tuning like this, and be willing to make major changes in your data structure design, based on what you see a high percentage of time being spent on. And iterate it. You might change your storage scheme from A to B, and later from B to C. That's perfectly OK.

Amandaamandi answered 20/8, 2013 at 19:17 Comment(8)
How long ago was this? Because you'd have to have one hell of a specific problem for a linked-list to beat an array, even with suboptimal array usage, and I find it VERY hard to believe that you could beat the malloc implementation's free list cache with... your own free list cache.Pellet
@DeadMG: 1) How long ago is irrelevant. 2) Every problem is specific. 3) It is understandably hard to believe. Fortunately, that's irrelevant, because the source code, in all iterations, is here. If you give the tuning method a try, on a moderately big program, you could find yourself getting surprisingly large speedups, assuming speed is the goal, because (IME) most good size programs have multiple bottlenecks, in varying sizes, that together add up to nearly all the execution time.Amandaamandi
@DeadMG: Regarding the free-list cache: if each object contains a pointer used for linking them into a free-list, it certainly takes less than 10 instructions to push or pop an object on the list. That's going to be an order of magnitude less than using malloc or free, no matter what they do.Amandaamandi
How long ago is somewhat relevant. One problem with linked lists is load-use latency in traversing them, (vs. arrays where the next element position can be calculated without needing the contents of the previous, so there's more instruction-level parallelism for pipelined / out-of-order exec / memory-level parallelism to exploit). The newer the CPU, the bigger the difference in speed of traversing a linked list vs. array, and the more memory / cache bandwidth they have relative to latency. Linked lists have their uses when other work hides the latency, and you're not just looping over it.Hagen
@PeterCordes: You're absolutely right after a piece of software has gone through a series of tuning steps of the kind I've made a pest of myself trying to explain. Except for little toy programs, software can easily be massively inefficient for reasons its authors never guess. Rather they immediately start worrying about the latest issues they've learned, like instruction pipelining. Example: I showed some (very good otherwise) programmers how to get a speedup factor of 4 orders of magnitude. (They didn't believe me. :-) Once that's done, hardware-level issues certainly matter.Amandaamandi
@PeterCordes: If you're curious where that 4 orders of magnitude speedup came from, the context is a simulation of physiologic metabolism and elimination of pharmaceutical compounds and their metabolites. This was all written in a fairly straightforward and general way, with detailed stomach, intestine, liver, and kidney models, skin models, lung models, and so on, all with differential equations. Way overkill for any specific situation. I showed how rather than simulating they could generate problem-specific code, and compile and run it on the fly. Days to seconds speedup.Amandaamandi
@PeterCordes: Reasons for not doing that were a) Programmers felt it was too much work - well, Ok. b) Managers felt the customers would get suspicious - what have they been paying for all these years? ;-) It seems you can't do optimization unless you have competitors.Amandaamandi
Lol, I hadn't previously heard of optimizations that were not done (until competitve pressure) because it would make your past self look bad. :PHagen

In scientific computing, bugs and sub-optimal code are especially frustrating because large amounts of data are incorrectly processed and precious time is wasted.

std::vector might be your bottleneck or your best performer depending on your knowledge of its inner workings. Pay special attention to reserve(), insert(), erase(); consider learning about alignment and processor caching if your program is threaded.
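
For instance, a quick sketch of why reserve() matters (fill_samples is a name made up for this example):

#include <cstddef>
#include <vector>

std::vector<double> fill_samples(std::size_t n) {
    std::vector<double> samples;
    samples.reserve(n);                 // one allocation up front instead of several geometric regrowths
    for (std::size_t i = 0; i < n; ++i)
        samples.push_back(0.5 * static_cast<double>(i));
    return samples;
}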

Think about the time you will have to spend ensuring consistency (and later hunting for bugs) if you try to do all the memory management yourself, particularly as you progressively add features to your software. At the end of the day, the overhead of std::vector will be the least of your problems.

Koser answered 21/8, 2013 at 3:17 Comment(0)

For scientific computing you'd be much better off using a dedicated C++ matrix library, such as Armadillo. This not only gives you fast array processing, but also many linear algebra operations that have been thoroughly debugged.

Apart from performance reasons, using a dedicated C++ matrix library will also allow you to considerably reduce the verbosity of your code, make less mistakes, and hence speed up development. One example is that with a C++ matrix library you don't need to worry about memory management.

Lastly, if you really need to go low-level (i.e. access memory directly via pointers), C++ allows you to "drop down" to the C level. In Armadillo this is done via the .memptr() member function.
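
For instance, a minimal sketch (assuming Armadillo is installed and the program is linked with -larmadillo):

#include <armadillo>

int main() {
    arma::mat A(10, 10, arma::fill::randu);   // 10x10 matrix with uniform random entries
    arma::vec b(10, arma::fill::ones);

    arma::vec x = arma::solve(A, b);          // solve A*x = b; no manual memory management

    double* raw = A.memptr();                 // "drop down" to a raw, column-major pointer
    raw[0] = 42.0;                            // element (0,0)

    return 0;
}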

Neolatin answered 21/8, 2013 at 3:20 Comment(0)
