push_back() and emplace_back() behind the scenes

Asked 5/6, 2018 at 17:13 Answered 7/7, 2018 at 1:21

Solved c++c++11 vector std move-semantics

I'm currently learning C++ on my own, and I am curious about how push_back() and emplace_back() work under the hood. I've always assumed that emplace_back() is faster when you are trying to construct and push a large object to the back of a container, like a vector.

Let's suppose I have a Student object that I want to append to the back of a vector of Students.

struct Student {
   string name;
   int student_ID;
   double GPA;
   string favorite_food;
   string favorite_prof;
   int hours_slept;
   int birthyear;
   Student(string name_in, int ID_in, double GPA_in, string food_in, 
           string prof_in, int sleep_in, int birthyear_in) :
           /* initialize member variables */ { }
};

Suppose I call push_back() and push a Student object to the end of a vector:

vector<Student> vec;
vec.push_back(Student("Bob", 123456, 3.89, "pizza", "Smith", 7, 1997));

My understanding here is that push_back creates an instance of the Student object outside of the vector and then moves it to the back of the vector.

Diagram:

I can also emplace instead of push:

vector<Student> vec;
vec.emplace_back("Bob", 123456, 3.89, "pizza", "Smith", 7, 1997);

My understanding here is that the Student object is constructed at the very back of the vector so that no moving is required.

Diagram:

Thus, it would make sense that emplacing would be faster, especially if many Student objects are added. However, when I timed these two versions of code:

for (int i = 0; i < 10000000; ++i) {
    vec.push_back(Student("Bob", 123456, 3.89, "pizza", "Smith", 7, 1997));
}

and

for (int i = 0; i < 10000000; ++i) {
    vec.emplace_back("Bob", 123456, 3.89, "pizza", "Smith", 7, 1997);
}

I expected the latter to be faster, since the large Student object wouldn't have to be moved. Oddly enough, the emplace_back version ended up being slower (across multiple attempts). I also tried inserting 10000000 Student objects, where the constructor takes in references and the arguments in push_back() and emplace_back() are stored in variables. This also didn't work, as emplace was still slower.

I've checked to make sure that I'm inserting the same number of objects in both cases. The time difference isn't too large, but emplacing ended up slower by a few seconds.

Is there something wrong with my understanding of how push_back() and emplace_back() work? Thank you very much for your time!

Here's the code, as requested. I'm using the g++ compiler.

Push back:

struct Student {
   string name;
   int student_ID;
   double GPA;
   string favorite_food;
   string favorite_prof;
   int hours_slept;
   int birthyear;
   Student(string name_in, int ID_in, double GPA_in, string food_in, 
           string prof_in, int sleep_in, int birthyear_in) :
           name(name_in), student_ID(ID_in), GPA(GPA_in), 
           favorite_food(food_in), favorite_prof(prof_in),
           hours_slept(sleep_in), birthyear(birthyear_in) {}
};

int main() {
    vector<Student> vec;
    vec.reserve(10000000);
    for (int i = 0; i < 10000000; ++i) 
         vec.push_back(Student("Bob", 123456, 3.89, "pizza", "Smith", 7, 1997));
    return 0;
}

Emplace back:

struct Student {
   string name;
   int student_ID;
   double GPA;
   string favorite_food;
   string favorite_prof;
   int hours_slept;
   int birthyear;
   Student(string name_in, int ID_in, double GPA_in, string food_in, 
           string prof_in, int sleep_in, int birthyear_in) :
           name(name_in), student_ID(ID_in), GPA(GPA_in), 
           favorite_food(food_in), favorite_prof(prof_in),
           hours_slept(sleep_in), birthyear(birthyear_in) {}
};

int main() {
    vector<Student> vec;
    vec.reserve(10000000);
    for (int i = 0; i < 10000000; ++i) 
         vec.emplace_back("Bob", 123456, 3.89, "pizza", "Smith", 7, 1997);
    return 0;
}

Oxyacetylene answered 5/6, 2018 at 17:13 Comment(15)

Compiler? Compilation options? Compilable code? – Biogen 5/6, 2018 at 17:16

One difference is that the code has to pass 7 parameters to emplace_back, but only one to push_back. – Valuable 5/6, 2018 at 17:52

@KillzoneKid push_back can take a right-reference and move, but it is moving the temporary constructed object. – Maxa 5/6, 2018 at 17:53

Unless you used reserve() on your vector, for large counts the costs of moving all the existing elements each time your vector grows will swamp the difference in cost of insertion, I would think! – Maxa 5/6, 2018 at 17:54

@GemTaylor So does push_back create a temporary constructed object and move it to the back of the vector? Also, I added the reserve() line and the times seem more reasonable. Thank you! – Oxyacetylene 5/6, 2018 at 17:59

In the push_back case, the optimizer might recognize that the temporary Student object is loop invariant, so it could be hoisted out of the loop, constructing it just once, and copying it over and over into the vector. In the emplace_back case, you have to construct a new one each time, which may involve creating std::strings from the string literals each time. – Interne 5/6, 2018 at 18:13

@AdrianMcCarthy Would randomly generating Student objects with different member variable values fix this? – Oxyacetylene 5/6, 2018 at 18:27

@Oxyacetylene You need to post the compiler options you used, not just the compiler name. If you are timing an unoptimized build, the timings you're reporting are meaningless. – Knifeedged 5/6, 2018 at 18:53

@Oxyacetylene In vec.push_back(Student("Bob"... Student() creates a temporary object, and push_back() moves it, if there is a useful move operator. For most "large" objects, their content is something that can be usefully moved (eg std::string or nested containers), and the cost compared to emplace is marginal. I think @Adrian might be onto something as well. If the object can't be moved, then it could be constructed just once and some processing saved. – Maxa 5/6, 2018 at 19:3

@KillzoneKid Yes, I think that might be it. I'm using C++11, and the C++11 results look a lot like mine. – Oxyacetylene 5/6, 2018 at 19:30

@GemTaylor So if an object can be usefully moved, even if it is large, emplace doesn't give too big of an advantage? Do you have an example where the difference between pushing and emplacing is large? Most of the examples I'm testing have rather similar times for both push and emplace. Thanks! – Oxyacetylene 5/6, 2018 at 19:42

@KillzoneKid These unoptimized benchmark results are utterly meaningless. – Kerf 5/6, 2018 at 20:41

@Oxyacetylene If your object has very many (1KB+) small members, then move is not useful. If the constructor for the object only takes a few parameters, and the rest calculated,then emplace might be faster. But most object models for larger objects are basically pimpl, and move of a pimpl is trivial. Trying to think of an example, but it isn't real world, perhaps your object caches the next 1000 random numbers from some input seed, and holds them in a c array int[1000]. Move can't do anything with that, but you should be pointing to it with unique_ptr<int[]>, and move of the pointer is trivial. – Maxa 6/6, 2018 at 9:55

@Oxyacetylene Simply respect your effort to write a such a well organised question. – Henceforth 6/6, 2018 at 21:25

@KillzoneKid Google Benchmark varies the amount of iterations it uses, so not reseting the vector on each loop is a really bad idea. Instead, you should put the code in the loop and use benchmark::DoNotOptimize or benchmark::ClobberMemory (when appropriate) to prevent the compiler from optimizing the variable away. Like so: quick-bench.com/5tQMkMwC5-j4nmUYSsqcmWeNEPs (note that optimizations are on) – Jerky 6/7, 2018 at 21:47

This behavior is due to the complexity of std::string. There are a couple things interacting here:

The Small String Optimization (SSO)
In the push_back version, the compiler is able to determine the length of the string at compile-time, whereas the compiler was unable to do so for the emplace_back version. Thus, the emplace_back call requires calls to strlen. Furthermore, since the compiler doesn't know the length of the string literal, it has to emit code for both the SSO and non-SSO cases (see Jason Turner's "Initializer Lists Are Broken, Let's Fix Them"; it's a long talk, but he follows the problem of inserting strings into a vector throughout it)

Consider this simpler type:

struct type {
  std::string a;
  std::string b;
  std::string c;

  type(std::string a, std::string b, std::string c)
    : a{a}
    , b{b}
    , c{c}
  {}
};

Note how the constructor copies a, b, and c.

Testing this against a baseline of just allocating memory, we can see that push_back outperforms emplace_back:

^{Click on image for quick-bench link}

Because the strings in your example all fit inside the SSO buffer, copying is just as cheap as moving in this case. Thus, the constructor is perfectly efficient, and the improvements from emplace_back have a smaller effect.

Also, if we search the assembly for both a call to push_back and a call to emplace_back:

// push_back call
void foo(std::vector<type>& vec) {
    vec.push_back({"Bob", "pizza", "Smith"});
}

// emplace_back call
void foo(std::vector<type>& vec) {
    vec.emplace_back("Bob", "pizza", "Smith");
}

(Assembly not copied here. It's massive. std::string is complicated)

We can see that emplace_back has calls to strlen, whereas push_back does not. Since the distance between the string literal and the std::string being constructed is increased, the compiler was unable to optimize out the call to strlen.

Explicitly calling the std::string constructor would remove the calls to strlen, but would no longer construct them in place, so that doesn't work to speed up emplace_back.

All this said, if we leave the SSO by using long enough strings, the allocation cost completely drowns out these details, so both emplace_back and push_back have the same performance:

^{Click on image for quick-bench link}

If you fix the constructor of type to move its arguments, emplace_back becomes faster in all cases.

struct type {
  std::string a;
  std::string b;
  std::string c;

  type(std::string a, std::string b, std::string c)
    : a{std::move(a)}
    , b{std::move(b)}
    , c{std::move(c)}
  {}
};

SSO case

^{Click on image for quick-bench link}

Long case

^{Click on image for quick-bench link}

However, the SSO push_back case slowed down; the compiler seems to emit extra copies.

The optimal version of perfect forwarding does not suffer from this drawback (note the scale change on the vertical axis):

struct type {
  std::string a;
  std::string b;
  std::string c;

  template <typename A, typename B, typename C>
  type(A&& a, B&& b, C&& c)
    : a{std::forward<A>(a)}
    , b{std::forward<B>(b)}
    , c{std::forward<C>(c)}
  {}
};

^{Click on image for quick-bench link}

Jerky answered 7/7, 2018 at 1:21 Comment(3)

This is a neat answer, but that observation that emplace_back is making strlen calls is super interesting. – Equitable 7/7, 2018 at 3:34

@Equitable More like the emplace_back prevents the compiler from optimizing out the strlen, but yeah – Jerky 7/7, 2018 at 3:37

You've inspired me to open a new question regarding emplace_back and strlen. – Equitable 7/7, 2018 at 3:54

Hot tags

Godot Unity Godot Help Programming Godot 4.X GUI GDScript 3D 2D Physics CSharp Godot 3.X VR XR Projects C++

Recommended topics

Hot tags