push_back() and emplace_back() behind the scenes
Asked Answered
O

1

12

I'm currently learning C++ on my own, and I am curious about how push_back() and emplace_back() work under the hood. I've always assumed that emplace_back() is faster when you are trying to construct and push a large object to the back of a container, like a vector.

Let's suppose I have a Student object that I want to append to the back of a vector of Students.

struct Student {
   string name;
   int student_ID;
   double GPA;
   string favorite_food;
   string favorite_prof;
   int hours_slept;
   int birthyear;
   Student(string name_in, int ID_in, double GPA_in, string food_in, 
           string prof_in, int sleep_in, int birthyear_in) :
           /* initialize member variables */ { }
};

Suppose I call push_back() and push a Student object to the end of a vector:

vector<Student> vec;
vec.push_back(Student("Bob", 123456, 3.89, "pizza", "Smith", 7, 1997));

My understanding here is that push_back creates an instance of the Student object outside of the vector and then moves it to the back of the vector.

Diagram: https://ibb.co/hV6Jho

I can also emplace instead of push:

vector<Student> vec;
vec.emplace_back("Bob", 123456, 3.89, "pizza", "Smith", 7, 1997);

My understanding here is that the Student object is constructed at the very back of the vector so that no moving is required.

Diagram: enter image description here

Thus, it would make sense that emplacing would be faster, especially if many Student objects are added. However, when I timed these two versions of code:

for (int i = 0; i < 10000000; ++i) {
    vec.push_back(Student("Bob", 123456, 3.89, "pizza", "Smith", 7, 1997));
}

and

for (int i = 0; i < 10000000; ++i) {
    vec.emplace_back("Bob", 123456, 3.89, "pizza", "Smith", 7, 1997);
}

I expected the latter to be faster, since the large Student object wouldn't have to be moved. Oddly enough, the emplace_back version ended up being slower (across multiple attempts). I also tried inserting 10000000 Student objects, where the constructor takes in references and the arguments in push_back() and emplace_back() are stored in variables. This also didn't work, as emplace was still slower.

I've checked to make sure that I'm inserting the same number of objects in both cases. The time difference isn't too large, but emplacing ended up slower by a few seconds.

Is there something wrong with my understanding of how push_back() and emplace_back() work? Thank you very much for your time!

Here's the code, as requested. I'm using the g++ compiler.

Push back:

struct Student {
   string name;
   int student_ID;
   double GPA;
   string favorite_food;
   string favorite_prof;
   int hours_slept;
   int birthyear;
   Student(string name_in, int ID_in, double GPA_in, string food_in, 
           string prof_in, int sleep_in, int birthyear_in) :
           name(name_in), student_ID(ID_in), GPA(GPA_in), 
           favorite_food(food_in), favorite_prof(prof_in),
           hours_slept(sleep_in), birthyear(birthyear_in) {}
};

int main() {
    vector<Student> vec;
    vec.reserve(10000000);
    for (int i = 0; i < 10000000; ++i) 
         vec.push_back(Student("Bob", 123456, 3.89, "pizza", "Smith", 7, 1997));
    return 0;
}

Emplace back:

struct Student {
   string name;
   int student_ID;
   double GPA;
   string favorite_food;
   string favorite_prof;
   int hours_slept;
   int birthyear;
   Student(string name_in, int ID_in, double GPA_in, string food_in, 
           string prof_in, int sleep_in, int birthyear_in) :
           name(name_in), student_ID(ID_in), GPA(GPA_in), 
           favorite_food(food_in), favorite_prof(prof_in),
           hours_slept(sleep_in), birthyear(birthyear_in) {}
};

int main() {
    vector<Student> vec;
    vec.reserve(10000000);
    for (int i = 0; i < 10000000; ++i) 
         vec.emplace_back("Bob", 123456, 3.89, "pizza", "Smith", 7, 1997);
    return 0;
}
Oxyacetylene answered 5/6, 2018 at 17:13 Comment(15)
Compiler? Compilation options? Compilable code?Biogen
One difference is that the code has to pass 7 parameters to emplace_back, but only one to push_back.Valuable
@KillzoneKid push_back can take a right-reference and move, but it is moving the temporary constructed object.Maxa
Unless you used reserve() on your vector, for large counts the costs of moving all the existing elements each time your vector grows will swamp the difference in cost of insertion, I would think!Maxa
@GemTaylor So does push_back create a temporary constructed object and move it to the back of the vector? Also, I added the reserve() line and the times seem more reasonable. Thank you!Oxyacetylene
In the push_back case, the optimizer might recognize that the temporary Student object is loop invariant, so it could be hoisted out of the loop, constructing it just once, and copying it over and over into the vector. In the emplace_back case, you have to construct a new one each time, which may involve creating std::strings from the string literals each time.Interne
@AdrianMcCarthy Would randomly generating Student objects with different member variable values fix this?Oxyacetylene
@Oxyacetylene You need to post the compiler options you used, not just the compiler name. If you are timing an unoptimized build, the timings you're reporting are meaningless.Knifeedged
@Oxyacetylene In vec.push_back(Student("Bob"... Student() creates a temporary object, and push_back() moves it, if there is a useful move operator. For most "large" objects, their content is something that can be usefully moved (eg std::string or nested containers), and the cost compared to emplace is marginal. I think @Adrian might be onto something as well. If the object can't be moved, then it could be constructed just once and some processing saved.Maxa
@KillzoneKid Yes, I think that might be it. I'm using C++11, and the C++11 results look a lot like mine.Oxyacetylene
@GemTaylor So if an object can be usefully moved, even if it is large, emplace doesn't give too big of an advantage? Do you have an example where the difference between pushing and emplacing is large? Most of the examples I'm testing have rather similar times for both push and emplace. Thanks!Oxyacetylene
@KillzoneKid These unoptimized benchmark results are utterly meaningless.Kerf
@Oxyacetylene If your object has very many (1KB+) small members, then move is not useful. If the constructor for the object only takes a few parameters, and the rest calculated,then emplace might be faster. But most object models for larger objects are basically pimpl, and move of a pimpl is trivial. Trying to think of an example, but it isn't real world, perhaps your object caches the next 1000 random numbers from some input seed, and holds them in a c array int[1000]. Move can't do anything with that, but you should be pointing to it with unique_ptr<int[]>, and move of the pointer is trivial.Maxa
@Oxyacetylene Simply respect your effort to write a such a well organised question.Henceforth
@KillzoneKid Google Benchmark varies the amount of iterations it uses, so not reseting the vector on each loop is a really bad idea. Instead, you should put the code in the loop and use benchmark::DoNotOptimize or benchmark::ClobberMemory (when appropriate) to prevent the compiler from optimizing the variable away. Like so: quick-bench.com/5tQMkMwC5-j4nmUYSsqcmWeNEPs (note that optimizations are on)Jerky
J
9

This behavior is due to the complexity of std::string. There are a couple things interacting here:

  • The Small String Optimization (SSO)
  • In the push_back version, the compiler is able to determine the length of the string at compile-time, whereas the compiler was unable to do so for the emplace_back version. Thus, the emplace_back call requires calls to strlen. Furthermore, since the compiler doesn't know the length of the string literal, it has to emit code for both the SSO and non-SSO cases (see Jason Turner's "Initializer Lists Are Broken, Let's Fix Them"; it's a long talk, but he follows the problem of inserting strings into a vector throughout it)

Consider this simpler type:

struct type {
  std::string a;
  std::string b;
  std::string c;

  type(std::string a, std::string b, std::string c)
    : a{a}
    , b{b}
    , c{c}
  {}
};

Note how the constructor copies a, b, and c.

Testing this against a baseline of just allocating memory, we can see that push_back outperforms emplace_back:

enter image description here

Click on image for quick-bench link

Because the strings in your example all fit inside the SSO buffer, copying is just as cheap as moving in this case. Thus, the constructor is perfectly efficient, and the improvements from emplace_back have a smaller effect.

Also, if we search the assembly for both a call to push_back and a call to emplace_back:

// push_back call
void foo(std::vector<type>& vec) {
    vec.push_back({"Bob", "pizza", "Smith"});
}
// emplace_back call
void foo(std::vector<type>& vec) {
    vec.emplace_back("Bob", "pizza", "Smith");
}

(Assembly not copied here. It's massive. std::string is complicated)

We can see that emplace_back has calls to strlen, whereas push_back does not. Since the distance between the string literal and the std::string being constructed is increased, the compiler was unable to optimize out the call to strlen.

Explicitly calling the std::string constructor would remove the calls to strlen, but would no longer construct them in place, so that doesn't work to speed up emplace_back.

All this said, if we leave the SSO by using long enough strings, the allocation cost completely drowns out these details, so both emplace_back and push_back have the same performance:

enter image description here

Click on image for quick-bench link


If you fix the constructor of type to move its arguments, emplace_back becomes faster in all cases.

struct type {
  std::string a;
  std::string b;
  std::string c;

  type(std::string a, std::string b, std::string c)
    : a{std::move(a)}
    , b{std::move(b)}
    , c{std::move(c)}
  {}
};

SSO case

enter image description here

Click on image for quick-bench link

Long case

enter image description here

Click on image for quick-bench link

However, the SSO push_back case slowed down; the compiler seems to emit extra copies.

The optimal version of perfect forwarding does not suffer from this drawback (note the scale change on the vertical axis):

struct type {
  std::string a;
  std::string b;
  std::string c;

  template <typename A, typename B, typename C>
  type(A&& a, B&& b, C&& c)
    : a{std::forward<A>(a)}
    , b{std::forward<B>(b)}
    , c{std::forward<C>(c)}
  {}
};

enter image description here

Click on image for quick-bench link

Jerky answered 7/7, 2018 at 1:21 Comment(3)
This is a neat answer, but that observation that emplace_back is making strlen calls is super interesting.Equitable
@Equitable More like the emplace_back prevents the compiler from optimizing out the strlen, but yeahJerky
You've inspired me to open a new question regarding emplace_back and strlen.Equitable

© 2022 - 2024 — McMap. All rights reserved.