Will instantiating templates in precompiled headers reduce compile times?
Example: Say I include in my precompiled header file:

#include <vector>

As a few instances of the vector, such as std::vector<float>, std::vector<int> etc., are used often in my project, will it reduce compile time if I instantiate them as well in the precompiled header like this:

#include <vector>
template class std::vector<float>;
template class std::vector<int>;

Going further, would it even make sense to add dummy functions to the precompiled header that call a few member functions:

namespace pch_detail {
inline auto func() {
  auto&& v = std::vector<float>{};
  v.size();
  v.begin();
  v.front();
}
}

I'm very unsure of how translation units and templates really work, but it seems to me that if I instantiate them in the precompiled header, they should not need to be instantiated again for every .cpp file.

Update

Tested on a real-world code base with Visual Studio 2017 and some instantiations of commonly used template classes.

  1. With commonly used template classes instantiated: 71731 ms
  2. Without instantiation: 68544 ms

Hence, at least in my case, it took slightly more time (about 4.6% longer).

Hemolysis answered 28/7, 2017 at 9:40 Comment(4)
Is compile time really a problem you need to solve? Last time I had such a problem was about 1993. - Faxen
Yes it is, and it has been at every company I've ever worked at. And also every company using C++ which I've heard of. - Hemolysis
@EJP Wow man, I want your life! - Groce
@Groce He might be from the future, or perhaps a parallel universe. - Hemolysis
D
6

It can make a difference, yes.

Instantiation in translation units can then exploit data in the precompiled header, and a compiler can read that more quickly than the C++ standard library headers.

But you will have to maintain a list of instantiations, so this compile-time optimisation might be more trouble than it's worth - your idea could end up having the opposite effect if you have instantiations that are no longer needed.
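One way to keep that maintenance burden in a single place is an X-macro list. The following is only a sketch under assumptions: `Stack`, `PROJECT_INSTANTIATIONS`, `DECLARE_INSTANTIATION`, `DEFINE_INSTANTIATION` and `pushed_count` are hypothetical names, and a project-owned template stands in for `std::vector`. Everything is shown in one file for brevity; in a real project the declarations would live in the shared header and the definitions in exactly one .cpp file.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical project template, standing in for whatever is used often.
template <typename T>
struct Stack {
    void push(const T& value) { items.push_back(value); }
    std::size_t size() const { return items.size(); }
    std::vector<T> items;
};

// The single list to maintain: one X(...) entry per instantiation.
#define PROJECT_INSTANTIATIONS(X) \
    X(Stack<int>)                 \
    X(Stack<float>)

// Would go in the shared (precompiled) header: explicit instantiation
// declarations, which suppress implicit instantiation in client files.
#define DECLARE_INSTANTIATION(Type) extern template struct Type;
PROJECT_INSTANTIATIONS(DECLARE_INSTANTIATION)

// Would go in exactly one .cpp file: explicit instantiation definitions,
// which emit the code that the linker resolves the clients against.
#define DEFINE_INSTANTIATION(Type) template struct Type;
PROJECT_INSTANTIATIONS(DEFINE_INSTANTIATION)

// Small helper used only to exercise both instantiations.
std::size_t pushed_count() {
    Stack<int> ints;
    Stack<float> floats;
    ints.push(1);
    ints.push(2);
    floats.push(1.5f);
    return ints.size() + floats.size();
}
```

Removing an entry from the list removes both the declaration and the definition, so stale instantiations are less likely to linger. Note that a type containing a comma, such as a two-argument template, would need extra care when passed through a macro like this.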

Destruction answered 28/7, 2017 at 9:44 Comment(0)
M
5

Funny thing, but at least for clang (4.0.1) your variant increases compile time:

1. no pch

real    0m0,361s
user    0m0,340s
sys     0m0,021s

2. pch, no explicit instantiate

real    0m0,297s
user    0m0,280s
sys     0m0,017s

3. pch, explicit instantiate

real    0m0,507s
user    0m0,474s
sys     0m0,033s

I used this code:

#include <iostream>
#include "test.h"

int main() {
        std::vector<float> a = {1., 2., 3.};
        for (auto &&e : a) {
                std::cout << e << "\n";
        }
        std::vector<int> b = {1, 2, 3};
        for (auto &&e : b) {
                std::cout << e << "\n";
        }
}

Case 2, test.h:

#pragma once

#include <vector>

Case 3, test2.h:

#pragma once

#include <vector>
template class std::vector<float>;
template class std::vector<int>;

and this compilation script:

echo "no pch"
time clang++ -std=c++11 main.cpp

echo "pch, no explicit instantiate"
clang++ -std=c++11 -x c++-header test.h -o test.pch
time clang++ -std=c++11 -include-pch  test.pch main.cpp 

echo "pch, explicit instantiate"
clang++ -std=c++11 -x c++-header test2.h -o test2.pch
time clang++ -std=c++11 -include-pch  test2.pch main2.cpp 
Maryland answered 28/7, 2017 at 9:58 Comment(6)
A one-time use of a precompiled header is an abuse of the idea, but this is a nice analysis. Have an upvote! - Destruction
But what if you use 1000 .cpp files? - Hemolysis
@ViktorSehr Each .cpp file is compiled in a separate process, so just multiply the number by 1000, divide by the number of cores, and you get the totals. - Maryland
@Viktor - So, the use of vector takes 0.5-1 s of compile time. Perhaps the problem is that you have to compile 1000 files, and that is what you should look into? - Munch
@fghj: Ah, I misread your script, I thought the time included the pch compilation. - Hemolysis
But for some other cases, I found a different result: specializing class member functions may speed up the compile phase or leave it unchanged. (I used clang++ -ftime-trace to check the result.) - Airsickness
D
1

I've been thinking along the same lines and had the same question. (But I'm a noob...)

Another reference: https://msdn.microsoft.com/en-us/library/by56e477.aspx

Maybe an explicit extern template declaration is needed?

However, by link time the .cpp files have been compiled into .obj files, but a .pch is not an .obj... So where will the instantiations of the template functions live? Will the linker be able to read things from the .pch?

Or do we need a separate .cpp file dedicated to instantiating them, while declaring them as extern in all client code?

And... Link-Time Code Generation?

I gave it a try

It helps a little. I tested with VS2012, turning on build profiling and watching the build output.

// stdafx.h
#pragma once

#include "targetver.h"

#include <stdio.h>
#include <tchar.h>
#include <stdlib.h>

#include <vector>
#include <set>
#include <deque>

// stdafx.cpp
#include "stdafx.h"

using namespace std;

template class set<int>;
template set<int>::set();
template set<int>::_Pairib set<int>::insert(const int&);

template class deque<int>;
template deque<int>::deque();
template void deque<int>::push_back(const int&);

template class vector<int>;
template vector<int>::vector();
template void vector<int>::push_back(const int&);

// playcpp.cpp, the entry point

#include "stdafx.h"

using namespace std;
// toggle this block of code
// change a space in the "printf", then build (incrementally)
/*
extern template class set<int>;
extern template set<int>::set();
extern template set<int>::_Pairib set<int>::insert(const int&);

extern template class deque<int>;
extern template deque<int>::deque();
extern template void deque<int>::push_back(const int&);

extern template class vector<int>;
extern template vector<int>::vector();
extern template void vector<int>::push_back(const int&);
*/

int _tmain(int argc, _TCHAR* argv[])
{
    set<int> s;
    deque<int> q;
    vector<int> v;
    for(int i=0;i<10000;i++){
        int choice=rand()%3;
        int value=rand()%100;
        switch(choice){
        case 0: s.insert(value); break;
        case 1: q.push_back(value); break;
        case 2: v.push_back(value); break;
        }
    }
    for(const auto &i:s)
        printf("%d",i);
    for(const auto &i:q)
        printf("%d ",i);
    for(const auto &i:v)
        printf("%d ",i);
    return 0;
}

Results (many other lines omitted):

with extern declarations:

1>               1630 ms  Build                                      1 calls
...
1>      757 ms  ClCompile                                  1 calls
1>      787 ms  Link                                       1 calls

without extern declarations:

1>               1801 ms  Build                                      1 calls
...
1>      774 ms  Link                                       1 calls
1>      955 ms  ClCompile                                  1 calls

(Build output translated from the Chinese version of Visual Studio.)

I adjusted the power settings to make the CPU run slowly, so that the times are longer and less affected by noise.

The numbers above are just one sample for each case, and they are still quite unstable: either case may sometimes take about 200 ms longer.

But over many runs there is consistently a difference of about 200 ms on average. I can only say that the averages are around 1650 ms and 1850 ms, with the whole difference in the ClCompile time.

Of course, more template member functions are called than the ones I listed; I just didn't have the time to figure out all those type signatures... (Could anyone tell me which (const) iterator the loops will use?)

Well, but then... are there better ways of doing it?
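On the iterator question: a range-based for loop calls begin() and end() on the range expression itself, so with a non-const container the non-const overloads are selected even when the loop variable is `const auto&`. The check below is a sketch of how this can be confirmed at compile time; `BeginResult` and `loops_use_plain_iterator` are hypothetical helper names, not part of any library.

```cpp
#include <deque>
#include <set>
#include <type_traits>
#include <utility>
#include <vector>

// Type returned by begin() on an lvalue of Container. For a non-const
// container this is what a range-based for loop ends up calling.
template <typename Container>
using BeginResult = decltype(std::declval<Container&>().begin());

// Non-const containers: begin() yields the plain iterator type.
static_assert(std::is_same<BeginResult<std::vector<int>>,
                           std::vector<int>::iterator>::value,
              "vector: non-const begin() returns iterator");
static_assert(std::is_same<BeginResult<std::deque<int>>,
                           std::deque<int>::iterator>::value,
              "deque: non-const begin() returns iterator");
static_assert(std::is_same<BeginResult<std::set<int>>,
                           std::set<int>::iterator>::value,
              "set: non-const begin() returns iterator");

// Const containers: begin() yields const_iterator instead.
static_assert(std::is_same<BeginResult<const std::vector<int>>,
                           std::vector<int>::const_iterator>::value,
              "vector: const begin() returns const_iterator");

// Runtime-visible summary of the first three checks.
constexpr bool loops_use_plain_iterator() {
    return std::is_same<BeginResult<std::vector<int>>,
                        std::vector<int>::iterator>::value &&
           std::is_same<BeginResult<std::deque<int>>,
                        std::deque<int>::iterator>::value &&
           std::is_same<BeginResult<std::set<int>>,
                        std::set<int>::iterator>::value;
}
```

So for the loops in playcpp.cpp, where s, q and v are non-const, the member functions involved would be the non-const begin() and end() plus the operators of the plain iterator type. Note also that an explicit instantiation of a whole class, such as `template class vector<int>;`, already instantiates its non-template member functions, so listing them one by one may be unnecessary.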

Drift answered 2/5, 2018 at 12:14 Comment(0)
