Explicit instantiation allows reducing compile times and output sizes
These are the major gains it can provide. They come from the following two effects described in detail in the sections below:
- remove definitions from headers to prevent intelligent build systems from rebuilding includers on every change to those templates (saves time)
- prevent object redefinition (saves time and size)
Remove definitions from headers
Explicit instantiation allows you to leave definitions in the .cpp file.
When the definition is in the header and you modify it, an intelligent build system recompiles all includers, which could be dozens of files, possibly making incremental recompilation after a single file change unbearably slow.
Putting definitions in .cpp files does have the downside that external libraries can't reuse the template with their own new classes, but the section below on exposing templates as an external API shows a workaround.
See concrete examples below.
Examples of build systems that detect includes and rebuild:
Object redefinition gains: understanding the problem
If you just completely define a template on a header file, every single compilation unit that includes that header ends up compiling its own implicit copy of the template for every different template argument usage made.
This means a lot of useless disk usage and compilation time.
Here is a concrete example, in which both main.cpp and notmain.cpp implicitly define MyTemplate<int> due to its usage in those files.
main.cpp
#include <iostream>
#include "mytemplate.hpp"
#include "notmain.hpp"
int main() {
    std::cout << notmain() + MyTemplate<int>().f(1) << std::endl;
}
notmain.cpp
#include "mytemplate.hpp"
#include "notmain.hpp"
int notmain() { return MyTemplate<int>().f(1); }
mytemplate.hpp
#ifndef MYTEMPLATE_HPP
#define MYTEMPLATE_HPP
template<class T>
struct MyTemplate {
    T f(T t) { return t + 1; }
};
#endif
notmain.hpp
#ifndef NOTMAIN_HPP
#define NOTMAIN_HPP
int notmain();
#endif
GitHub upstream.
Compile and view symbols with nm:
g++ -c -Wall -Wextra -std=c++11 -pedantic-errors -o notmain.o notmain.cpp
g++ -c -Wall -Wextra -std=c++11 -pedantic-errors -o main.o main.cpp
g++ -Wall -Wextra -std=c++11 -pedantic-errors -o main.out notmain.o main.o
echo notmain.o
nm -C -S notmain.o | grep MyTemplate
echo main.o
nm -C -S main.o | grep MyTemplate
Output:
notmain.o
0000000000000000 0000000000000017 W MyTemplate<int>::f(int)
main.o
0000000000000000 0000000000000017 W MyTemplate<int>::f(int)
So we see that a separate section is generated for every single method instantiation, and that each of them of course takes up space in the object files.
From man nm, we see that W means weak symbol, which GCC chose because this is a template function.
The reason it doesn't blow up at link time with multiple definitions is that the linker accepts multiple weak definitions and just picks one of them to put in the final executable. All of them are the same in our case, so all is fine.
The numbers in the output mean:
- 0000000000000000: address within section. This is zero because templates are automatically put into their own section
- 0000000000000017: size of the code generated for them
We can see this a bit more clearly with:
objdump -S main.o | c++filt
which ends in:
Disassembly of section .text._ZN10MyTemplateIiE1fEi:
0000000000000000 <MyTemplate<int>::f(int)>:
0: f3 0f 1e fa endbr64
4: 55 push %rbp
5: 48 89 e5 mov %rsp,%rbp
8: 48 89 7d f8 mov %rdi,-0x8(%rbp)
c: 89 75 f4 mov %esi,-0xc(%rbp)
f: 8b 45 f4 mov -0xc(%rbp),%eax
12: 83 c0 01 add $0x1,%eax
15: 5d pop %rbp
16: c3 retq
and _ZN10MyTemplateIiE1fEi is the mangled name of MyTemplate<int>::f(int), which c++filt decided not to unmangle in the section name.
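The mangled section name can also be decoded from within a program. The following is a minimal sketch, assuming a GCC or Clang toolchain that provides the Itanium ABI helper abi::__cxa_demangle in <cxxabi.h>; the demangle wrapper is a hypothetical helper name, not part of the example project:

```cpp
#include <cxxabi.h>  // abi::__cxa_demangle (GCC/Clang, Itanium C++ ABI)
#include <cstdlib>   // std::free
#include <string>

// Hypothetical helper: demangle an Itanium C++ ABI symbol name.
// Returns the input unchanged if demangling fails.
std::string demangle(const std::string& mangled) {
    int status = 0;
    char* out = abi::__cxa_demangle(mangled.c_str(), nullptr, nullptr, &status);
    if (status != 0 || out == nullptr) {
        return mangled;
    }
    std::string result(out);
    std::free(out);  // the buffer is allocated with malloc by __cxa_demangle
    return result;
}
```

Calling demangle("_ZN10MyTemplateIiE1fEi") should give back the same MyTemplate<int>::f(int) that nm -C prints.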
Solutions to the object redefinition problem
This problem can be avoided by using explicit instantiation and either:
keep definition on hpp and add extern template on hpp for types which are going to be explicitly instantiated.
As explained at "using extern template (C++11)", extern template prevents a completely defined template from being instantiated by compilation units, except for our explicit instantiation. This way, only our explicit instantiation will be defined in the final objects:
mytemplate.hpp
#ifndef MYTEMPLATE_HPP
#define MYTEMPLATE_HPP
template<class T>
struct MyTemplate {
    T f(T t) { return t + 1; }
};
extern template class MyTemplate<int>;
#endif
mytemplate.cpp
#include "mytemplate.hpp"
// Explicit instantiation required just for int.
template class MyTemplate<int>;
main.cpp
#include <iostream>
#include "mytemplate.hpp"
#include "notmain.hpp"
int main() {
    std::cout << notmain() + MyTemplate<int>().f(1) << std::endl;
}
notmain.cpp
#include "mytemplate.hpp"
#include "notmain.hpp"
int notmain() { return MyTemplate<int>().f(1); }
Downsides:
- the definition stays in the header, so recompiles after a single change to that header can still be slow
- if you are a header-only library, you force external projects to do their own explicit instantiation. If you are not a header-only library, this solution is likely the best.
- if the template type is defined in your own project and not a built-in like int, it seems that you are forced to add the include for it on the header, since a forward declaration is not enough: see "extern template & incomplete types". This increases header dependencies a bit.
move the definition to the cpp file and leave only the declaration on hpp, i.e. modify the original example to be:
mytemplate.hpp
#ifndef MYTEMPLATE_HPP
#define MYTEMPLATE_HPP
template<class T>
struct MyTemplate {
    T f(T t);
};
#endif
mytemplate.cpp
#include "mytemplate.hpp"
template<class T>
T MyTemplate<T>::f(T t) { return t + 1; }
// Explicit instantiation.
template class MyTemplate<int>;
Downside: external projects can't use your template with their own types. Also you are forced to explicitly instantiate all types. But maybe this is an upside since then programmers won't forget.
keep definition on hpp and add extern template on every includer:
mytemplate.cpp
#include "mytemplate.hpp"
// Explicit instantiation.
template class MyTemplate<int>;
main.cpp
#include <iostream>
#include "mytemplate.hpp"
#include "notmain.hpp"
// extern template declaration
extern template class MyTemplate<int>;
int main() {
    std::cout << notmain() + MyTemplate<int>().f(1) << std::endl;
}
notmain.cpp
#include "mytemplate.hpp"
#include "notmain.hpp"
// extern template declaration
extern template class MyTemplate<int>;
int notmain() { return MyTemplate<int>().f(1); }
Downside: all includers have to add the extern to their CPP files, which programmers will likely forget to do.
With any of those solutions, nm now contains:
notmain.o
U MyTemplate<int>::f(int)
main.o
U MyTemplate<int>::f(int)
mytemplate.o
0000000000000000 W MyTemplate<int>::f(int)
so we see that only mytemplate.o has a compilation of MyTemplate<int> as desired, while notmain.o and main.o don't, because U means undefined.
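The examples above all use a class template, but the same pair of declarations also exists for function templates. Below is a minimal single-file sketch of that variant; add_one and use_add_one are hypothetical names, and the comments mark where each part would live if it were split across files like the class example:

```cpp
// --- would live in a header (hypothetical add_one.hpp) ---
template<class T>
T add_one(T t) { return t + 1; }

// Explicit instantiation declaration: asks includers not to
// implicitly instantiate add_one<int> themselves.
extern template int add_one<int>(int);

// --- would live in exactly one .cpp (hypothetical add_one.cpp) ---
// Explicit instantiation definition: emits the single copy of add_one<int>.
// When both appear in the same translation unit, as in this sketch,
// the definition has to come after the declaration.
template int add_one<int>(int);

// --- any includer ---
int use_add_one() { return add_one(1) + add_one<int>(1); }
```

If this were split across files as marked, the includers' object files should show add_one<int> as undefined (U) in nm, just like MyTemplate<int>::f(int) above.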
Remove definitions from included headers but also expose templates as an external API in a header-only library
If your library is not header only, the extern template method will work, since using projects will just link to your object file, which will contain the object of the explicit template instantiation.
However, for header only libraries, if you want to both:
- speed up your project's compilation
- expose headers as an external library API for others to use it
then you can try one of the following:
- the mytemplate_interface.hpp approach:
  - mytemplate.hpp: template definition
  - mytemplate_interface.hpp: template declaration only, matching the definitions from mytemplate.hpp, no definitions
  - mytemplate.cpp: include mytemplate.hpp and make explicit instantiations
  - main.cpp and everywhere else in the code base: include mytemplate_interface.hpp, not mytemplate.hpp
- the mytemplate_implementation.hpp approach:
  - mytemplate.hpp: template definition
  - mytemplate_implementation.hpp: includes mytemplate.hpp and adds extern to every class that will be instantiated
  - mytemplate.cpp: include mytemplate.hpp and make explicit instantiations
  - main.cpp and everywhere else in the code base: include mytemplate_implementation.hpp, not mytemplate.hpp
Or even better perhaps for multiple headers: create an intf/impl folder inside your includes/ folder and use mytemplate.hpp as the name always.
The mytemplate_interface.hpp approach looks like this:
mytemplate.hpp
#ifndef MYTEMPLATE_HPP
#define MYTEMPLATE_HPP
#include "mytemplate_interface.hpp"
template<class T>
T MyTemplate<T>::f(T t) { return t + 1; }
#endif
mytemplate_interface.hpp
#ifndef MYTEMPLATE_INTERFACE_HPP
#define MYTEMPLATE_INTERFACE_HPP
template<class T>
struct MyTemplate {
    T f(T t);
};
#endif
mytemplate.cpp
#include "mytemplate.hpp"
// Explicit instantiation.
template class MyTemplate<int>;
main.cpp
#include <iostream>
#include "mytemplate_interface.hpp"
int main() {
    std::cout << MyTemplate<int>().f(1) << std::endl;
}
Compile and run:
g++ -c -Wall -Wextra -std=c++11 -pedantic-errors -o mytemplate.o mytemplate.cpp
g++ -c -Wall -Wextra -std=c++11 -pedantic-errors -o main.o main.cpp
g++ -Wall -Wextra -std=c++11 -pedantic-errors -o main.out main.o mytemplate.o
Output:
2
Tested in Ubuntu 18.04.
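The mytemplate_implementation.hpp approach has no worked example above, so here is a rough single-file sketch of how it could look. It is only an illustration: the comments mark the assumed file boundaries, and use_mytemplate is a hypothetical stand-in for code in main.cpp:

```cpp
// --- mytemplate.hpp: full definition, what external projects include ---
template<class T>
struct MyTemplate {
    T f(T t) { return t + 1; }
};

// --- mytemplate_implementation.hpp: what the project's own files include ---
// (would start with: #include "mytemplate.hpp")
// One extern line per instantiation that mytemplate.cpp provides.
extern template class MyTemplate<int>;

// --- mytemplate.cpp: the single place that emits MyTemplate<int> ---
// (would start with: #include "mytemplate.hpp")
template class MyTemplate<int>;

// --- main.cpp or any other project file ---
// (would start with: #include "mytemplate_implementation.hpp")
int use_mytemplate() { return MyTemplate<int>().f(1); }
```

External projects that include only mytemplate.hpp still get ordinary implicit instantiation, so they can use the template with types that the project never instantiated.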
C++20 modules
https://en.cppreference.com/w/cpp/language/modules
I think this feature will provide the best setup going forward as it becomes available, but I haven't checked it yet because it is not yet available on my GCC 9.2.1.
You will still have to do explicit instantiation to get the speedup/disk saving, but at least we will have a sane solution for removing definitions from included headers while still exposing templates as an external API, one which does not require copying things around 100 times.
Expected usage (without the explicit instantiation, not sure what the exact syntax will be like, see: How to use template explicit instantiation with C++20 modules?) would be something along the lines of:
helloworld.cpp
export module helloworld; // module declaration
import <iostream>; // import declaration
export template<class T>
void hello(T t) { // export declaration
    std::cout << t << std::endl;
}
main.cpp
import helloworld; // import declaration
int main() {
    hello(1);
    hello("world");
}
and then compilation mentioned at https://quuxplusone.github.io/blog/2019/11/07/modular-hello-world/
clang++ -std=c++2a -c helloworld.cpp -Xclang -emit-module-interface -o helloworld.pcm
clang++ -std=c++2a -c -o helloworld.o helloworld.cpp
clang++ -std=c++2a -fprebuilt-module-path=. -o main.out main.cpp helloworld.o
So from this we see that clang can extract the template interface + implementation into the magic helloworld.pcm, which must contain some LLVM intermediate representation of the source (see: How are templates handled in C++ module system?) and which still allows for template instantiation to happen.
How to quickly analyze your build to see if it would gain a lot from explicit template instantiation
So, you've got a complex project and you want to decide if explicit template instantiation will bring significant gains without actually doing the full refactor?
The analysis below might help you decide, or at least select the most promising objects to refactor first while you experiment, by borrowing some ideas from "My C++ object file is too big":
# List all weak symbols with size only, no address.
find . -name '*.o' | xargs -I{} nm -C --size-sort --radix d '{}' |
grep ' W ' > nm.log
# Sort by symbol size.
sort -k1 -n nm.log -o nm.sort.log
# Get a repetition count.
uniq -c nm.sort.log > nm.uniq.log
# Find the most repeated/largest objects.
sort -k1,2 -n nm.uniq.log -o nm.uniq.sort.log
# Find the objects that would give you the most gain after refactor.
# This gain is calculated as "(n_occurrences - 1) * size" which is
# the size you would gain for keeping just a single instance.
# If you are going to refactor anything, you should start with the ones
# at the bottom of this list.
awk '{gain = ($1 - 1) * $2; print gain, $0}' nm.uniq.sort.log |
sort -k1 -n > nm.gains.log
# Total gain if you refactored everything.
awk 'BEGIN{sum=0}{sum += $1}END{print sum}' nm.gains.log
# Total size. The closer total gain above is to total size, the more
# you would gain from the refactor.
awk 'BEGIN{sum=0}{sum += $1}END{print sum}' nm.log
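To make the gain arithmetic concrete, here is a small C++ sketch of the same "(n_occurrences - 1) * size" calculation. It assumes input lines shaped like those in nm.log, i.e. a decimal size, then W, then the demangled symbol; GainReport and compute_gain are hypothetical names introduced only for this illustration:

```cpp
#include <cstdint>
#include <map>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

struct GainReport {
    std::uint64_t total_size = 0;  // sum of the sizes of all weak symbols
    std::uint64_t total_gain = 0;  // sum of (n_occurrences - 1) * size
};

// Each line is assumed to look like nm.log: "<decimal size> W <symbol>".
GainReport compute_gain(const std::vector<std::string>& nm_lines) {
    // Identical line -> (size, number of occurrences), like `uniq -c`.
    std::map<std::string, std::pair<std::uint64_t, std::uint64_t>> by_line;
    GainReport report;
    for (const std::string& line : nm_lines) {
        std::istringstream in(line);
        std::uint64_t size = 0;
        if (!(in >> size)) continue;  // skip lines without a leading size
        report.total_size += size;
        std::pair<std::uint64_t, std::uint64_t>& entry = by_line[line];
        entry.first = size;
        ++entry.second;
    }
    for (const auto& kv : by_line) {
        // Keeping a single copy saves every other copy of that symbol.
        report.total_gain += (kv.second.second - 1) * kv.second.first;
    }
    return report;
}
```

For the two-object example from earlier, MyTemplate<int>::f(int) appears twice with size 23 (0x17), so the gain for that symbol would be (2 - 1) * 23 = 23 bytes.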
The dream: a template compiler cache
I think the ultimate solution would be if we could build with:
g++ --template-cache myfile.o file1.cpp
g++ --template-cache myfile.o file2.cpp
and then myfile.o would automatically reuse previously compiled templates across files.
This would mean zero extra effort for programmers besides passing that extra CLI option to the build system.
A secondary bonus of explicit template instantiation: help IDEs list template instantiations
I've found that some IDEs such as Eclipse cannot resolve "a list of all template instantiations used".
So e.g., if you are inside templated code and you want to find the possible values of the template parameter, you would have to find the constructor usages one by one and deduce the possible types one by one.
But on Eclipse 2020-03 I can easily list explicitly instantiated templates by doing a Find all usages (Ctrl + Alt + G) search on the class name, which points me e.g. from:
template <class T>
struct AnimalTemplate {
    T animal;
    AnimalTemplate(T animal) : animal(animal) {}
    std::string noise() {
        return animal.noise();
    }
};
to:
template class AnimalTemplate<Dog>;
Here's a demo: https://github.com/cirosantilli/ide-test-projects/blob/e1c7c6634f2d5cdeafd2bdc79bcfbb2057cb04c4/cpp/animal_template.hpp#L15
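For readers who want to try this without opening the demo, a self-contained sketch could look like the following. The Dog type and the dog_noise function are assumptions made up for illustration; any class with a noise() member would do:

```cpp
#include <string>

// Hypothetical animal type, only here to have something to instantiate with.
struct Dog {
    std::string noise() { return "woof"; }
};

template <class T>
struct AnimalTemplate {
    T animal;
    AnimalTemplate(T animal) : animal(animal) {}
    std::string noise() {
        return animal.noise();
    }
};

// Explicit instantiation definition: one line that a usage search
// on AnimalTemplate can land on.
template class AnimalTemplate<Dog>;

// Hypothetical caller using the explicitly instantiated class.
std::string dog_noise() { return AnimalTemplate<Dog>(Dog()).noise(); }
```

The explicit instantiation line is what gives the IDE a concrete place in the source that names Dog together with AnimalTemplate.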
Another guerrilla technique you could use outside of the IDE, however, would be to run nm -C on the final executable and grep the template name:
nm -C main.out | grep AnimalTemplate
which directly points to the fact that Dog was one of the instantiations:
0000000000004dac W AnimalTemplate<Dog>::noise[abi:cxx11]()
0000000000004d82 W AnimalTemplate<Dog>::AnimalTemplate(Dog)
0000000000004d82 W AnimalTemplate<Dog>::AnimalTemplate(Dog)