What techniques can be used to speed up C++ compilation times?

Asked 16/12, 2008 at 23:25 Answered 9/1 at 11:11

This question came up in some comments to Stack Overflow question C++ programming style, and I'm interested to hear what ideas there are.

I've seen a related question, Why does C++ compilation take so long?, but that doesn't provide many solutions.

Sky answered 16/12, 2008 at 23:25 Comment(7)

Could you give us some context? Or are you looking for very general answers? – Deathful 16/12, 2008 at 23:27

Very similar to this question: stackoverflow.com/questions/364240/… – Rostand 16/12, 2008 at 23:29

General answers. I've got a really big code base written by many people. Ideas on how to attack that would be good. And also, suggestions for keeping compiles quick for newly written code would be interesting. – Sky 16/12, 2008 at 23:30

Notice that often a relevant part of the build time is not used by the compiler but by the build scripts – Hankypanky 13/2, 2015 at 7:50

I skimmed this page and didn't see any mentions of measurements. I wrote a little shell script that adds a timestamp to each line of input it receives, so I can just pipe in the 'make' invocation. This lets me see which targets are the most expensive, the total compile or link time, etc. just by comparing timestamps. If you try this approach, remember that the timestamps will be inaccurate for parallel builds. – Praise 18/7, 2016 at 13:33

use distcc or incredibuild – Prisage 26/9, 2020 at 23:10

See also: c++ - Decrease clang compile time with precompiled headers - Stack Overflow – Disloyal 17/2, 2023 at 3:51

Language techniques

Pimpl Idiom

Take a look at the Pimpl idiom here, and here, also known as an opaque pointer or handle classes. Not only does it speed up compilation, it also increases exception safety when combined with a non-throwing swap function. The Pimpl idiom lets you reduce the dependencies between headers and reduces the amount of recompilation that needs to be done.
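As a rough illustration, here is a minimal sketch of the idiom. The Widget class and its members are hypothetical names chosen for this example. In a real project the first half would live in widget.h and the second half in widget.cpp; they are shown in one block only so the sketch compiles on its own.

```cpp
#include <memory>
#include <string>
#include <utility>

// --- widget.h (what clients include): Impl is only forward-declared ---
class Widget {
public:
    explicit Widget(std::string name);
    ~Widget();                          // defined where Impl is complete
    Widget(Widget&&) noexcept;
    Widget& operator=(Widget&&) noexcept;

    const std::string& name() const;
    void swap(Widget& other) noexcept;  // non-throwing: swaps one pointer

private:
    struct Impl;                        // forward declaration only
    std::unique_ptr<Impl> impl_;
};

// --- widget.cpp (private details; editing Impl does not touch widget.h) ---
struct Widget::Impl {
    std::string name;
};

Widget::Widget(std::string name) : impl_(new Impl{std::move(name)}) {}
Widget::~Widget() = default;
Widget::Widget(Widget&&) noexcept = default;
Widget& Widget::operator=(Widget&&) noexcept = default;

const std::string& Widget::name() const { return impl_->name; }
void Widget::swap(Widget& other) noexcept { impl_.swap(other.impl_); }
```

Because clients only see a pointer to an incomplete type, adding or changing members of Impl should require recompiling widget.cpp only, not every file that includes widget.h.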

Forward Declarations

Wherever possible, use forward declarations. If the compiler only needs to know that SomeIdentifier is a struct or a pointer or whatever, don't include the entire definition, forcing the compiler to do more work than it needs to. This can have a cascading effect, making builds way slower than they need to be.

The I/O streams are particularly known for slowing down builds. If you need them in a header file, try #including <iosfwd> instead of <iostream> and #include the <iostream> header in the implementation file only. The <iosfwd> header holds forward declarations only. Unfortunately the other standard headers don't have a respective declarations header.

Prefer pass-by-reference to pass-by-value in function signatures. This will eliminate the need to #include the respective type definitions in the header file and you will only need to forward-declare the type. Of course, prefer const references to non-const references to avoid obscure bugs, but this is an issue for another question.

Guard Conditions

Use guard conditions to keep header files from being included more than once in a single translation unit.

#pragma once
#ifndef filename_h
#define filename_h

// Header declarations / definitions

#endif

By using both the pragma and the ifndef, you get the portability of the plain macro solution, as well as the compilation speed optimization that some compilers can do in the presence of the pragma once directive.

Reduce interdependency

The more modular and less interdependent your code design is in general, the less often you will have to recompile everything. You can also end up reducing the amount of work the compiler has to do on any individual block at the same time, by virtue of the fact that it has less to keep track of.

Compiler options

Precompiled Headers

These are used to compile a common section of included headers once for many translation units. The compiler compiles it once, and saves its internal state. That state can then be loaded quickly to get a head start in compiling another file with that same set of headers.

Be careful that you only include rarely changed stuff in the precompiled headers, or you could end up doing full rebuilds more often than necessary. This is a good place for STL headers and other library include files.

ccache is another utility that takes advantage of caching techniques to speed things up.

Use Parallelism

Many compilers / IDEs support using multiple cores/CPUs to do compilation simultaneously. In GNU Make (usually used with GCC), use the -j [N] option. In Visual Studio, there's an option under preferences to allow it to build multiple projects in parallel. You can also use the /MP option for file-level parallelism, instead of just project-level parallelism.

Other parallel utilities:

Use a Lower Optimization Level

The more the compiler tries to optimize, the harder it has to work.

Shared Libraries

Moving your less frequently modified code into libraries can reduce compile time. By using shared libraries (.so or .dll), you can reduce linking time as well.

Get a Faster Computer

More RAM, faster hard drives (including SSDs), and more CPUs/cores will all make a difference in compilation speed.

Monotone answered 16/12, 2008 at 23:25 Comment(19)

Precompiled headers aren't perfect though. A side effect of using them is that you get more files included than necessary (because every compilation unit uses the same precompiled header), which may force full recompiles more often than necessary. Just something to keep in mind. – Sapheaded 17/12, 2008 at 18:44

Also, in VS2008, you can build multiple .cpp files in parallel, not just projects. – Sapheaded 17/12, 2008 at 18:45

I tend to put things like STL headers and library headers (like windows.h) that tend not to change into the precompiled headers. But yes, it's a bad idea to put in anything that will change even semi-frequently. – Monotone 17/12, 2008 at 19:4

isn't it a better idea to prefer #ifndef to #pragma? for the price of 2 more loc you gain extra compilers support. – Audette 30/12, 2008 at 4:40

How about "be a better programmer" so you don't need to recompile on each new mod3 or mod5 loc? :) – Audette 30/12, 2008 at 4:47

In modern compilers, #ifndef is just as fast as #pragma once (as long as the include guard is at the top of the file). So there's no benefit to #pragma once in terms of compilation-speed – Sapheaded 30/12, 2008 at 15:27

If by modern compilers, you only include gcc, then yes. As far as I can tell, visual c++ doesn't do this. – Monotone 30/12, 2008 at 16:55

Even if you have just VS 2005, not 2008, you can add /MP switch in compilation options to enable parallel building at the .cpp level. – Hagi 23/4, 2009 at 18:9

SSDs were prohibitively expensive when this answer was written, but today they are the best choice when compiling C++. You access a lot of small files when compiling. That requires a lot of IOPS, which SSDs deliver. – Iggie 4/8, 2010 at 7:59

Prefer pass-by-reference to pass-by-value in function signatures. This will eliminate the need to #include the respective type definitions in the header file This is wrong, you do not need to have the full type to declare a function that passes by value, you only need the full type to implement or use that function, but in most cases (unless you are only forwarding calls) you will need that definition anyway. – Phuongphycology 21/8, 2011 at 10:51

On the combination of #pragma and guards, you should reverse the order. Some compilers will process an include guard as a #pragma once if it is the first thing in the preprocessed file, but will fail to do so if you add anything other than a comment outside of the #ifndef – Phuongphycology 21/8, 2011 at 10:53

+1 for ccache. It took all of 15 minutes to install and I'm now getting 10x speedups in many common scenarios. – Irradiate 27/11, 2012 at 19:23

Does the RAM amount make a difference in compilation speed only because of the pagefile or is there another reason ? I'm just wondering if going from 6gb to 12gb or more will make a difference.... – Nole 9/12, 2012 at 0:17

@Dinaiz: Having less RAM than the compiler uses is very bad because it causes paging, but having significantly more RAM than the compiler uses is also beneficial because the excess is used for disk cache (on most modern OSes) – Si 29/4, 2014 at 18:27

Most of these suggestions assume separate compilation, which might not be achievable when templates are used and the possible template arguments are unknown or there are too many of them. Any suggestions for when there is just one translation unit? – Deville 2/5, 2016 at 13:12

Many good points! To the list of parallel utilities I would add icecc, which is much better than distcc (IMO). I wrote a slightly more Linux-centric article on the subject: Faster C++ builds. Another thing is that release builds (w/o debug symbols) can actually be faster for some projects, since linking can be faster. – Izabel 2/1, 2017 at 0:23

Do yourself a favor and stick with NVME drives on dev machines. Sure they are expensive but what else can beat 3200 reads for the price and ease of use. I’m rockin a 6 core cpu and a 970 pro nvme drive. Speed is wonderful. – Bawdyhouse 12/7, 2018 at 4:14

Exclude the build directories from the virus scanner. – Boutis 23/6, 2021 at 8:30

If the SSD has SDRAM it makes a huge difference. – Arlindaarline 5/1, 2023 at 17:23

I work on the STAPL project which is a heavily-templated C++ library. Once in a while, we have to revisit all the techniques to reduce compilation time. In here, I have summarized the techniques we use. Some of these techniques are already listed above:

Finding the most time-consuming sections

Although there is no proven correlation between symbol lengths and compilation time, we have observed that smaller average symbol sizes can improve compilation time on all compilers. So your first goal is to find the largest symbols in your code.

Method 1 - Sort symbols based on size

You can use the nm command to list the symbols based on their sizes:

nm --print-size --size-sort --radix=d YOUR_BINARY

In this command the --radix=d lets you see the sizes in decimal numbers (default is hex). Now by looking at the largest symbol, identify if you can break the corresponding class and try to redesign it by factoring the non-templated parts in a base class, or by splitting the class into multiple classes.

Method 2 - Sort symbols based on length

You can run the regular nm command and pipe it to your favorite script (AWK, Python, etc.) to sort the symbols based on their length. Based on our experience, this method identifies the largest trouble making candidates better than method 1.
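If you would rather stay in C++ than reach for AWK or Python, the sorting step could be sketched as below. The function name longest_symbols is made up for this example, and it assumes the symbol name is the last whitespace-separated field on each line, which holds for mangled names but not for demangled output that contains spaces.

```cpp
#include <algorithm>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Reads nm-style output (one symbol per line; the symbol name is assumed to
// be the last whitespace-separated field) and returns names, longest first.
std::vector<std::string> longest_symbols(std::istream& in) {
    std::vector<std::string> names;
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        std::string field, last;
        while (fields >> field) last = field;   // keep only the final field
        if (!last.empty()) names.push_back(last);
    }
    // stable_sort keeps nm's original order among names of equal length
    std::stable_sort(names.begin(), names.end(),
                     [](const std::string& a, const std::string& b) {
                         return a.size() > b.size();
                     });
    return names;
}
```

Wrapped in a main that passes std::cin and prints the first few results, this can sit at the end of a pipe that starts with nm YOUR_BINARY.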

Method 3 - Use Templight

"Templight is a Clang-based tool to profile the time and memory consumption of template instantiations and to perform interactive debugging sessions to gain introspection into the template instantiation process".

You can install Templight by checking out LLVM and Clang (instructions) and applying the Templight patch on it. The default setting for LLVM and Clang is on debug and assertions, and these can impact your compilation time significantly. It does seem like Templight needs both, so you have to use the default settings. The process of installing LLVM and Clang should take about an hour or so.

After applying the patch you can use templight++ located in the build folder you specified upon installation to compile your code.

Make sure that templight++ is in your PATH. Now to compile add the following switches to your CXXFLAGS in your Makefile or to your command line options:

CXXFLAGS+=-Xtemplight -profiler -Xtemplight -memory -Xtemplight -ignore-system

templight++ -Xtemplight -profiler -Xtemplight -memory -Xtemplight -ignore-system

After compilation is done, you will have a .trace.memory.pbf and .trace.pbf generated in the same folder. To visualize these traces, you can use the Templight Tools that can convert these to other formats. Follow these instructions to install templight-convert. We usually use the callgrind output. You can also use the GraphViz output if your project is small:

$ templight-convert --format callgrind YOUR_BINARY --output YOUR_BINARY.trace

$ templight-convert --format graphviz YOUR_BINARY --output YOUR_BINARY.dot

The callgrind file generated can be opened using kcachegrind in which you can trace the most time/memory consuming instantiation.

Reducing the number of template instantiations

Although there is no exact solution for reducing the number of template instantiations, there are a few guidelines that can help:

Refactor classes with more than one template argument

For example, if you have a class,

template <typename T, typename U>
struct foo { };

and both of T and U can have 10 different options, you have increased the possible template instantiations of this class to 100. One way to resolve this is to abstract the common part of the code to a different class. The other method is to use inheritance inversion (reversing the class hierarchy), but make sure that your design goals are not compromised before using this technique.
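One way the "abstract the common part" option might look is sketched here. The base class buffer_base and the members of foo are hypothetical; the point is only that code which does not depend on T or U is written once instead of once per (T, U) pair.

```cpp
#include <cstddef>

// Non-templated part: not duplicated for each instantiation of foo.
class buffer_base {
public:
    std::size_t size() const { return size_; }
    bool empty() const { return size_ == 0; }
protected:
    void grow() { ++size_; }
private:
    std::size_t size_ = 0;
};

// Templated part: only the code that really depends on T and U stays here.
template <typename T, typename U>
struct foo : buffer_base {
    void add(const T& key, const U& value) {
        last_key = key;
        last_value = value;
        grow();
    }
    T last_key{};
    U last_value{};
};
```

With 100 possible (T, U) combinations, size(), empty() and grow() are still compiled as part of a single ordinary class.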

Refactor non-templated code to individual translation units

Using this technique, you can compile the common section once and link it with your other TUs (translation units) later on.

Use extern template instantiations (since C++11)

If you know all the possible instantiations of a class you can use this technique to compile all cases in a different translation unit.

For example, in:

enum class PossibleChoices {Option1, Option2, Option3};

template <PossibleChoices pc>
struct foo { };

We know that this class can have three possible instantiations:

template class foo<PossibleChoices::Option1>;
template class foo<PossibleChoices::Option2>;
template class foo<PossibleChoices::Option3>;

Put the above in a translation unit and use the extern keyword in your header file, below the class definition:

extern template class foo<PossibleChoices::Option1>;
extern template class foo<PossibleChoices::Option2>;
extern template class foo<PossibleChoices::Option3>;

This technique can save you time if you are compiling different tests with a common set of instantiations.

NOTE : MPICH2 ignores the explicit instantiation at this point and always compiles the instantiated classes in all compilation units.

Use unity builds

The whole idea behind unity builds is to include all the .cc files that you use in one file and compile that file only once. Using this method, you can avoid reinstantiating common sections of different files and if your project includes a lot of common files, you probably would save on disk accesses as well.

As an example, let's assume you have three files foo1.cc, foo2.cc, foo3.cc and they all include tuple from STL. You can create a foo-all.cc that looks like:

#include "foo1.cc"
#include "foo2.cc"
#include "foo3.cc"

You compile this file only once and potentially reduce the common instantiations among the three files. It is hard to generally predict if the improvement can be significant or not. But one evident fact is that you would lose parallelism in your builds (you can no longer compile the three files at the same time).

Further, if any of these files happen to take a lot of memory, you might actually run out of memory before the compilation is over. On some compilers, such as GCC, this might ICE (Internal Compiler Error) your compiler for lack of memory. So don't use this technique unless you know all the pros and cons.

Precompiled headers

Precompiled headers (PCHs) can save you a lot of time in compilation by compiling your header files to an intermediate representation recognizable by a compiler. To generate precompiled header files, you only need to compile your header file with your regular compilation command. For example, on GCC:

$ g++ YOUR_HEADER.hpp

This will generate a YOUR_HEADER.hpp.gch file (.gch is the extension for PCH files in GCC) in the same folder. This means that if you include YOUR_HEADER.hpp in some other file, the compiler will look for YOUR_HEADER.hpp.gch in the same folder first and use it instead of YOUR_HEADER.hpp.

There are two issues with this technique:

You have to make sure that the header files being precompiled are stable and are not going to change (you can always change your makefile)
You can only include one PCH per compilation unit (on most compilers). This means that if you have more than one header file to be precompiled, you have to include them in one file (e.g., all-my-headers.hpp). But that means that you have to include the new file in all places. Fortunately, GCC has a solution for this problem. Use -include and give it the new header file. To force-include several files, repeat the -include option once per file.

For example:

g++ foo.cc -include all-my-headers.hpp

Use unnamed or anonymous namespaces

Unnamed namespaces (a.k.a. anonymous namespaces) can reduce the generated binary sizes significantly. Unnamed namespaces use internal linkage, meaning that the symbols generated in those namespaces will not be visible to other TUs (translation or compilation units). Compilers usually generate unique names for unnamed namespaces. This means that if you have a file foo.hpp:

namespace {

template <typename T>
struct foo { };
} // Anonymous namespace
using A = foo<int>;

and you happen to include this file in two TUs (two .cc files compiled separately), the two foo template instances will not be the same. This violates the One Definition Rule (ODR). For that reason, using unnamed namespaces in header files is discouraged. Feel free to use them in your .cc files to avoid symbols showing up in your binary files. In some cases, moving all the internal details of a .cc file into an unnamed namespace showed a 10% reduction in the generated binary sizes.
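For the recommended use inside a .cc file, a minimal sketch could look like this (the file name and both function names are invented for the example):

```cpp
// pixels.cc
namespace {
// Internal linkage: this helper is invisible to other translation units.
int clamp_to_byte(int v) { return v < 0 ? 0 : (v > 255 ? 255 : v); }
}  // namespace

// The only function this file exports.
int brighten(int pixel, int amount) { return clamp_to_byte(pixel + amount); }
```

Only brighten needs an externally visible symbol; the helper can be inlined or dropped by the compiler without affecting other files.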

Changing visibility options

In newer compilers you can select your symbols to be either visible or invisible in the Dynamic Shared Objects (DSOs). Ideally, changing the visibility can improve compiler performance, link time optimizations (LTOs), and generated binary sizes. If you look at the STL header files in GCC you can see that it is widely used. To enable visibility choices, you need to change your code per function, per class, per variable and more importantly per compiler.

With the help of visibility you can hide the symbols that you consider private from the generated shared objects. On GCC you can control the visibility of symbols by passing default or hidden to the -fvisibility option of your compiler. This is in some sense similar to the unnamed namespace but in a more elaborate and intrusive way.

If you would like to specify the visibilities per case, you have to add the following attributes to your functions, variables, and classes:

__attribute__((visibility("default"))) void  foo1() { }
__attribute__((visibility("hidden")))  void  foo2() { }
class __attribute__((visibility("hidden"))) foo3 { };
void foo4() { }

The default visibility in GCC is default (public), meaning that if you compile the above as a shared library (-shared), foo2 and class foo3 will not be visible outside the library (foo1 and foo4 will be visible). If you compile with -fvisibility=hidden then only foo1 will be visible. Even foo4 would be hidden.

You can read more about visibility on GCC wiki.

Froehlich answered 12/11, 2015 at 23:51 Comment(0)

I'd recommend these articles from "Games from Within, Indie Game Design And Programming":

Granted, they are pretty old - you'll have to re-test everything with the latest versions (or versions available to you), to get realistic results. Either way, it is a good source for ideas.

Paten answered 16/12, 2008 at 23:58 Comment(0)

One technique which worked quite well for me in the past: don't compile multiple C++ source files independently, but rather generate one C++ file which includes all the other files, like this:

// myproject_all.cpp
// Automatically generated file - don't edit this by hand!
#include "main.cpp"
#include "mainwindow.cpp"
#include "filterdialog.cpp"
#include "database.cpp"

Of course this means you have to recompile all of the included source code in case any of the sources changes, so the dependency tree gets worse. However, compiling multiple source files as one translation unit is faster (at least in my experiments with MSVC and GCC) and generates smaller binaries. I also suspect that the compiler is given more potential for optimizations (since it can see more code at once).

This technique breaks in various cases; for instance, the compiler will bail out in case two or more source files declare a global function with the same name. I couldn't find this technique described in any of the other answers though, that's why I'm mentioning it here.

For what it's worth, the KDE Project used this exact same technique since 1999 to build optimized binaries (possibly for a release). The switch to the build configure script was called --enable-final. Out of archaeological interest I dug up the posting which announced this feature: http://lists.kde.org/?l=kde-devel&m=92722836009368&w=2

Sharpen answered 21/9, 2010 at 8:38 Comment(4)

I'm not sure if it's really the same thing, but I guess turning on "Whole program optimization" in VC++ (msdn.microsoft.com/en-us/library/0zza0de8%28VS.71%29.aspx) should have the same effect on runtime performance as what you are suggesting. Compile time, however, can definitely be better in your approach! – Sandasandakan 31/10, 2010 at 13:25

@Frerich: You're describing Unity builds mentioned in OJ's answer. I've also seen them called bulk builds and master builds. – Verbid 19/3, 2012 at 22:27

So how does a UB compare to WPO/LTCG? – Profiterole 28/1, 2014 at 0:13

This is potentially useful only for one-time compilations, not during development where you cycle between editing, building and testing. In the modern world four cores is the norm, and in a couple of years core counts may be significantly higher. If the compiler and linker are not able to utilize multiple threads, then the list of files could perhaps be split into <core-count> + N sublists that are compiled in parallel, where N is some suitable integer (depending on system memory and how the machine is otherwise used). – Wakeless 20/4, 2016 at 9:57

I will just link to my other answer: How do YOU reduce compile time, and linking time for Visual C++ projects (native C++)?. Another point I want to add, though it often causes problems, is to use precompiled headers. But please, only use them for parts which hardly ever change (like GUI toolkit headers). Otherwise, they will cost you more time than they save you in the end.

Another option is, when you work with GNU make, to turn on -j<N> option:

  -j [N], --jobs[=N]          Allow N jobs at once; infinite jobs with no arg.

I usually have it at 3 since I've got a dual core here. It will then run compilers in parallel for different translation units, provided there are no dependencies between them. Linking cannot be done in parallel, since there is only one linker process linking together all object files.

But the linker itself can be threaded, and this is what the GNU gold ELF linker does. It's optimized threaded C++ code which is said to link ELF object files an order of magnitude faster than the old ld (and was actually included into binutils).

Clinic answered 16/12, 2008 at 23:31 Comment(2)

Ok, yes. Sorry that question didn't come up when I searched. – Sky 16/12, 2008 at 23:33

you didn't have to be sorry. that was for Visual C++. your question seem to be for any compiler. so that's fine :) – Clinic 16/12, 2008 at 23:37

There's an entire book on this topic, which is titled Large-Scale C++ Software Design (written by John Lakos).

The book pre-dates templates, so to the contents of that book add "using templates, too, can make the compiler slower".

Guttersnipe answered 30/12, 2008 at 3:26 Comment(2)

The book is often referred to in these kinds of topics, but for me it was sparse in information. It basically says to use forward declarations as much as possible and to decouple dependencies. That's a bit like stating the obvious, besides which the pimpl idiom has runtime drawbacks. – Downwash 28/8, 2019 at 8:51

@Downwash I think its point is to use coding idioms which permit incremental re-compilation, i.e, so that if you change a little bit of source somewhere you won't have to recompile everything. – Guttersnipe 28/8, 2019 at 8:55

Once you have applied all the code tricks above (forward declarations, reducing header inclusion to the minimum in public headers, pushing most details inside the implementation file with Pimpl...) and nothing else can be gained language-wise, consider your build system. If you use Linux, consider using distcc (distributed compiler) and ccache (cache compiler).

The first one, distcc, executes the preprocessor step locally and then sends the output to the first available compiler in the network. It requires the same compiler and library versions in all the configured nodes in the network.

The latter, ccache, is a compiler cache. It again executes the preprocessor and then checks an internal database (held in a local directory) to see if that preprocessed file has already been compiled with the same compiler parameters. If it has, it just returns the binary and output from the first run of the compiler.

Both can be used at the same time, so that if ccache does not have a local copy it can send it through the net to another node with distcc, or else it can just inject the solution without further processing.
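To make the caching idea concrete, here is a toy in-memory sketch. It is not how ccache is implemented; the class compile_cache and its members are hypothetical, and the "compiler" is just a callback. It only shows the lookup described above: the key is the preprocessed source plus the compiler parameters, and the compiler runs only on a miss.

```cpp
#include <functional>
#include <map>
#include <string>
#include <utility>

// Toy compiler cache. Key: preprocessed source + compiler flags.
// Value: whatever the compiler produced the first time for that key.
class compile_cache {
public:
    using compiler_fn =
        std::function<std::string(const std::string&, const std::string&)>;

    explicit compile_cache(compiler_fn compile) : compile_(std::move(compile)) {}

    std::string get(const std::string& preprocessed, const std::string& flags) {
        const auto key = std::make_pair(preprocessed, flags);
        const auto hit = cache_.find(key);
        if (hit != cache_.end()) return hit->second;  // hit: skip the compiler
        ++misses_;                                    // miss: compile and store
        return cache_[key] = compile_(preprocessed, flags);
    }

    int misses() const { return misses_; }

private:
    compiler_fn compile_;
    std::map<std::pair<std::string, std::string>, std::string> cache_;
    int misses_ = 0;
};
```

Keying on the preprocessed text rather than the file name is what lets such a cache survive a make clean: an unchanged file compiled with unchanged flags produces the same key.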

Byrom answered 17/12, 2008 at 7:10 Comment(1)

I don't think that distcc requires the same library versions on all configured nodes. distcc only does the compilation remotely, not the linking. It also sends the preprocessed code over the wire, so the headers available on the remote system don't matter. – Sharpen 21/11, 2012 at 9:32

Here are some:

Use all processor cores by starting a multiple-compile job (make -j2 is a good example).
Turn off or lower optimizations (for example, GCC is much faster with -O1 than -O2 or -O3).
Use precompiled headers.

Huxham answered 16/12, 2008 at 23:33 Comment(3)

FYI, I find it's usually faster to start more processes than cores. For example on a quad core system, I typically use -j8, not -j4. The reason for this is that when one process is blocked on I/O, the other can be compiling. – Savannasavannah 17/12, 2008 at 0:0

@MrFooz: I tested this a few years ago by compiling the Linux kernel (from RAM storage) on an i7-2700k (4 cores, 8 threads, I set a constant multiplier). I forget the exact best result, but -j12 to around -j18 were considerably faster than -j8, just as you suggest. I'm wondering how many cores you can have before memory bandwidth becomes the limiting factor... – Misadvise 6/9, 2015 at 13:51

@MarkKCowan it depends on a lot of factors. Different computers have wildly different memory bandwidths. With high end processors these days, it takes multiple cores to saturate the memory bus. Also, there's the balance between I/O and CPU. Some code is very easy to compile, other code can be slow (e.g. with lots of templates). My current rule of thumb is -j with 2x the number of actual cores. – Savannasavannah 6/9, 2015 at 17:35

When I came out of college, the first real production-worthy C++ code I saw had these arcane #ifndef ... #endif directives wrapped around the contents of each header. I asked the guy who was writing the code about these overarching things in a very naive fashion and was introduced to the world of large-scale programming.

Coming back to the point, using directives to prevent duplicate header definitions was the first thing I learned when it came to reducing compiling times.

Inocenciainoculable answered 16/12, 2008 at 23:36 Comment(2)

old but gold. sometimes the obvious is forgotten. – Histone 3/3, 2014 at 15:21

'include guards' – Downwash 28/8, 2019 at 8:36

More RAM.

Someone talked about RAM drives in another answer. I did this with a 80286 and Turbo C++ (shows age) and the results were phenomenal. As was the loss of data when the machine crashed.

Upstretched answered 27/12, 2008 at 14:2 Comment(2)

in DOS you can't have much memory though – Seep 22/12, 2016 at 10:50

and the machine crashed every time you made some memory mistake – Illaffected 29/10, 2020 at 20:5

You could use Unity Builds.

Lynnelynnea answered 17/12, 2008 at 0:21 Comment(2)

@idbrii, link is dead. Here is a snapshot on archive.org – Rocky 11/9, 2014 at 11:45

Be aware that in this case, each project (final binary, static/shared lib) will be built by one single core, if you've a single bigger binary as a bottleneck, drawbacks of unity builds have to be considered (any change to an object of a project requires to rebuild it) – Schwing 19/11, 2021 at 18:17

Use

#pragma once

at the top of header files, so if they're included more than once in a translation unit, the text of the header will only get included and parsed once.

Sky answered 16/12, 2008 at 23:35 Comment(2)

Although widely supported, #pragma once is non-standard. See en.wikipedia.org/wiki/Pragma_once – Taneshatang 27/12, 2008 at 14:33

And these days, regular include guards have the same effect. As long as they're at the top of the file, the compiler is fully capable of treating them as #pragma once – Sapheaded 30/12, 2008 at 15:29

Use forward declarations where you can. If a class declaration only uses a pointer or reference to a type, you can just forward declare it and include the header for the type in the implementation file.

For example:

// T.h
class Class2; // Forward declaration

class T {
public:
    void doSomething(Class2 &c2);
private:
    Class2 *m_Class2Ptr;
};

// T.cpp
#include "T.h"
#include "Class2.h"
void T::doSomething(Class2 &c2) {
    // Whatever you want here
}

Fewer includes means far less work for the preprocessor if you do it enough.

Springing answered 16/12, 2008 at 23:53 Comment(2)

Doesn't this only matter when the same header is included in several translation units? If there is only one translation unit (as is frequently the case when templates are used) then this would seem to have no impact. – Deville 2/5, 2016 at 13:15

If there is only one translation unit, why bother putting it in a header? Wouldn't it make more sense to just put the contents in the source file? Isn't the whole point of headers is that it is likely to be included by more than one source file? – Springing 2/5, 2016 at 14:57

I had an idea about using a RAM drive. It turned out that for my projects it doesn't make that much of a difference after all. But then they are pretty small still. Try it! I'd be interested in hearing how much it helped.

Alleenallegation answered 16/12, 2008 at 23:28 Comment(7)

Huh. Why did someone down-vote this? I'm gonna try it tomorrow. – Sky 16/12, 2008 at 23:41

I expect the downvote is because it never makes a big difference. If you have sufficient unused RAM, the OS will intelligently use it as a disk cache anyway. – Iggie 4/8, 2010 at 8:1

@Iggie - and how much would be "sufficient"? I know that that is the theory, but for some reason using a RAMdrive does actually give a significant boost. Go figure... – Alleenallegation 4/8, 2010 at 10:29

Enough to compile your project and still cache the input and temporary files. Obviously the size in GB will depend directly on your project size. It should be noted that on older OSes (WinXP in particular) file caches were quite lazy, leaving RAM unused. – Iggie 4/8, 2010 at 11:37

Surely a RAM drive is faster if the files are already in RAM rather than doing a whole bunch of slow IO first, after which they're in RAM? (rinse-repeat for files that have changed - write them back to disk etc). – Profiterole 3/1, 2013 at 16:21

I would expect the linker to be faster if the libraries are already within the ramdrive – Czarra 17/10, 2013 at 12:17

The "hard disk" files are probably already in RAM anyway, because of the caching we've had in any decent OS for ages now (decades?) – Zosima 21/8, 2015 at 22:18

Upgrade your computer
1. Get a quad core (or a dual-quad system)
2. Get LOTS of RAM.
3. Use a RAM drive to drastically reduce file I/O delays. (There are companies that make IDE and SATA RAM drives that act like hard drives).
Then you have all your other typical suggestions
1. Use precompiled headers if available.
2. Reduce the amount of coupling between parts of your project. Changing one header file usually shouldn't require recompiling your entire project.

Whitsunday answered 16/12, 2008 at 23:34 Comment(0)

Just for completeness: a build might be slow because the build system is being stupid as well as because the compiler is taking a long time to do its work.

Read Recursive Make Considered Harmful (PDF) for a discussion of this topic in Unix environments.

Request answered 17/12, 2008 at 0:52 Comment(0)

Not about the compilation time, but about the build time:

Use ccache if you have to rebuild the same files while you are working on your build files
Use ninja-build instead of make. I am currently compiling a project with ~100 source files and everything is cached by ccache. make needs 5 minutes, ninja less than 1.

You can generate your ninja files from cmake with -GNinja.

Hankypanky answered 11/2, 2015 at 9:17 Comment(0)

Starting with Visual Studio 2017, you can get compiler metrics that show where build time is spent.

Add these options to C/C++ -> Command line (Additional Options) in the project properties window: /Bt+ /d2cgsummary /d1reportTime

You can find more information in this post.

Adytum answered 25/11, 2019 at 16:22 Comment(0)

Where are you spending your time? Are you CPU bound? Memory bound? Disk bound? Can you use more cores? More RAM? Do you need RAID? Do you simply want to improve the efficiency of your current system?

Under gcc/g++, have you looked at ccache? It can be helpful if you are doing make clean; make a lot.

Golf answered 17/12, 2008 at 5:30 Comment(0)

Dynamic linking (.so) can be much, much faster than static linking (.a), especially when you have a slow network drive. This is because with static linking all of the code in the .a file needs to be processed and written out. In addition, a much larger executable file needs to be written to disk.

Enwreathe answered 17/12, 2008 at 18:24 Comment(1)

Dynamic linking prevents many kinds of link-time optimizations, so the output may be slower in many cases – Seep 20/11, 2019 at 9:8

Network shares will drastically slow down your build, as the seek latency is high. For something like Boost, it made a huge difference for me, even though our network share drive is pretty fast. Time to compile a toy Boost program went from about 1 minute to 1 second when I switched from a network share to a local SSD.

Contemplative answered 29/4, 2014 at 18:17 Comment(1)

More than seek latency, the reason why network shares are so slow for compiling is that the build machine cannot cache files, since they might be changed by another computer in the network. – Gamache 1/3, 2023 at 19:42

If you have a multicore processor, both Visual Studio (2005 and later) and GCC support multi-processor compiles. It is something to enable if you have the hardware, for sure.

Onitaonlooker answered 16/12, 2008 at 23:28 Comment(1)

@Fellman, See some of the other answers -- use the -j# option. – Halo 30/12, 2008 at 4:33

On Linux (and maybe some other *NIXes), you can really speed up compilation by NOT STARING at the output and switching to another TTY.

Here is the experiment: printf slows down my program

Ribband answered 2/12, 2009 at 14:29 Comment(0)

Faster hard disks.

Compilers write many (and possibly huge) files to disk. Work with an SSD instead of a typical hard disk and compilation times are much lower.

Rachitis answered 16/12, 2013 at 15:33 Comment(0)

First of all, we have to understand what is so different about C++ that sets it apart from other languages.

Some people say it's that C++ has too many features. But hey, there are languages that have a lot more features and they are nowhere near that slow.

Some people say it's the size of a file that matters. Nope, source lines of code don't correlate with compile times.

But wait, how can it be? More lines of code should mean longer compile times, what's the sorcery?

The trick is that a lot of lines of code are hidden behind preprocessor directives. Yes. Just one #include can ruin your module's compilation performance.

You see, C++ doesn't have a module system. All *.cpp files are compiled from scratch. So having 1000 *.cpp files means compiling your project a thousand times. You have more than that? Too bad.

That's why C++ developers hesitate to split classes into multiple files. All those headers are tedious to maintain.

So what can we do other than using precompiled headers, merging all the cpp files into one, and keeping the number of headers minimal?

C++20 brings us preliminary support of modules! Eventually, you'll be able to forget about #include and the horrible compile performance that header files bring with them. Touched one file? Recompile only that file! Need to compile a fresh checkout? Compile in seconds rather than minutes and hours.

The C++ community should move to C++20 as soon as possible. C++ compiler developers should put more focus on this, and C++ developers should start testing the preliminary support in various compilers and use those compilers that support modules. This is the most important moment in C++ history!

Photoactinic answered 17/8, 2020 at 18:54 Comment(0)

Although not a "technique", I couldn't figure out how Win32 projects with many source files compiled faster than my "Hello World" empty project. Thus, I hope this helps someone like it did me.

In Visual Studio, one option to speed up builds is Incremental Linking (/INCREMENTAL). It's incompatible with Link-time Code Generation (/LTCG), so remember to disable incremental linking when doing release builds.

Sprang answered 6/9, 2013 at 17:50 Comment(1)

Disabling Link-time Code Generation is not a good suggestion, as that disables many optimizations. You need to enable /INCREMENTAL in debug mode only – Seep 20/11, 2019 at 9:6

Using dynamic linking instead of static linking makes your builds noticeably faster.

If you use CMake, enable this property:

set(BUILD_SHARED_LIBS ON)

For Release builds, static linking can give you more optimization.

Sian answered 27/4, 2020 at 1:2 Comment(0)

I'll share my experience with some additional (sometimes extreme) methods I've tried to reduce compile time even more.
Please note that updating your compiler, getting a better PC, using parallel compilation and reducing includes gave me the most, as the other answers already mention. Nothing can beat an incremental build, but speeding up a clean build is another challenge.

Fixing warnings
The goal was not to speed up compile time, but the end result was a faster build too. Let's say you have enabled a lot of warnings (msvc: /W4 or gcc: -Wall) and you have thousands of warnings. Most of them are easy to fix, especially if they are in header files: fix once and eliminate in each TU.
Why can this improve compilation? Because the compiler needs to report each warning to you. If you use the CLI, it needs to print it to the console; if you use an IDE, the IDE needs to display it for you.
For our project, compilation time was reduced from ~25 minutes to ~23. Small gain, but recommended.

Not recommended alternative way: You can also switch off all the warnings, probably with the same result.

Running on even more threads
If you have 8 cores, it doesn't mean your computer is running only 8 processes. There are way more programs running in the background (depending on your usage, of course). Because the system is constantly switching between tasks, the compiler gets more CPU time if you launch more threads than cores. (Imagine the original cycle is browser-compiler-email; if you launch 3 more compiler threads it might be browser-compiler#1-compiler#2-email-compiler#3-compiler#4 on repeat.)
It highly depends on your usage and your codebase. If you have fewer source files to compile than cores, it adds no benefit. But if you have 1000+ source files that can be compiled in parallel and you are using your computer for less relevant tasks (or sharing a remote machine with other projects but you need more resources), you might give it a try.
Blog post for MsBuild, and relevant CMake solution

Messing with thread affinity
Note! As the task manager says:

Changing the priority of certain processes could cause system instability.

Use it at your own risk and never change the process priority on a build agent.

You can either set the priority to the max or to the lowest. Initially I launched the build process with the highest priority, causing my system to lag for hopefully some saved seconds. During the build the computer was unusable, so I cannot recommend this anymore.
So instead, I made a 'background' build script that launched the build with 'lower than normal' priority. Because developers usually don't have to focus on multiple things at once, I cloned the project twice on different branches, and always launched a background build on one of them while working on the other. So when I was able to switch context, a fresh build was waiting for me. Combined with incremental builds, this was a huge benefit for me (saving myself from going insane from the wait).

This is the batch script:
build_low.bat

start /belownormal /d "C:\my_workspace" build.bat

Where build.bat can build my application. See the documentation of starting a program here.

Different build types
As others already said, different optimization levels have huge impact on the build time.
Rule of thumb: by default Debug build configuration is faster to build than a Release build config.
That means most of the time you would choose Debug mode to save time and delay the Release build as long as you can. (So first make sure unit tests are in place, the feature works correctly, and no new bugs are introduced. Then switch to release mode.)

On the other hand, if you can live without debug symbols in release mode (you never plan to collect crash dumps from users and analyze them), disable debug info generation to speed up the build. From MSVC debug information flags:

By default, if no debug information format option is specified, the compiler produces no debugging information, so compilation is faster.

Anion answered 9/1 at 11:11 Comment(0)

From Microsoft: https://devblogs.microsoft.com/cppblog/recommendations-to-speed-c-builds-in-visual-studio/

Specific recommendations include:

DO USE PCH for projects

DO include commonly used system, runtime and third party headers in PCH

DO include rarely changing project specific headers in PCH

DO NOT include headers that change frequently

DO audit PCH regularly to keep it up to date with product churn

DO USE /MP

DO Remove /Gm in favor of /MP

DO resolve conflict with #import and use /MP

DO USE linker switch /incremental

DO USE linker switch /debug:fastlink

DO consider using a third party build accelerator

Otocyst answered 28/9, 2022 at 20:38 Comment(0)
