Optimizing compile-time performance by caching metafunctions
Let's say I have the following metafunction:

template <typename T>
struct make_pair {
    using type = std::pair<
        typename std::remove_reference<T>::type,
        typename std::remove_reference<T>::type
    >;
};

Would it improve compilation speed to do this (or something else) instead?

template <typename T>
struct make_pair {
    using without_reference = typename std::remove_reference<T>::type;
    using type = std::pair<without_reference, without_reference>;
};

I see two possibilities:

  1. The compiler has to do some work every time it sees typename std::remove_reference<T>::type. Using an intermediate alias has some kind of "caching" behaviour, which allows the compiler to do some work only once.

  2. Compile-time performance is roughly measured by the number of template instantiations the compiler has to perform. Because both occurrences of typename std::remove_reference<T>::type name the same instantiation, only one instantiation of std::remove_reference is required in either case, so both implementations are equivalent with respect to compile-time performance.

I think the second is right, but I would like to be sure. If the answer turns out to be compiler-specific, I would mostly be interested in the answer for Clang and GCC.

Edit:

I benchmarked the compilation of a test program to have some data to work with. The test program does something like this:

template <typename ...> struct result;    

template <typename T>
struct with_cache {
    using without_reference = typename std::remove_reference<T>::type;
    using type = result<without_reference, ..., without_reference>;
};

template <typename T>
struct without_cache {
    using type = result<
        typename std::remove_reference<T>::type,
        ...,
        typename std::remove_reference<T>::type
    >;
};

using Result = with[out]_cache<int>::type;

These are the average times for 10 compilations of the program, with 10 000 template parameters in result<>.

-----------------------------------------
|               | g++ 4.8 | clang++ 3.2 |
-----------------------------------------
| with cache    | 0.1628s | 0.3036s     |
-----------------------------------------
| without cache | 0.1573s | 0.3785s     |
-----------------------------------------

The test program is generated by a script available here.

Notornis answered 20/6, 2013 at 0:35 Comment(4)
I think no amount of speculation can replace actual measurements. Please post some timing figures; then we can create a nice theory to explain them. – Metro
I saw a talk on Clang saying that it uses hash tables for template instantiations instead of linked lists. I don't know who they were comparing themselves to, though. – Backwardation
A template compiler that doesn't do memoization is going to be ridiculously slow. – Tankage
@Yakk: On the other hand, I have seen GCC crash on some inputs because it cached so much that it outgrew the available memory, whereas Clang was painfully slow but at least compiled the beast (of course, I am talking about ridiculous inputs :p). – Coincide
I can't say this is true of all compilers, but GCC (and most likely every other major compiler) uses memoization. If you think about it, it almost has to.

Consider the following code:

&f<X, Y>::some_value == &f<X, Y>::some_value

This is required to be true, so the compiler has to make sure it doesn't duplicate definitions of methods and static members. Now, there might be other ways to do this, but it just screams memoization to me; I don't see another way to implement it (granted, I haven't thought about it very hard).

When I use TMP, I expect memoization to occur; it would be far too slow otherwise. The only ways I have seen major differences in compile-time performance are (a) using a faster compiler like Clang (which is about 3 times faster than GCC) and (b) choosing different algorithms. Small constant factors seem to matter even less in TMP than they do in C or C++, in my experience. Choose the right algorithm, avoid unnecessary work, keep the number of instantiations down, and use a good compiler (MSVC++ is really slow and far from C++11 compliance, but GCC and Clang are quite good); that is really all you can do.

Also, you should always sacrifice compile time for better code. Premature compile-time optimization is far more evil than plain premature optimization. There might be an exception if for some reason compile times become massively prohibitive to development, but I haven't ever heard of such a case.

Anisomerous answered 28/6, 2013 at 17:55 Comment(2)
This answer is not bad, but it's not really what I was looking for; I am looking for something more certain, like a confirmation from a GCC or Clang developer. Also, the reason I asked this question is that I have been working on metaprogramming libraries, and optimizing their compile times is crucial. – Notornis
@LouisDionne A little late to the party here, but GCC does cache template instantiations. They are stored in the same data structure as explicit specializations. Refer to uses of find_with_hash in pt.c (github.com/gcc-mirror/gcc/blob/master/gcc/cp/pt.c) – Explode
