Optimizing compile-time performance by caching metafunctions
Let's say I have the following metafunction:

template <typename T>
struct make_pair {
    using type = std::pair<
        typename std::remove_reference<T>::type,
        typename std::remove_reference<T>::type
    >;
};

Would it improve compilation speed to do this (or something else) instead?

template <typename T>
struct make_pair {
    using without_reference = typename std::remove_reference<T>::type;
    using type = std::pair<without_reference, without_reference>;
};

I see two possibilities:

  1. The compiler has to do some work every time it sees typename std::remove_reference<T>::type. Using an intermediate alias has some kind of "caching" behaviour, which allows the compiler to do some work only once.

  2. Compile-time performance is roughly measured by the number of template instantiations the compiler has to perform. Because both occurrences of typename std::remove_reference<T>::type name the same instantiation, only one instantiation of std::remove_reference is required in either case, so both implementations are equivalent with respect to compile-time performance.

I think the second is right, but I would like to be sure. If the answer turns out to be compiler-specific, I would mostly be interested in the answer for Clang and GCC.

Edit:

I benchmarked the compilation of a test program to have some data to work with. The test program does something like this:

template <typename ...> struct result;    

template <typename T>
struct with_cache {
    using without_reference = typename std::remove_reference<T>::type;
    using type = result<without_reference, ..., without_reference>;
};

template <typename T>
struct without_cache {
    using type = result<
        typename std::remove_reference<T>::type,
        ...,
        typename std::remove_reference<T>::type
    >;
};

using Result = with[out]_cache<int>::type;

These are the average times for 10 compilations of the program, with 10 000 template parameters in result<>.

-----------------------------------------
|               | g++ 4.8 | clang++ 3.2 |
-----------------------------------------
| with cache    | 0.1628s | 0.3036s     |
-----------------------------------------
| without cache | 0.1573s | 0.3785s     |
-----------------------------------------

The test program is generated by a script available here.

Notornis answered 20/6, 2013 at 0:35 Comment(4)
I think no amount of speculation can replace actual measurements. Please post some timing figures; then we can create a nice theory to explain them. – Metro
I saw a talk on Clang saying that it uses hash tables for template instantiations instead of linked lists. I don't know who they were comparing themselves to, though. – Backwardation
A template compiler that doesn't do memoization is going to be ridiculously slow. – Tankage
@Yakk: On the other hand, I have seen GCC crash on some inputs because it cached so much that it outgrew the available memory, whereas Clang was painfully slow but at least compiled the beast (of course, I am talking about ridiculous inputs :p). – Coincide
I can't say this is true of all compilers, but GCC (and most likely every other major compiler) uses memoization. If you think about it, it almost has to.

Consider the following code:

&f<X, Y>::some_value == &f<X, Y>::some_value

This is required to be true, so the compiler has to make sure it doesn't duplicate definitions of methods and static members. Now, there might be other ways to do this, but it just screams memoization to me; I don't see another way to implement it (granted, I haven't thought about it very hard).

When I use TMP, I expect memoization to occur; it would be far too slow otherwise. The only ways I have seen major differences in compile-time performance are (a) using a faster compiler like Clang (which is about 3 times faster than GCC) and (b) choosing different algorithms. Small constant factors seem to matter even less in TMP than they do in C or C++, in my experience. Choose the right algorithm, avoid unnecessary work, keep the number of instantiations down, and use a good compiler (MSVC++ is really slow and far from C++11 compliance, but GCC and Clang are quite good); that is really all you can do.

Also, you should always sacrifice compile time for better code. Premature compile-time optimization is far more evil than plain premature optimization. There might be an exception if for some reason compile times become massively prohibitive to development, but I haven't ever heard of such a case.

Anisomerous answered 28/6, 2013 at 17:55 Comment(2)
This answer is not bad, but it's not really what I was looking for; I am looking for something more certain, like a confirmation from a GCC or Clang developer. Also, the reason I asked this question is that I have been working on metaprogramming libraries, and optimizing their compile times is crucial. – Notornis
@LouisDionne A little late to the party here, but GCC does cache template instantiations. They are stored in the same data structure as explicit specializations. Refer to uses of find_with_hash in pt.c (github.com/gcc-mirror/gcc/blob/master/gcc/cp/pt.c) – Explode
