Do c++ templates make programs slow?

Asked 14/3, 2010 at 13:52 Answered 14/3, 2010 at 14:6

I have heard from many people that usage of templates make the code slow. Is it really true. I'm currently building a library. There are places where if templates are not created, it would result in code management problem. As of now I can think two solutions to this problem:

use #defines
Use templates and define all possible types in the header file/library itself but do not allow end user to make template instances.

e.g. typedef Graph<int> GraphI32; etc.

Is there anyway, to restrict user from creating various template instances on their own.

Help on above queries would be highly regarded.

Columbous answered 14/3, 2010 at 13:52 Comment(2)

How to reduce compile times: softwareengineering.stackexchange.com/questions/177546/… – Jinnyjinrikisha 8/10, 2019 at 11:2

Here's a logical follow-up question to this question about speed: Do C++ templates make programs large?. – Condition 22/4, 2020 at 4:36

The short answer is no. For the longer answer please read on.

As others have already noted, templates don't have a direct run-time penalty -- i.e. all their tricks happen at compile time. Indirectly, however, they can slow things down under a few circumstances. In particular, each instantiation of a template (normally) produces code that's separate and unique from other instantiations. Under a few circumstances, this can lead to slow execution, by simply producing enough object code that it no longer fits in the cache well.

With respect to code size: yes, most compilers can and will fold together the code for identical instantiations -- but that's normally the case only when the instantiations are truly identical. The compiler will not insert code to do even the most trivial conversions to get two minutely different instantiations to match each other. For example, a normal function call can and will convert T * to T const * so calls that use either const or non-const arguments will use the same code (unless you've chosen to overload the function on constness, in which case you've probably done so specifically to provide different behavior for the two cases). With a template, that won't happen -- instantiations over T * and T const * will result in two entirely separate pieces of code being generated. It's possible the compiler (or linker) may be able to merge the two after the fact, but not entirely certain (e.g., I've certainly used compilers that didn't).

But in the end, templates have positive effects on speed far more often than negative.

Fagaly answered 14/3, 2010 at 14:6 Comment(16)

Cache exaustion has nothing to do with templates. If the compiler had to generate a lot of code then you would have had to also generate the code manualy (without templates) the same situation exists. So this is a total red herring. – Wyandotte 14/3, 2010 at 16:18

"In particular, each instantiation of a template (normally) produces code that's separate and unique from other instantiations." - some evidence for this, please. Most implementations create multiple instances and then discard all but one at link-time, or do you think that every use of vector <int>, for example, generates duplicate code in the executable? If that were the case, no-one would ever use templates at all. – Gaylordgaylussac 14/3, 2010 at 16:34

The typical comparison is between C++-style templated containers, against Java-style type erasure. It's possible to implement C++ standard containers such that vector<int> and vector<unsigned int> call common binary code, but your compiler might not do that. Java ensures all code is shared between different kinds of vectors. Of course this is a trade-off, but it reduces the tendency Jerry highlights for C++ vectors over different types to generate separate code. If it all gets inlined either way then I guess it makes little difference whether the code is "the same" or "separate". – Shivery 14/3, 2010 at 16:52

In practice, every modern compiler is able to fold together common code across different template instantiations - and indeed, fold together common generated code in general, even for seemingly unrelated sections of code. It's a very basic optimization. The compiler are linker are smarter than you. – Sebi 14/3, 2010 at 17:14

@Terry: so if I use vector<int> in one T.U and vector<float> in another, do I have your personal guarantee that the code to perform a reallocation will only appear once in the linked executable, and calls to push_back on either kind of vector will call it? Assuming sizeof(int) == sizeof(float) in my implementation, that is. Assuming also that it's not inlined, of course, since if it was then it won't be common no matter what. – Shivery 14/3, 2010 at 17:40

@Steve: The standard allocator has no requirement to know the type at the point where memory management is done: 'pointer allocate(size_type cnt, typename std::allocator<void>::const_pointer = 0)' So assuming a good compiler yes. – Wyandotte 14/3, 2010 at 18:9

Which compilers are good? Bearing in mind that a "reallocate" of a vector doesn't just call the allocator, it also copies the data. Since int and float are both POD, I expect this to be done in common code too for the win. – Shivery 14/3, 2010 at 19:1

@Steve: You don't think their is a template specialization for all POD types that does a memcopy()? – Wyandotte 14/3, 2010 at 19:38

I don't know, what I want to know is whether I could safely assume that there is. Although remember this is just one example, the real issue is whether a typical compiler does, as Terry says, common up code even that's unrelated. My experience has mostly been the reverse, my compiler un-commons code even that's the same function, by inlining. But that may just be because I always use -O3, and that -Os has powers to merge identical-looking function bodies. Does it? – Shivery 14/3, 2010 at 19:59

Just checked, and the answer is no. I defined two identical POD structs, create a vector of each, then loop doing push_back on both vectors 100 times. Compiled with g++ 4.3.2 with -Os, I can see two copies of the push_back code, one each for Foo and Bar (e.g. calls to two different versions of _M_insert_aux). And this is with everything in a single TU. It's possible that "good compilers" exist, but if GCC is not one of them then in practice Jerry's assertion is correct, "each instantiation of a template (normally) produces code that's separate and unique from other instantiations". – Shivery 14/3, 2010 at 20:22

Wow, _M_insert_aux is a pretty big lump of code actually (well two near-identical lumps, one each for Foo and Bar). A lot more than just a wrapper to a call to common code to reallocate, followed by type-specific code to initialize the vector element. It does, however, contain a call to memmove, so it has less duplication than the absolute maximum possible. – Shivery 14/3, 2010 at 20:37

Your test isn't conclusive Steve and a little overly simplistic - the compiler and linker are very smart and these are very complex topics. Just because they have a tool in the toolbox, it doesn't mean they use it every time. Try charting the resulting executable growth as you instantiate more and more different T's for a vector. You'll see it's decidedly non-linear, and in fact at some point it'll almost completely flatten out. (Also; I'm more of a VC guy than gcc - read about LTCG and Comdat folding in the VC compiler/linker if you're really interested) – Sebi 15/3, 2010 at 5:19

@Terry, Steve, Martin: The compiler can't perform the optimization because every instantiated function has to have a unique address so that function pointer comparisons will work. vector<int>::push_back and vector<unsigned int>::push_back will always have distinct addresses, so they never share the same binary code. This is the same reason non-inline versions of inline functions are always generated. A clever whole program optimizing compiler could track if the address is never taken of templated functions and then allow them to merge, but I don't know if any compilers actually do that. – Ordinance 15/3, 2010 at 15:36

@Joseph: Nope, that only applies if someone actually takes the address of those functions. If no one does, then they can be combined. "as if" rule in action. Also, non-inline versions of inline functions are not always generated - don't know where you heard that from. – Sebi 16/3, 2010 at 2:40

@Terry: A dummy program that makes a std::vector<int> and std::vector<unsigned int> and pushes one number onto both of them, then prints them, running "nm" on the resulting executable, even with optimizations, shows symbols for both (only 1 redundant symbol compared to many with debug mode, but I assume that's because of inlining, not the optimization you describe). (Using GCC 4.2 and -O2) – Ordinance 16/3, 2010 at 17:0

@Terry: And about inline functions -- the standard mandates that an inline function have a consistent address across the whole app. The only way for all users of the library within an app to see the same address is if the library includes a non-inline version of the function that's used for address taking. I shouldn't have said strictly always, because you could throw them away in an app, but not in a library. A smart linker may do it for libraries if you're annotating functions to indicate whether they're exported and the inline function is not. – Ordinance 16/3, 2010 at 17:2

Since template instantiation happens at compile time, there is no run-time cost to using templates (as a matter of fact templates are sometimes used to perform certain computations at compile-time to make the program run faster). Heavy use of templates can however lead to long compile times.

Ovovitellin answered 14/3, 2010 at 13:54 Comment(8)

Mostly right, but increased code size due to multiple instantiations of a template function can increase instruction cache misses, and slow down your program. There are cases where a non-template function could really service multiple types (e.g. int, short, and long) and do so with less executable code. – Coriander 14/3, 2010 at 14:8

@John: The compiler only instanciates actual classes that are used. This if non templated code were used the same situation would exist as the dev would have to generate all the classes manually. So this though techincally true is has nothing to do with templates it would be a design issue and occures weather templates are used or not. – Wyandotte 14/3, 2010 at 16:21

@Martin York: for example, I want to sort an array of int, an array of float, and an array of double. I could call three different instantiations of std::sort, or I could call qsort three times. Is it conceivable that the latter would produce smaller code? You can of course argue that programmers might proliferate multiple near-identical functions without the use of templates, but in practice they usually don't, if only because the typing would be horrendous. – Shivery 14/3, 2010 at 17:52

Of course in that example, the smaller non-template code would probably be slower, because of the function pointer. But there seem to be people saying that templates cannot conceivably lead to larger object code. I don't think that corresponds with reality unless you compare "use of templates" with "a lot of copy-and-paste", when IMO you should be comparing "use of templates" with "a plausible non-template-using alternative design". Templates do in fact change how you design your code - this was a deliberate goal when they were invented, so I don't think it's "nothing to do with" them. – Shivery 14/3, 2010 at 17:57

@steve: Here you are making a delibrate trade off. You are delibrately trading speed for size (as qsort is slower because of the required inbedded function call to do the comparison (due to its generic nature)). So in this situation you have actively decided to make the trade. If on the other hand you wanted an optimal way to do the sort without using templates you would need to write three versions of sort, how is this different from using a single templated version? – Wyandotte 14/3, 2010 at 18:11

You said, "if non templated code were used the same situation would exist as the dev would have to generate all the classes manually". My point is that this is a straw man. Typically if non-templated code were used, the dev would not generate all the classes manually, and the same situation would not exist. In fact, the dev would find some other way to write somewhat generic code, and it is this against which any sane analysis is done of whether using templates is a good idea or not. Of course it is a good idea, just not for the reason that if you didn't use it, you'd necessarily c'n'p. – Shivery 14/3, 2010 at 18:31

For comparison: Q. "will I develop my application faster in Java than in assembler?". A. "No. If you wrote it in Java the same way as in asm, you'd just have a massive array of integers representing a memory map, and you would perform addition, multiplication and so on, using indices into this array instead of pointers in your assembler code. This isn't any easier. Of course you might decide to use variables, objects, and GC in Java. That would probably speed up development, but this is a deliberate trade-off and a design decision, and nothing to do with Java" Or A. "Probably"? ;-) – Shivery 14/3, 2010 at 18:38

Similarly, non-template code will (I suspect, although possibly I'm wrong: that's another thread) come out somewhat smaller than template code, with less inlining, if you assign the same task to the same programmer. If it's a task that templates handle well, the non-template code would also presumably come out slower. The mistake (if the questioner's informant is even thinking along these lines) is to think that the larger template code will be slower. It's not a mistake AFAIK to think it will be larger. But we can both come up with plausible examples either way, of course. – Shivery 14/3, 2010 at 18:49

No they don't. When ever you find that you have "heard" something, and cannot name the source, you can almost certainly guarantee that what you have heard is wrong. In fact, templates tend to speed code up.

Rather than depending on hearing things, it's a good idea to read an authoritative book on the subject - I recommend C++ Templates - The Complete Guide.

Gaylordgaylussac answered 14/3, 2010 at 13:55 Comment(0)

Template make the Compilation slow. But most of the time it makes the program faster.

Preconception answered 14/3, 2010 at 13:54 Comment(0)

They do make object code bigger, because C++ generates code for every type you use. But I don't believe this will slow execution speed. I have no numbers to suggest that it would.

It certainly does improve your lot in life during code development, reading and maintenance. I would not let coding urban myths discourage you from using a language feature that's clearly useful.

Costume answered 14/3, 2010 at 14:2 Comment(1)

Enough to eschew using them? I say no. – Costume 14/3, 2010 at 14:43

Hot tags

Godot Unity Godot Help Programming Godot 4.X GUI GDScript 3D 2D Physics CSharp Godot 3.X VR XR Projects C++

Recommended topics

Hot tags