c++ heuristic for estimating function inlining benefits

Asked 18/8, 2011 at 12:46 Answered 18/8, 2011 at 14:17

In c++, what is a good heuristic for estimating the compute time benefits of inlining a function, particularly when the function is called very frequently and accounts for >= 10% of the program's execution time (eg. the evaluation function of a brute force or stochastic optimization process). Even though inlining may be ultimately beyond my control, I am still curious.

Yardarm answered 18/8, 2011 at 12:46 Comment(11)

If it's called recursively, then it cannot be fully inlined (unless the recursion is statically bounded, but that's hard to detect). – Colorado 18/8, 2011 at 12:49

You know you can use __inline__ (GCC) or __forceinline (VC) to have 'ultimate' control over the inlining? – Kelwunn 18/8, 2011 at 12:49

@larsmans Wrong. Read up on tail calls. But even non tail-recursive functions can sometimes be automatically refactored and inlined. – Neglect 18/8, 2011 at 12:51

@Konrad: If tail-recursion optimization is performed, then the recursion is implicitly inlined already, so there is no point in trying to inline the outer call (unless the cost is unrelated to the recursion and is rather a matter of the cost at each recursion level, or the level of recursion is minimal) – Drawl 18/8, 2011 at 12:54

@Constantinius: Dong so is usually counter productive. The compiler is much better than humans at deciding when to inline. Forcing the compiler usually leads to sub-optimal code. – Wrongly 18/8, 2011 at 12:56

Oops, my bad on the semantics. I didn't mean recursively, I just meant looped repeatedly. sorry all. – Yardarm 18/8, 2011 at 12:57

@David True. But how does this contradict what I said? Either way, a call to a tail-recursive function, and all recursive calls in it, can be inlined. – Neglect 18/8, 2011 at 12:57

@Konrad: We have to agree, I was just saying that while larsmans' comment is not precisely correct (there are cases where it can be done), it is a fairly good approximation to reality (when it can be inlined, inlining will most probably not affect performance as it is already inlined) – Drawl 18/8, 2011 at 13:2

@Konrad: which compilers do auto refactoring of recursion into iteration? (I'm just asking; I'm no compiler implementation expert.) – Colorado 18/8, 2011 at 17:27

@larsmans I’m pretty certain that several modern compilers do. GCC for instance recognises, inlines, and calculates at compile time a recursive factorial function when it’s called with a compile-time constant argument. I suspect that this is done via such a refactoring since it would be the easiest way. – Neglect 18/8, 2011 at 18:26

@Konrad: that's not necessarily the easiest way; I imagine replacing n * fac(n-1) for constant n can be done by recursive constant expansion as well. – Colorado 22/8, 2011 at 8:26

There is no general answer. It depends on the hardware, the number and type of its arguments, and what is done in the function. And how often it is called, and where. On a Sparc, for example, arguments (and the return value) are passed in registers, and each function gets 16 new registers: if the function is complex enough, those new registers may avoid spilling that would occur if the function were inlined, and the non-inline version may end up faster than the inlined one. On an Intel, which is register poor, and passes arguments in registers, just the opposite might be true, for the same function in the same program. More generally, inlining may increase program size, reducing locality. Or for very simple functions, it may reduce program size; but that again depends on the architecture. The only possible way to know is to try both, measuring the time. And even then you'll only know for that particular program, on that particular hardware.

Bird answered 18/8, 2011 at 13:19 Comment(3)

You mean "passes arguments on the stack" for Intel (well "Intel" is ambiguous, for Itanium you'd be right [though those aren't register poor], but I assume you mean x86) – Benioff 18/8, 2011 at 13:47

@Benioff Yes to both: 'passes arguments on the stack', and by Intel, I meant x86. – Bird 18/8, 2011 at 14:19

+1, some calling conventions in x64 tend to pass more arguments in registers, but the reasoning is sound: it depends – Drawl 18/8, 2011 at 15:41

A function call and return on some architectures take as few as one instruction each (although they're generally not RISC-like single-cycle instructions.) In general, you can compare that to the number of cycles represented by the body of the function. A simple property access might be only a single instruction, and so putting it into a non-inlined function will triple the number of instructions to execute it -- obviously a great candidate for inlining. On the other hand, a function that formats a string for printing might represent hundreds of instructions, so two more isn't going to make any difference at all.

Contrariety answered 18/8, 2011 at 12:52 Comment(3)

how likely is it that the call will require so few instructions, and is there a way to guess how many a particular function will take? – Yardarm 18/8, 2011 at 13:2

You have to (1) know enough assembly language to have a rough idea how functions will be compiled, and (2) at least roughly understand the C++ object model as implemented by your compiler. Which is a long way of saying no, you can't really guess; you just have to know. – Contrariety 18/8, 2011 at 13:18

Whether any particular inline function is faster isn't the whole story. The number of calls to the function have to be significant enough to warrant inlining. Inlining has its own cost in terms of cluttering up header files, or in having yet a 3rd place in which to find code. Additionally if functions are inlined you cannot simply insert a breakpoint in one place in order stop whenever the function is accessed. – Uniplanar 18/8, 2011 at 14:37

If your bottleneck is in a recursive function, and assuming that the level of recursion is not minimal (i.e. average recursion is not just a few levels), you are better off in working with the algorithm in the function rather than with inlining.

Try, if possible, to transform the recursion into a loop or tail-recursion (that can be implicitly transformed into a loop by the compiler), or try to determine where in the function the cost is being spent. Try to minimize the impact of the internal operations (maybe you are dynamically allocating memory that could have auto storage duration, or maybe you can factor a common operation to be performed external to the function in a wrapper and passed in as an extra argument,...)

*EDIT after the comment that recursion was not intended, but rather iteration*

If the compiler has access to the definition of the function, it will make the right decision for you in most cases. If it does not have access to the definition, just move the code around so that it does see it. Maybe make the function a static function to provide an extra hint that it won't be used anywhere else, or even mark it as inline (knowing that this will not force inlining), but avoid using special attributes that will force inlining, as the compiler probably does it better than any simple heuristic that can be produced without looking at the code.

Ruperto answered 18/8, 2011 at 13:0 Comment(2)

cool. The functions in question are member functions that are called in main, and main has #include "class.h", while the functions are defined in class.cpp. Is it likely that the compiler can access the definition? – Yardarm 18/8, 2011 at 13:9

@Matt Munson: In the general case no. Some linkers can perform link time inlining as part of whole program optimization, and might be able to pull it off, but in general it would not be available. Also note, that depending on what the cost of the calls (which you would avoid by inlining) relative to the actual cost of the execution of the code in the function it might not provide any advantage at all. – Drawl 18/8, 2011 at 15:39

All inlining saves you is the entry/exit cost of the function, so it's only worth considering if the function does almost nothing. Certainly if the function itself contains a function call, it's probably not worth considering.

Even if the function does very little, it has to be called so much that it owns the program counter a significant percent of the time, before any speedup of the function would be noticeable.

Miniver answered 18/8, 2011 at 14:17 Comment(1)

The savings depend on where in the optimization pipeline inlining is performed. If it's done early enough, constant propagation may be done on the resulting code. – Colorado 18/8, 2011 at 17:25

The behaviour here is somewhat compiler dependant. With a recursive function obviously inlining behaviour can in theory be infinite. The 'inline' keyword is only a hint to the compiler, it can choose it ignore if it can't do anything with it. Some compilers will inline the recursive function to a certain depth.

As for the 'how much will this speed things up' - unfortunately we can't provide any sort of answer to that question as 'it depends' - how much work is the function doing vs the overhead of the function call mechanism itself. Why don't you set up a test and see?

Night answered 18/8, 2011 at 12:53 Comment(0)

Our experience, 20+ years of writing computationally intensive C++, is that inlining is no silver bullet. You really do need to profile your code to see whether inlining will increase performance. For us except for low level 2D and 3D point and vector manipulations inlining is a waste of time. You are far better off working out a better algorithm than trying to micromanage clock ticks.

Uniplanar answered 18/8, 2011 at 13:44 Comment(0)

Recommended topics

Hot tags