Why not mark everything inline?

First off, I am not looking for a way to force the compiler to inline the implementation of every function.

To reduce the number of misguided answers, make sure you understand what the inline keyword actually means. Here is a good description: inline vs static vs extern.

So my question: why not mark every function definition inline? I.e., ideally the only compilation unit would be main.cpp, or possibly a few more for the functions that cannot be defined in a header file (pimpl idiom, etc.).

The theory behind this odd request is that it would give the optimizer maximum information to work with. It could inline function implementations of course, but it could also do "cross-module" optimization, as there is only one module. Are there other advantages?
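To make this concrete, here is a minimal sketch of the layout I have in mind (the file and function names are made up purely for illustration):

    // widget.h -- every definition lives in a header, marked inline
    #pragma once

    inline int twice(int x) {
        return x * 2;   // 'inline' lets this definition appear in every TU that includes it
    }

    // main.cpp -- ideally the only compilation unit in the whole program
    #include "widget.h"

    int main() {
        return twice(21);   // the optimizer now sees the entire program at once
    }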

Has anyone tried this with a real application? Did performance increase? Decrease?

What are the disadvantages of marking all function definitions inline?

  • Compilation might be slower and will consume much more memory.
  • Iterative builds are broken; the entire application will need to be rebuilt after every change.
  • Link times might be astronomical

All of these disadvantages only affect the developer. What are the runtime disadvantages?

Arne answered 22/10, 2010 at 18:27 Comment(7)
KDE compiles their libraries and applications that way - concatenate all files into a single unit and compile that (at least they used to).Eyelid
So you are saying write everything in one function called main()?Hampson
I'm suggesting writing code as usual but putting all (or as many as possible) function definitions in the header files, marked as inline to avoid the multiple-definition errors.Arne
Compile time would be astronomical; but the linker has hardly any work left with only one module.Defoe
Just an aside, if you did get down to one TU then you don't actually need to use inline. Proper header include guards would ensure that nothing is multiply-defined.Ovalle
This question is really academic, right?Jowl
@Mike: well, people have been known to do it, under the name of a "unity build". It's risky, though. It assumes that your code won't go wrong because you've expanded the scope of a macro or whatever, so your unity build (a) takes a long time, and (b) might be subtly different from your regular build. Hello, debugging nightmare.Ovalle

Did you really mean #include everything? That would give you only a single module and let the optimizer see the entire program at once.

Actually, Microsoft's Visual C++ does exactly this when you use the /GL (Whole Program Optimization) switch: it doesn't actually compile anything until the linker runs and has access to all the code. Other compilers have similar options.
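Roughly, the manual version of this is a "unity" file that does nothing but pull in the other sources, while the switch-based route gets the same global view without restructuring anything. A sketch (the file names are hypothetical):

    // unity.cpp -- the only file handed to the compiler in a manual "unity" build;
    // it simply includes every other source file
    #include "parser.cpp"
    #include "renderer.cpp"
    #include "main.cpp"

    // The switch-based alternative keeps the sources separate and lets the
    // toolchain do it: MSVC's /GL plus /LTCG at link time, or -flto on GCC/Clang.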

Cherilyncherilynn answered 22/10, 2010 at 18:35 Comment(3)
GCC recently added this to the lto branch. Clang also already has this support from LLVM.Dowell
I guess I'm asking about mimicking /GL optimization by inlining all function definitions.Arne
Well, because a compile switch is superior to reorganizing your code that way. For one thing, the compile switch can be turned off for unoptimized development builds, giving you fast compile times, and turned on for optimized builds. So you get the best of both worlds.Cherilyncherilynn

SQLite uses this idea. During development it uses a traditional source structure, but for actual use there is one huge C file (112k lines). They do this for maximum optimization and claim about a 5-10% performance improvement.

http://www.sqlite.org/amalgamation.html

Mite answered 22/10, 2010 at 19:11 Comment(8)
I wonder how they handle source file static variables/functions? If two source files define static int count;, the two variables will merge into one in the amalgamation.Arne
@caspin: Multiple variables with the same name are poor practice anyway, and in C++ there are much better ways (i.e. namespaces) of overcoming these sorts of issues. They probably just forbid variables at global scope in their coding standard.Cherilyncherilynn
@BenVoigt: namespaces don't really solve the problem unless you have a unique namespace for every file. I don't think it's bad practice for two files to have functions named the same thing in the unnamed namespace. The whole point of putting a function in there is that it is internal to that file so that you don't have to worry about collisions (also some optimization benefits).Selfoperating
@David: Two functions named the same and doing different things? That's a bad idea anywhere.Cherilyncherilynn
@BenVoigt: Unless they make sense in the context of the file. The function pass_bar has a very different meaning in a file about lawyers than it does in a file about building a prison. Perhaps your lawyer needs to object and your prison-builder is refactored to pass a generic object. A class called bear might represent an animal or a market. I could see a function like "write_file" being defined multiple times, each with a meaning obvious in its own context. I wouldn't consider any of those necessarily bad design.Selfoperating
well, coding standards for a project using these techniques would use named namespaces instead of anonymous onesCherilyncherilynn
@BenVoigt: I think equal-named functions doing different things are the biggest reason for namespaces and overloaded functions. One sort function may sort random-accessible containers, another sort sorts binary trees, and the biologist's sort may determine the dog's exact kind.Inequity
@phresnel: Still, you had better not be using anonymous namespaces for functions whose names are file-dependent. There are a lot more reasons to avoid name collisions than just Unity builds, for example call stacks during debugging (and exception logging).Cherilyncherilynn

We (and some other game companies) did try it via making one uber-.CPP that #included all the others; it's a known technique. In our case, it didn't seem to affect runtime much, but the compile-time disadvantages you mention turned out to be utterly crippling. With a half-hour compile after every single change, it becomes impossible to iterate effectively. (And this is with the app divvied up into over a dozen different libraries.)

We tried making a different configuration such that we would have multiple .objs while debugging and then have the uber-CPP only in release-opt builds, but then ran into the problem of the compiler simply running out of memory. For a sufficiently large app, the tools simply are not up to compiling a multimillion line cpp file.

We tried LTCG as well, and that provided a small but nice runtime boost, in the rare cases where it didn't simply crash during the link phase.

Artimas answered 22/10, 2010 at 22:43 Comment(0)

Interesting question! You are certainly right that all of the listed disadvantages are specific to the developer. I would suggest, however, that a disadvantaged developer is far less likely to produce a quality product. There may be no runtime disadvantages, but imagine how reluctant a developer will be to make small changes if each compile takes hours (or even days) to complete.

I would look at this from a "premature optimization" angle: modular code in multiple files makes life easier for the programmer, so there is an obvious benefit to doing things this way. Only if a specific application turns out to run too slow, and it can be shown that inlining everything makes a measured improvement, would I even consider inconveniencing the developers. Even then, it would be after a majority of the development has been done (so that it can be measured) and would probably only be done for production builds.

Randallrandan answered 22/10, 2010 at 18:36 Comment(0)

This is semi-related, but note that Visual C++ does have the ability to do cross-module optimization, including inline across modules. See http://msdn.microsoft.com/en-us/library/0zza0de8%28VS.80%29.aspx for info.

To add an answer to your original question, I don't think there would be a downside at run time, assuming the optimizer was smart enough (hence why it was added as an optimization option in Visual Studio). Just use a compiler smart enough to do it automatically, without creating all the problems you mention. :)

Brawn answered 22/10, 2010 at 18:33 Comment(0)

That's pretty much the philosophy behind Whole Program Optimization and Link Time Code Generation (LTCG): optimization opportunities are best with global knowledge.

From a practical point of view it's sort of a pain because now every single change you make will require a recompilation of your entire source tree. Generally speaking you need an optimized build less frequently than you need to make arbitrary changes.

I tried this in the Metrowerks era (it's pretty easy to set up with a "Unity"-style build) and the compilation never finished. I mention it only to point out that it's a workflow setup that's likely to tax the toolchain in ways its authors weren't anticipating.

Concubine answered 22/10, 2010 at 18:40 Comment(3)
"the compilation never finished" - is it still running, under a desk somewhere? ;-)Ovalle
I stopped it after a long weekend - maybe I just didn't give it enough time. We eventually switched to SN's compiler.Concubine
I've seen some benchmarks indicating that LTCG in MSVC is far slower (say, a factor 3 or 4) than doing the exact same thing through a Unity build. Not sure if there's any particularly good reasons for this, or if it's just that their linker is less optimized than their compiler.Tum

Little benefit: On a good compiler for a modern platform, inline will affect only very few functions. It is just a hint to the compiler; modern compilers are fairly good at making this decision themselves, and the overhead of a function call has become rather small (often, the main benefit of inlining is not to reduce call overhead but to open up further optimizations).
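A small illustration of that last point (the names are invented): once the call below is inlined, every comparison involves only constants, so the optimizer can fold the whole expression down to a literal and delete the branches - usually worth far more than saving the call itself.

    #include <cstdio>

    inline int clamp(int v, int lo, int hi) {
        return v < lo ? lo : (v > hi ? hi : v);
    }

    int main() {
        // After inlining clamp(200, 0, 255) the optimizer can constant-fold the
        // result to 200 and remove the branches entirely.
        std::printf("%d\n", clamp(200, 0, 255));
        return 0;
    }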

Compile time: However, since inline also changes semantics, you will have to #include everything into one huge compilation unit. This usually increases compile time significantly, which is a killer on large projects.

Code size: If you move away from current desktop platforms and their high-performance compilers, things change a lot. In this case, the increased code size generated by a less clever compiler will be a problem - so much so that it makes the code significantly slower. On embedded platforms, code size is usually the first restriction.

Still, some projects can and do profit from "inline everything". It gives you the same effect as link-time optimization, at least if your compiler doesn't blindly follow the inline keyword.

Hanlon answered 23/10, 2010 at 23:42 Comment(0)

This is already done in some cases. It is very similar to the idea of unity builds, and the advantages and disadvantages are not far from what you describe:

  • more potential for the compiler to optimize
  • link time basically goes away (if everything is in a single translation unit, there is nothing to link, really)
  • compile time goes, well, one way or the other. Incremental builds become impossible, as you mentioned. On the other hand, a complete build is going to be faster than it would be otherwise (as every line of code is compiled exactly once; in a regular build, code in headers ends up being compiled in every translation unit where the header is included)

But in cases where you already have a lot of header-only code (for example if you use a lot of Boost), it might be a very worthwhile optimization, both in terms of build time and executable performance.

As always though, when performance is involved, it depends. It's not a bad idea, but it's not universally applicable either.

As far as build time goes, you have basically two ways to optimize it:

  • minimize the number of translation units (so your headers are included in fewer places), or
  • minimize the amount of code in headers (so that the cost of including a header in multiple translation units decreases)

C code typically takes the second option, pretty much to its extreme: almost nothing apart from forward declarations and macros is kept in headers. C++ often lies somewhere in the middle, which is where you get the worst possible total build time (though PCHs and/or incremental builds may shave some time off it again); going further in the other direction, minimizing the number of translation units, can really do wonders for the total build time.
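A rough sketch of those two header styles, with invented names:

    // Style 1 - C-like minimal header: declarations only, so including it is cheap.
    int parse_flags(const char* s);          // defined once, in parse_flags.cpp

    // Style 2 - header-heavy: the full definition is recompiled by every
    // translation unit that includes the header.
    inline int count_verbose_flags(const char* s) {
        int n = 0;
        for (; *s; ++s)
            if (*s == 'v') ++n;              // toy logic, purely for illustration
        return n;
    }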

Tum answered 24/10, 2010 at 0:2 Comment(0)

The assumption here is that the compiler cannot optimize across functions. That is a limitation of specific compilers and not a general problem, and using this as a general solution to a compiler-specific problem might be bad. The compiler may very well just bloat your program: what could have been one reusable function at a single memory address (getting to use the cache) instead gets compiled separately in multiple places (losing performance because of the cache).

Big functions in general cost in terms of optimization; there is a balance between the overhead of local variables and the amount of code in the function. Keeping the number of variables in the function (passed in, local, and global) within the number of disposable registers for the platform means most everything can stay in registers and does not have to be evicted to RAM; a stack frame may also not be required (depends on the target), so function-calling overhead is noticeably reduced. That is hard to do in real-world applications all the time, but with the alternative - a small number of big functions with lots of local variables - the code is going to spend a significant amount of time evicting and loading registers with variables to/from RAM (depends on the target).

Try LLVM; it can optimize across the entire program, not just function by function. Release 2.7 had caught up to gcc's optimizer, at least for a test or two (I didn't do exhaustive performance testing), and 2.8 is out, so I assume it is better. Even with a few files the number of tuning-knob combinations is too many to mess with. I find it best not to optimize at all until you have the whole program in one file, then perform your optimization, giving the optimizer the whole program to work with - basically what you are trying to do with inlining, but without the baggage.

Sweeping answered 22/10, 2010 at 21:56 Comment(0)

Suppose foo() and bar() both call some helper(). If everything is in one compilation unit, the compiler might choose not to inline helper(), in order to reduce total instruction size. This causes foo() to make a non-inlined function call to helper().
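In code, the situation looks something like this (a sketch; the bodies are invented):

    inline int helper(int x) { return x * x + 1; }    // called from both foo() and bar()

    inline int foo(int x) { return helper(x) + 2; }   // hot path: we want helper() inlined here
    inline int bar(int x) { return helper(x) - 2; }   // cold path: inlining here only adds code size

    // With everything visible in one compilation unit, whether helper() is
    // inlined into foo() is purely the optimizer's size/speed trade-off.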

The compiler doesn't know that a nanosecond improvement to the running time of foo() adds $100/day to your bottom line in expectation. It doesn't know that a performance improvement or degradation of anything outside of foo() has no impact on your bottom line.

Only you as the programmer know these things (after careful profiling and analysis of course). The decision not to inline bar() is a way of telling the compiler what you know.

Upanishad answered 5/1, 2016 at 2:52 Comment(0)

The problem with inlining is that you want high performance functions to fit in cache. You might think function call overhead is the big performance hit, but in many architectures a cache miss will blow the couple pushes and pops out of the water. For example, if you have a large (maybe deep) function that needs to be called very rarely from your main high performance path, it could cause your main high performance loop to grow to the point where it doesn't fit in L1 icache. That will slow your code down way, way more than the occasional function call.
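As a sketch (names invented): a large, rarely-taken error path that gets inlined into a hot loop can push the loop body out of L1 icache.

    void rare_error_path(int code);          // large, almost never executed, defined elsewhere

    void process(const int* data, int n) {
        for (int i = 0; i < n; ++i) {
            if (data[i] < 0)                 // almost never true
                rare_error_path(data[i]);    // if this gets inlined, the hot loop may
                                             // no longer fit in L1 icache
            // ... tight, performance-critical work on data[i] ...
        }
    }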

Ruralize answered 22/10, 2010 at 18:37 Comment(8)
Marking all function definitions as inline won't make the compiler inline all function implementations. I would hope that the compiler/optimizer would be smart enough to not super bloat the code. Further I would expect gcc -Os (optimize for size) would work just as well in an application where all function definitions are inlined.Arne
Granted, a sufficiently smart optimizer wouldn't have problems. But if we had sufficiently smart optimizers, the inline keyword would have gone the way of register. Some compilers will aggressively inline even without the keyword at higher -O levels. What the OP described seemed to me like one gigantic function with branches instead of calls. That is not a good way to get faster code.Ruralize
"the inline keyword would have gone the way of register" - no, because inline has a second, more important meaning, which is what caspin is talking about.Ovalle
All modern compilers can aggressively move function implementations inline if it improves the code. All modern compilers ignore the inline keyword when deciding whether or not to move a function implementation inline. NOTE: gcc does not inline any functions when compiling with -O0, not even function definitions that have been marked with the inline keyword.Arne
@Steve: "caspin" may have meant to refer to the effect of the inline keyword on linkage, but his question only used "inline" as a verb and "inlining" the process, not "inline" the adjective. I think I'll fix the question before this confusion goes any further.Cherilyncherilynn
@Nathon: the optimization meaning of inline has already pretty much gone the way of register: compilers don't really care much about it as far as optimization goes.Tum
@jalf: Maybe it has for people who shouldn't be using C (confession: I never touch C++) but some of the compilers for some of the systems I use will not inline functions unless the static and inline keywords appear before the function declarations.Ruralize
@Nathon: well, some compilers still respect the register keyword too. ;) But most mainstream compilers don't really care about the inline keyword for optimization purposes.Tum
