Problem summary
I'm trying to build an in-memory database with transactions. I'm hitting bottlenecks at the compiler level, specifically the inlining limit. I don't know much about compilers, and I would like to understand why some things are done the way they are.
The goal
The absolute priority of my in-memory database is time performance. It must be super fast; that is the goal. I want to keep everything in shared memory, so every database access is a direct memory access. To handle race conditions and transactions, spinlocks are implemented at the memory level.
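To make "spinlocks at the memory level" concrete, here is a minimal C++ sketch of a test-and-set spinlock. The class name SpinLock is hypothetical (the question does not show its actual locking code), and it assumes the lock word sits in the same memory as the record it protects.

```cpp
#include <atomic>
#include <cassert>

// Minimal test-and-set spinlock (hypothetical name, illustration only).
// std::atomic_flag is guaranteed lock-free, which matters if the lock is
// placed in memory shared between processes.
class SpinLock {
public:
    void lock() noexcept {
        // acquire ordering: accesses after lock() cannot move before it
        while (flag_.test_and_set(std::memory_order_acquire)) {
            // busy-wait; a production lock would add a pause or backoff here
        }
    }

    // Returns true if the lock was free and is now held by the caller.
    bool try_lock() noexcept {
        return !flag_.test_and_set(std::memory_order_acquire);
    }

    // release ordering: writes made while holding the lock become visible
    // to the next thread that acquires it
    void unlock() noexcept { flag_.clear(std::memory_order_release); }

private:
    std::atomic_flag flag_ = ATOMIC_FLAG_INIT;  // starts unlocked
};
```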
Expected result
Given this pseudocode:
var garage = DB.GetGarage(123);
var car = DB.CreateCar();
car.Color = 2;
garage.ConcurrentBag.Add(car);
For the implementation of all these (automatically generated) methods, GetGarage, CreateCar and ConcurrentBag.Add, I have inlining enabled. Why? I found out that it is faster.
But the reason it is faster is probably not the overhead of the call itself. It seems that when the methods are inlined, the compiler can produce better machine code than when they are not. In other words, these methods have common parts that the compiler can probably simplify, but only once they are inlined; otherwise it cannot.
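A minimal C++ sketch of the kind of cross-call simplification described above, assuming a hypothetical fixed-size Table with a bounds-checked accessor (none of these names come from the question). The source only shows the semantics; whether a given compiler actually merges the checks depends on the compiler, its flags and its inlining decisions.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <stdexcept>

// Hypothetical fixed-size record table standing in for a shared-memory store.
struct Table {
    std::array<int, 8> slots{};

    // Bounds-checked accessor. Defined in the class body, so it is implicitly
    // inline; that only permits inlining, it does not force it.
    int get(std::size_t i) const {
        if (i >= slots.size()) throw std::out_of_range("slot index out of range");
        return slots[i];
    }
};

// If both get() calls are inlined here, the two identical range checks on i
// sit next to each other: once the first passes, the second is provably true
// and can be dropped, and the second load of slots[i] can reuse the first.
// If get() stays an opaque out-of-line call, both checks generally remain.
int sum_twice(const Table& t, std::size_t i) {
    return t.get(i) + t.get(i);
}
```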
Now to the heart of the problem: I mark everything as inline, and I expect the compiler to inline everything.
Actual result
Well, that is not how it works. The compiler may decide not to inline. To be specific:
.NET C# - AggressiveInlining - The attribute might cause implementation limits to be encountered that will result in slower generated code.
C99, C++ - inline - C++ and C99, but not its predecessors K&R C and C89, have support for inline functions, though with different semantics. In both cases, inline does not force inlining; the compiler is free to choose not to inline the function at all, or only in some cases.
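As the quotes say, plain inline is only a hint. GCC and Clang offer a non-standard way to override the heuristics for a single function, the always_inline attribute (MSVC's closest counterpart is __forceinline). A minimal C++ sketch with hypothetical function names:

```cpp
#include <cassert>

// Plain inline: in C++ its guaranteed effect is on the one-definition rule
// (the function may be defined in several translation units). Actually
// inlining the calls remains the optimizer's choice.
inline int add_hint(int a, int b) { return a + b; }

// GCC/Clang extension: always_inline makes the compiler inline calls to this
// function regardless of its usual heuristics; GCC diagnoses a failure to
// inline such a function as an error. Other compilers get the plain version.
#if defined(__GNUC__)
__attribute__((always_inline)) inline int add_forced(int a, int b) { return a + b; }
#else
inline int add_forced(int a, int b) { return a + b; }
#endif
```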
What I have tried
I tried this with C# and reached the limit. Beyond that limit, nothing more was inlined. I think something similar would happen with GCC:
max-inline-insns-single: Several parameters control the tree inliner used in gcc. This number sets the maximum number of instructions (counted in GCC's internal representation) in a single function that the tree inliner will consider for inlining. This only affects functions declared inline and methods implemented in a class declaration (C++). The default value is 500.
At least with GCC I can choose the limit; with .NET I cannot.
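Besides the global --param limits, GCC has a per-function knob: the flatten attribute (also accepted by Clang) asks the compiler to inline every call inside the marked function, including the calls that this inlining exposes, where possible, but not recursive calls to the function itself. A minimal C++ sketch with hypothetical helper names; the build line in the comment is only an example, and the available --param names and their defaults vary between GCC versions.

```cpp
#include <cassert>

// Example build line (GCC; the parameter value is only an illustration):
//   g++ -O2 --param max-inline-insns-single=1000 example.cpp

// Hypothetical helpers standing in for generated accessors.
static int load(const int* p) { return *p; }
static void store(int* p, int v) { *p = v; }
static int bump(int* p) { store(p, load(p) + 1); return load(p); }

// flatten: the compiler tries to inline both bump() calls and then the
// load()/store() calls that inlining bump() exposes.
#if defined(__GNUC__)
__attribute__((flatten))
#endif
int bump_twice(int* p) {
    bump(p);
    return bump(p);
}
```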
Why don't compilers inline everything, analyze the result, and then generate their own optimized functions?
I would really appreciate feedback on this question. I know that not everything can be inlined (recursive calls, for example). Let's skip the cases where inlining is not possible at all.
I also know that inlining is not guaranteed to improve performance. But I think all the issues mentioned in the link could be avoided by compiler-generated functions.
I also know that when inlining, several factors are considered:
We will measure the quality of inlining along three axes: the time spent generating code (aka throughput -- TP), the time spent executing the code (aka code quality -- CQ), and the size of the generated code (CS).
I suspect the reason compilers don't do this is the time spent generating the code. But what if I don't care? Okay, I don't want to wait a year, but I can wait a day if I get 20% faster code.
What do you think about it? Is there a compiler for any programming language that can do this (via some flag, or something similar)?
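Compile time (TP) is not the only cost of inlining everything; code size (CS) multiplies along the call chain. A toy C++ model, assuming each function has the same body size and is called from a fixed number of call sites in the function one level up (real inliners measure and budget size very differently):

```cpp
#include <cassert>
#include <cstdint>

// Toy model: depth + 1 levels, one function per level, each with `body`
// units of its own code. Without inlining, each function exists once.
std::uint64_t size_without_inlining(std::uint64_t body, unsigned depth) {
    return body * (depth + 1);
}

// With full inlining, every call site gets its own copy: the top level exists
// once, the next level `calls` times, the one below `calls`^2 times, and so on,
// giving body * (1 + calls + calls^2 + ... + calls^depth).
std::uint64_t size_fully_inlined(std::uint64_t body, std::uint64_t calls,
                                 unsigned depth) {
    std::uint64_t total = 0;
    std::uint64_t copies = 1;  // copies of the current level's body
    for (unsigned level = 0; level <= depth; ++level) {
        total += body * copies;
        copies *= calls;
    }
    return total;
}
```

With the numbers from Trolley's comment further down (a callee used in 1000 places, each inside a function that is itself used in 1000 places), size_fully_inlined(1, 1000, 2) is 1,001,001 units against 3 without inlining; the million leaf copies are the "million copies of malloc".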
EDIT: According to @RaymondChen (see comments), what I'm asking for is similar to 'inline everything and then have another step to de-inline things':
De-inlining (also known as "common subexpression elimination") is something compilers already do.
But according to my research, CSE doesn't involve generating a new function; rather, it reuses saved data:
Common-subexpression elimination is a transformation that removes the recomputations of common subexpressions and replaces them with uses of saved data.
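A small before/after C++ sketch of CSE (function names are hypothetical), showing that the result is a saved temporary inside the same function rather than a new function:

```cpp
#include <cassert>

// Before CSE: the subexpression a * b is evaluated twice.
int before_cse(int a, int b, int c) {
    return a * b + c * (a * b);
}

// After CSE: a * b is evaluated once, saved in a temporary, and reused.
// The transformation stays inside one function; no new function appears.
int after_cse(int a, int b, int c) {
    const int ab = a * b;  // the "saved data"
    return ab + c * ab;
}
```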
I cannot find anything about compiler-generated functions other than those related to C++ class constructors, destructors and operators. So I'm still looking for an answer and hope somebody can provide some sources.
@RaymondChen also mentions:
finding de-inlining opportunities becomes harder the bigger the code being analyzed. The number of things to check grows (naively) as the fourth power of the code size. The time required for a large program will probably exceed your human lifetime, and the compiler itself will run out of memory long before then.
This could be a good answer to my question, but it is also something I'm struggling to accept. If finding de-inlining opportunities in a large program would take a compiler more than a human lifetime, how is it possible that I, as a human, can do it myself in reasonable time, just by looking at the (high-level, not machine) code and refactoring?
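One way to arrive at Raymond Chen's naive "fourth power" (a reading of his comment, not his stated derivation): treat the program as a flat list of instructions and look for two identical, non-overlapping fragments by trying every pair of start positions (N^2), every length (N), and comparing element by element (N). A minimal C++ sketch of that brute-force search, with hypothetical names and ints standing in for instructions:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct RepeatResult {
    std::size_t length = 0;         // longest repeated fragment found
    std::uint64_t comparisons = 0;  // element comparisons performed
};

// Naive search for the longest fragment that occurs twice without overlap.
// Loops: first start (N) x second start (N) x length (N) x compare (N),
// i.e. O(N^4) comparisons in the worst case.
RepeatResult longest_repeat_naive(const std::vector<int>& code) {
    RepeatResult r;
    const std::size_t n = code.size();
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = i + 1; j < n; ++j)
            // i + len <= j keeps the fragments from overlapping;
            // j + len <= n keeps the second fragment in bounds.
            for (std::size_t len = 1; i + len <= j && j + len <= n; ++len) {
                bool same = true;
                for (std::size_t k = 0; k < len; ++k) {
                    ++r.comparisons;
                    if (code[i + k] != code[j + k]) { same = false; break; }
                }
                if (same && len > r.length) r.length = len;
            }
    return r;
}
```

On an input where every instruction is identical (the worst case), 8 instructions take 80 comparisons and 16 take 960: a 12x jump for 2x more code, with the ratio tending toward 16x as N grows. This search also only finds exact repeats; fragments that are equivalent but written differently would not be matched at all.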
I understand that some tasks (pattern recognition, language translation, etc.) are really hard for computers. But today we have neural networks. Would it be possible to use a neural network for something like finding de-inlining opportunities?
@PeterCordes mentions:
(in real-world compilers which don't try to re-factor straight-line code back into functions or loops)
Again I ask: why? I'm sure the compiler could come up with better functions than I can. Why does the compiler only take my functions and, at most, optimize or inline them, but never (except for C++ constructors, destructors, etc.) generate new ones?
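The transformation asked about here is usually called outlining: the reverse of inlining, where a repeated code sequence is moved into a new function and each occurrence is replaced by a call. A minimal before/after C++ sketch with hypothetical names; it shows only the shape of the result, not how a compiler would find the repeated sequence or decide that the extra calls are worth it.

```cpp
#include <cassert>

// Before: the same sequence (clamp at zero, scale, offset) appears twice.
int twice_inline(int a, int b) {
    int x = (a < 0 ? 0 : a) * 3 + 1;
    int y = (b < 0 ? 0 : b) * 3 + 1;
    return x + y;
}

// After outlining: the repeated sequence becomes a new function that the
// programmer never wrote (the name is hypothetical; a compiler would use an
// internal symbol), and each occurrence becomes a call to it.
static int outlined_seq(int v) { return (v < 0 ? 0 : v) * 3 + 1; }

int twice_outlined(int a, int b) {
    return outlined_seq(a) + outlined_seq(b);
}
```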
…malloc. And each of those places is in functions that are themselves used in 1000 places. For example, std::vector::vector(). You now need to inline a million copies of the malloc function. Also, your main function would probably now have several megabytes of local variables, since it has to hold all the variables from all the potential code paths due to inlining, even if most of them are never executed. – Trolley
…var garage = etc. etc. with a single function garage = DB.AddNewColouredCarToGarage(123, 2);, and have a couple dozen Add[New]SomethingToSomethingElse()
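The suggestion above amounts to replacing several fine-grained generated calls with one coarser operation. A minimal C++ sketch, assuming a hypothetical in-process data model (plain vectors, the garage looked up by index instead of by id 123, no spinlocks or shared memory); the method name mirrors AddNewColouredCarToGarage from the comment but is otherwise invented:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical data model; none of these types come from the question's
// generated code.
struct Car { std::uint32_t id; int color; };
struct Garage { std::uint32_t id; std::vector<std::uint32_t> car_ids; };

struct DB {
    std::vector<Car> cars;
    std::vector<Garage> garages;

    Garage& get_garage(std::uint32_t index) { return garages.at(index); }

    // One hand-written operation replacing GetGarage + CreateCar + setting
    // Color + Add: the steps share a single garage lookup, and in a real
    // implementation could share a single lock acquisition.
    Garage& add_new_coloured_car_to_garage(std::uint32_t garage_index, int color) {
        Garage& g = get_garage(garage_index);
        const auto car_id = static_cast<std::uint32_t>(cars.size());
        cars.push_back(Car{car_id, color});
        g.car_ids.push_back(car_id);
        return g;
    }
};
```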
Macros can be used to generate code more efficiently, sort of a manual inlining. – Swash
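A minimal C++ illustration of that comment: a function-like macro is expanded textually by the preprocessor before compilation, so there is no call left for the inliner to accept or reject. The macro name is hypothetical.

```cpp
#include <cassert>

// Textual "manual inlining": every use is replaced by the expression itself.
// Arguments are parenthesized to avoid operator-precedence surprises.
#define SQUARE(x) ((x) * (x))

int sum_of_squares(int a, int b) {
    return SQUARE(a) + SQUARE(b);  // expands to ((a) * (a)) + ((b) * (b))
}
```

The usual caveats apply: macros are not type-checked, and an argument with side effects is evaluated more than once.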