Are functors actually faster than pointers to functions?

Asked 25/11, 2016 at 17:27 Answered 29/11, 2016 at 20:21

Solved c++function-pointers function-object inlining

According to Scott Meyers, one area where C++ shines over C is that function objects are faster than function pointers. He says this is because function objects are inlined, which increases speed.

I have two questions about this:

How can we verify that function objects are, in fact, inlined? Can we verify this in practice?
Does the inlining of function objects depend on the compiler we use, or do all compilers behave like this?

Isom answered 25/11, 2016 at 17:27 Comment(4)

The lto ( link time optimization ) should be able to inline a function and it's usage in different compilation unit. – Profile 25/11, 2016 at 17:51

Function objects are not (and cannot be) inlined; they aren't code. Their member functions, often the function call operators (operator()), are. – Leucoplast 25/11, 2016 at 18:50

Is this question about functors? If not, can the title be changed? – Spectacular 26/11, 2016 at 17:25

Could you provide a precise reference? (title of Scott Meyers' book, chapter, paragraph) – Saguenay 22/2, 2017 at 14:53

The C++ and C standards leaves a bunch of freedom to compilers. Compilers are free to count to 1 billion between every instruction, or only do so if an integer has a prime value in it.

Decent "real" compilers don't do this. This is a quality of implementation issue.

Inlining function objects into something like std::sort is something that every real compiler does. It is exceedingly easy to detect what needs to be inlined in those cases, because the type information carries with it the code needed to be inlined.

Doing so with a function pointer is harder. Doing so with a function pointer where everything has been converted to void* or char* pointers is even harder.

The effect of this is that, in practice, a C-style call to qsort vs a C++-style call to std::sort can result in an large advantage for std::sort.

qsort is roughly 2x slower than std::sort, as shown here, in a ridiculously simple situation of sorting randomly arranged integers.

Inspecting the actual assembly code output is mainly a detail, and it is a lot of work for little return. Taking concrete real-world examples gives you an idea of how big the impact really is.

All 3 of clang, gcc and MSVC where able to make std::sort be significantly faster than their qsort. And as this is an easy optimization, while optimizing function pointers into inline calls is not, you'd expect less major compilers to be no better than this at qsort.

Reyna answered 25/11, 2016 at 17:36 Comment(8)

It is easier to propagate type information and inline with templates because templates are basically "copy&paste" specialized. It's a code generation mechanism. (The downside is larger binaries and compile times, just as if you had duplicated the code.) – Bootstrap 25/11, 2016 at 20:15

I agree with @usr, the difference is really that using templates, instances can be better adapted to the details of the calling situation than a compiled-once library function can, simply because the latter has to run the same code for a variety of situations. If in C one would implement qsort by a macro rather than by a library function, then the compiler could optimise each instance, in spite of the use of function pointers (as long as it is clear at each call which function is being pointed to), and the result would be as fast as that of std::sort in C++. – Heady 26/11, 2016 at 16:51

@marc I double dog dare you to implement qsort as a C standard complaint (pick any!) #define macro, then beat, match or approach std::sort. – Reyna 26/11, 2016 at 17:12

@Yakk why would it not be possible? (I agree with your answer, to be clear.) – Bootstrap 26/11, 2016 at 23:35

@Bootstrap Everything is possible in the turing tar pit. Macros are not wuite turing complete, but you can get close. The point is doing it may be much harder. Probably the practical approach would be to separately write your custom qsort via macro with hard coded stride and comparison functions, then separately call them. Two-step. That I believe may be what I found in above link. What I originally envisioned was a qsort call that expanded into C that impemented qsort inline (also doable, but much more insane) – Reyna 26/11, 2016 at 23:56

The link and your conclusion doesn't address the question. std::sort is faster because it doesn't use neither function object nor function pointer, but directly compare the elements as pointed out by @usr. – Mourner 29/11, 2016 at 20:6

@Mourner No, std::sort has one (set of) overloads that compare using <, and another (set of) overloads that compare using a passed-in function object. Passing std::less<void>{} (less<void> is a stateless function object that invokes <) as the function object to the second results in identical performance as the first one that just uses < (ignoring some strange corner cases). The direct use of < isn't why it is faster, because using it with an indirect < is still just as fast. Footnote: (set of) is in reference to parallel std::sorts added in C++17. – Reyna 29/11, 2016 at 20:10

@Yakk I agree, but this is exactly what the linked benchmark isn't testing. – Mourner 29/11, 2016 at 20:25

How can we verify that function objects are in fact inlined? Can we verify this in practice?

Sure, inspect the finally emitted assembler code.

The inlining function objects depends on the compiler we use, or all compilers behave like that?

It heavily depends on compiler implementation and optimization level used.
So no, there's no guarantee particular compilers (linkers) behave so.

Calls through function pointers cannot be inlined though.

According to him, function objects are inlined, so there is an increase in speed.

IMO "function objects are inlined" should better read (or heard, I don't know where that cite is from):

function objects can be inlined while calls through function pointers cannot.

Capstan answered 25/11, 2016 at 17:31 Comment(4)

«It heavily depends on compiler implementation ...» So we may conclude that the statement of Scott Meyers is not an absolute truth? Can we have function objects slower than function pointers, and this is a case-by-case issue, right? – Isom 25/11, 2016 at 17:42

@Isom "So we may conclude that the statement of Scott Meyers is not an absolute truth?" Who would you ever trust to tell "the absolute truth"?? It's probably true for most modern c++ compilers, but still depends on optimization levels and other circumstances the compiler decides to inline code or not. – Bifurcate 25/11, 2016 at 17:52

Actually, inlining calls through function pointers is possible, provided that the implementation of said function (e.g. the one being called via the pointer) is available. See here for an example. I remember seeing this in a much larger program and being thoroughly shocked. But the basic gist remains true : this depends on the particular compiler and the optimization options - in this case, running with -O0 doesn't inline the call. – Slug 29/11, 2016 at 18:45

@Daniel Thanks for pointing that out, I never would have a compiler expected to do this. – Bifurcate 29/11, 2016 at 20:45

Yes, function objects might lead to faster code. But the only way to ensure that is to benchmark.

The documentation says: "GCC may still be unable to inline a function for many reasons; the -Winline option may be used to determine if a function has not been inlined and why not."
Of course it will depend on compiler, version, flags, etc. Sometimes inlining can be counter-productive (code bloat etc.), so each compiler has its own set of rules to decide whether a function should be inlined. By the way, the inline keyword is only a hint, and some libraries such as eigen have a hard time to enforce inlining.

Mourner answered 29/11, 2016 at 20:21 Comment(0)

Recommended topics

Hot tags