Why does defining inline global function in 2 different cpp files cause a magic result?
Asked Answered
K

4

30

Suppose I have two .cpp files file1.cpp and file2.cpp:

// file1.cpp
#include <iostream>

inline void foo()
{
    std::cout << "f1\n";
}

void f1()
{
    foo();
}

and

// file2.cpp
#include <iostream>

inline void foo()
{
   std::cout << "f2\n";
}

void f2()
{
    foo();
}

And in main.cpp I have forward declared the f1() and f2():

void f1();
void f2();

int main()
{
    f1();
    f2();
}

Result (doesn't depend on build, same result for debug/release builds):

f1
f1

Whoa: Compiler somehow picks only the definition from file1.cpp and uses it also in f2(). What is the exact explanation of this behavior?.

Note, that changing inline to static is a solution for this problem. Putting the inline definition inside an unnamed namespace also solves the problem and the program prints:

f1
f2
Kryska answered 2/5, 2017 at 11:7 Comment(1)
You are breaking the One Definition Rule (ODR). see en.cppreference.com/w/cpp/language/definitionApeldoorn
W
40

This is undefined behavior, because the two definitions of the same inline function with external linkage break C++ requirement for objects that can be defined in several places, known as One Definition Rule:

3.2 One definition rule

...

  1. There can be more than one definition of a class type (Clause 9), enumeration type (7.2), inline function with external linkage (7.1.2), class template (Clause 14),[...] in a program provided that each definition appears in a different translation unit, and provided the definitions satisfy the following requirements. Given such an entity named D defined in more than one translation unit, then

6.1 each definition of D shall consist of the same sequence of tokens; [...]

This is not an issue with static functions, because one definition rule does not apply to them: C++ considers static functions defined in different translation units to be independent of each other.

Windflower answered 2/5, 2017 at 11:19 Comment(1)
Note: the rules for the ODR are incredibly specific and designed to prevent you from doing anything too clever which might confuse the poor linker. Using static or anonymous namespaces resolves this.Atal
P
30

The compiler may assume that all definitions of the same inline function are identical across all translation units because the standard says so. So it can choose any definition it wants. In your case, that happened to be the one with f1.

Note that you cannot rely on the compiler always picking the same definition, violating the aforementioned rule makes the program ill-formed. The compiler could also diagnose that and error out.

If the function is static or in an anonymous namespace, you have two distinct functions called foo and the compiler must pick the one from the right file.


Relevant standardese for reference:

An inline function shall be defined in every translation unit in which it is odr-used and shall have exactly the same definition in every case (3.2). [...]

7.1.2/4 in N4141, emphasize mine.

Polak answered 2/5, 2017 at 11:14 Comment(3)
Aren't there three translation units in this example? A translation unit is the basic unit of compilation in C++. It consists of the contents of a single source file, plus the contents of any header files directly or indirectly included by it, minus those lines that were ignored using conditional preprocessing statements. So I think it should have created two object files for file1 and file2 with different bodies of foo and then link it all together. Would not that be more correct?Facelift
@AlexPetrenko Sorry, I don't get your point. Of course there are three translation units, but I don't see how that's relevant. All that matters is that there are 2 different definitions of the same inline function.Polak
I think you edited your post a little bit and now it's all clear. Thanks anyway :)Facelift
E
12

As others have noted, the compilers are in compliance with the C++ standard because the One definition rule states that you shall have only one definition of a function, except if the function is inline then the definitions must be the same.

In practice, what happens is that the function is flagged as inline, and at linking stage if it runs into multiple definitions of an inline flagged token, the linker silently discards all but one. If it runs into multiple definitions of a token not flagged inline, it instead generates an error.

This property is called inline because, prior to LTO (link time optimization), taking the body of a function and "inlining" it at the call site required that the compiler have the body of the function. inline functions could be put in header files, and each cpp file could see the body and "inline" the code into the call site.

It doesn't mean that the code is actually going to be inlined; rather, it makes it easier for compilers to inline it.

However, I am unaware of a compiler that checks that the definitions are identical before discarding duplicates. This includes compilers that otherwise check definitions of function bodies for being identical, such as MSVC's COMDAT folding. This makes me sad, because it is a reall subtle set of bugs.

The proper way around your problem is to place the function in an anonymous namespace. In general, you should consider putting everything in a source file in an anonymous namespace.

Another really nasty example of this:

// A.cpp
struct Helper {
  std::vector<int> foo;
  Helper() {
    foo.reserve(100);
  }
};
// B.cpp
struct Helper {
  double x, y;
  Helper():x(0),y(0) {}
};

methods defined in the body of a class are implicitly inline. The ODR rule applies. Here we have two different Helper::Helper(), both inline, and they differ.

The sizes of the two classes differ. In one case, we initialize two sizeof(double) with 0 (as the zero float is zero bytes in most situations).

In another, we first initialize three sizeof(void*) with zero, then call .reserve(100) on those bytes interpreting them as a vector.

At link time, one of these two implementations is discarded and used by the other. What more, which one is discarded is likely to be pretty determistic in a full build. In a partial build, it could change order.

So now you have code that might build and work "fine" in a full build, but a partial build causes memory corruption. And changing the order of files in makefiles could cause memory corruption, or even changing the order lib files are linked, or upgrading your compiler, etc.

If both cpp files had a namespace {} block containing everything except the stuff you are exporting (which can use fully qualified namespace names), this could not happen.

I've caught exactly this bug in production multiple times. Given how subtle it is, I do not know how many times it slipped through, waiting for its moment to pounce.

Eisenberg answered 2/5, 2017 at 13:52 Comment(7)
Thanks for another example. The answer is quite helpful!Kryska
"In general, you should consider putting everything in a source file in an anonymous namespace." Eh?Cascio
"I've caught exactly this bug in production multiple times." I caught it once when two distinct shared libraries were conflicting w.r.t. their unnamespaced free functions. Nasty.Cascio
@BoundaryImposition namespace {} is an anonymous namespace. If your struct Helper was in namespace {}, it would have a different "real name" in both cpp files.Eisenberg
@Yakk: I know what an unnamed namespace is, thank you. Most things in .cpp files are designed to be accessed from other translation units, no? If you put everything in your .cpp files into unnamed namespaces, very few things are going to work.Cascio
@BoundaryImposition yes, I mean other than the stuff you fully qualify as part of the namespace you are implementing.Eisenberg
You also have two ODR-violation definitions of Helper the class, never mind the constructor.Eudocia
O
-3

POINT OF CLARIFICATION:

Although the answer rooted in C++ inline rule is correct, it only applies if both sources are compiled together. If they are compiled separately, then, as one commentator noted, each resulting object file would contain its own 'foo()'. HOWEVER: If these two object files are then linked together, then because both 'foo()'-s are non-static, the name 'foo()' appears in the exported symbol table of both object files; then the linker has to coalesce the two table entries, hence all internal calls are re-bound to one of the two routines (presumably the one in the first object file processed, since it is already bound [i.e the linker would treat the second record as 'extern' regardless of binding]).

Ornithischian answered 2/5, 2017 at 13:42 Comment(3)
"it only applies if both sources are compiled together" No, that's wrong. It always applies as long as both TUs are used by the same program.Polak
@Baum mit Augen - Perhaps we are using the word 'applies' slightly differently. I agree that the letter of the law is violated whether the sources are compiled separately or together. But as a practical matter, I don't see how separate compilation could fail to produce multiple instances of 'foo()' in the separate .obj files, which has to be resolved by the linker. There is a gap between 'there is a rule' and 'the compiler can enforce the rule'.Ornithischian
It seems that this answer is true as far as it goes, but I don't understand why this is relevant. I understand you to be claiming that, since there are multiple translation units, each has a definition of foo in its respective object file, and it is the linker's job to resolve them. Okay. So what? What bearing does that have on the question or its answer, and why did this need to be clarified from what the other answers had already said?Coltish

© 2022 - 2024 — McMap. All rights reserved.