Why does the C++ linker allow undefined functions?

This C++ code, perhaps surprisingly, prints out 1.

#include <iostream>
#include <string>

std::string x();  // declared, but never defined anywhere

int main() {
    std::cout << "x: " << x << std::endl;  // note: x, not x()
    return 0;
}

x is declared by a function prototype and, when used as an expression, seems to be treated as a function pointer. C++ Standard section 4.12 Boolean conversions says:

4.12 Boolean conversions [conv.bool] 1 A prvalue of arithmetic, unscoped enumeration, pointer, or pointer to member type can be converted to a prvalue of type bool. A zero value, null pointer value, or null member pointer value is converted to false; any other value is converted to true. For direct-initialization (8.5), a prvalue of type std::nullptr_t can be converted to a prvalue of type bool; the resulting value is false.

However, x is never bound to a function. As I would expect, the C linker doesn't allow this. However in C++ this isn't a problem at all. Can anyone explain this behavior?

Arrester answered 25/11, 2014 at 20:56 Comment(12)
It's an ODR violation for which no diagnostic is required, meaning that your code has UB. - Ingeborgingelbert
@Ingeborgingelbert Ill-formed, not UB. - Veliavelick
@LightnessRacesinOrbit It's ill-formed NDR, so per [intro.compliance]/2 ("If a program contains a violation of a rule for which no diagnostic is required, this International Standard places no requirement on implementations with respect to that program.") it is essentially UB ("behavior for which this International Standard imposes no requirements", [defns.undefined]). - Ingeborgingelbert
@Ingeborgingelbert Meh, I suppose so. Makes me wonder why they bother making a distinction between "ill-formed, no diagnostic required" and "the behaviour is undefined" in the first place, though. I'm sure there's a question about this somewhere... - Veliavelick
@LightnessRacesinOrbit I think that's a special category for ODR violations - Anaptyxis
@MattMcNabb: It pops up in a few places actually, some having nothing even remotely to do with the ODR (such as mixing user-defined literal suffixes in a sequence of concatenated string literals). - Veliavelick
@LightnessRacesinOrbit I guess the difference is that "ill-formed, no diagnostic required" means the entire program has UB; whereas other forms of UB tend to not be triggered until the related code is executed - Anaptyxis
@MattMcNabb: That may be it - Veliavelick
@MattMcNabb That sounds wrong. If a program contains UB, the behavior of the complete program is undefined, including all bits before that construct is engendered and compile-time. - Clemenceau
@Clemenceau That's patently untrue, most types of UB are only triggered by their statement being encountered. (e.g. int f() { return 1 / 0; } is OK as long as f() is never called). - Anaptyxis
@MattMcNabb Yeah, but if main doesn't call it, main (that is, the program) doesn't contain UB. - Clemenceau
In many cases ill-formed NDR is for cases that theoretically could be diagnosed at compile time (but takes such an excessive amount of effort that it isn't worth it), while UB is for cases that may be impossible to diagnose at compile time. - Ingeborgingelbert

What's happening here is that the function pointer is implicitly converted to bool. This is specified by [conv.bool]:

A zero value, null pointer value, or null member pointer value is converted to false; any other value is converted to true

where "null pointer value" includes null function pointers. Since the function pointer obtained from decay of a function name cannot be null, this gives true. You can see this by including << std::boolalpha in the output command.

The following does cause a link error in g++: (int)x;


Regarding whether this behaviour is permitted or not, C++14 [basic.def.odr]/3 says:

A function whose name appears as a potentially-evaluated expression is odr-used if it is the unique lookup result or the selected member of a set of overloaded functions [...]

which does cover this case, since x in the output expression is looked up to the declaration of x above and that is the unique result. Then in /4 we have:

Every program shall contain exactly one definition of every non-inline function or variable that is odr-used in that program; no diagnostic required.

so the program is ill-formed but no diagnostic is required, meaning that the program's behaviour is completely undefined.

Incidentally, this clause implies that no link error is required for x(); either. However, from a quality-of-implementation angle, that would be silly. The course that g++ has chosen here seems reasonable to me.

Anaptyxis answered 25/11, 2014 at 21:9 Comment(9)
It is. See my answer. - Clemenceau
I see that you've quoted N3797, which has a sexier wording than N3337. Will adjust that. :o) - Clemenceau
@Clemenceau N3936 actually (which is identical to C++14 afaik) - Anaptyxis
Why didn't you use N4140? (Just curious) - Clemenceau
@Clemenceau have you got a download link? (as suggested by the SO C++ document page, authentication is required from the CWG page) - Anaptyxis
@Clemenceau ty. I guess that should be added to the SO document compilation - Anaptyxis
@Columbo: Quoting working drafts to prove standard behaviours is not very useful, as they can and do change significantly between standards, with bits added and removed in the meantime. It's always better to quote standards or, if needs be, the last working draft that editorially became a standard. That's what Matt did (yes, N3936 is effectively C++14). - Veliavelick
@LightnessRacesinOrbit N4140 contains only editorial changes compared to N3936, and one of those changes is particularly nice for quoting (numbered bullets). - Ingeborgingelbert
@Ingeborgingelbert Yes but in general, why would you push somebody to throw aside actual International Standard wording in favour of a draft, when the intention is to quote a standard? - Veliavelick

x doesn't need to be "bound" to a function, because you stated in your code that such a function exists. So the compiler can safely assume that the address of this function is not NULL. For a null address to be possible, you'd have to declare the function as a weak symbol, and you didn't. The linker did not protest because you never call your function (you never use its actual address), so it sees no problem.

Sectorial answered 25/11, 2014 at 20:59 Comment(4)
Sure, the compiler can assume that. But the linker actually links it and produces a working executable. - Arrester
1 isn't the address of the function though. (If you add in x1, x2, x3 etc. they all get 1) - Anaptyxis
@Arrester - See the edited answer - the linker never sees your x symbol, because the compiler never uses it - it optimized this "test" away, because according to the language rules it is always true. This would work the same way if you had an extern variable and tested its address. - Sectorial
@FreddieChopin: What does it mean that "you'd have to declare the function to be a weak symbol"? I didn't understand it. What is this weak symbol? - Yoke

[basic.def.odr]/2:

A function whose name appears as a potentially-evaluated expression is odr-used if it is the unique lookup result or the selected member of a set of overloaded functions (3.4, 13.3, 13.4), unless it is a pure virtual function and its name is not explicitly qualified.

Hence, strictly speaking, the code odr-uses the function and therefore requires a definition.
But modern compilers will realize that the function's exact address is not actually relevant for the behavior of the program, and will thus elide the use and not require a definition.

Also note what [basic.def.odr]/3 specifies:

Every program shall contain exactly one definition of every non-inline function or variable that is odr-used in that program; no diagnostic required.

An implementation is not obliged to halt compilation and issue an error message (=diagnostic). It can do what it considers best. In other words, any action is allowed and we have UB.

Clemenceau answered 25/11, 2014 at 21:19 Comment(0)
