Do distinct functions have distinct addresses?
W

4

61

Consider these two functions:

void foo() {}
void bar() {}

Is it guaranteed that &foo != &bar?

Similarly,

template<class T> void foo() { }

Is it guaranteed that &foo<int> != &foo<double>?
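
For concreteness, here is a minimal sketch of the two comparisons being asked about. The names `tfoo`, `fn_ptr`, `same_target` and `report` are hypothetical (the template is renamed to `tfoo` only so it does not overload the plain `foo`). What `report()` prints depends on the compiler and linker settings, so no output is claimed here.

```cpp
#include <iostream>

void foo() {}
void bar() {}

// Renamed stand-in for the template in the question.
template <class T> void tfoo() {}

using fn_ptr = void (*)();

// Hypothetical helper: compares two function pointers with ==.
bool same_target(fn_ptr a, fn_ptr b) { return a == b; }

// Prints what this particular toolchain does. The result is
// implementation-dependent: a folding linker may report true.
void report() {
    std::cout << std::boolalpha
              << "&foo == &bar: " << same_target(&foo, &bar) << '\n'
              << "&tfoo<int> == &tfoo<double>: "
              << same_target(&tfoo<int>, &tfoo<double>) << '\n';
}
```

Calling `report()` from `main` in a build with and without identical code folding enabled is one way to see whether a given toolchain gives the two functions the same address.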


There are two linkers I know of that fold function definitions together.

MSVC aggressively COMDAT folds functions, so two functions with the same implementation can be turned into one function. As a side effect, the two functions share the same address. I was under the impression that this was illegal, but I cannot find where in the standard it is made illegal.

The Gold linker also folds functions, with both a safe and all setting. safe means that if a function address is taken, it is not folded, while all folds even if the address is taken. So gold's fold safe behaves as-if functions have distinct addresses.

While folding might be unexpected, and there is code that relies on distinct (identical implementation) functions having different addresses (so it can be dangerous to fold), is it actually illegal under the current C++ standard? (C++14 at this point) (Naturally as-if safe folding is legal)

Woodford answered 23/10, 2014 at 17:21 Comment(19)
I wonder if such an optimization/transformation can change a visible output of the code. The question itself could be then transformed. If the output could change, unless it's explicitly allowed, it's not legal. Otherwise it is legal.Grenoble
I can't find anything in the standard that specifies distinct addresses for functions. That might just mean I wasn't looking hard enough, though.Headway
I'm sure I've seen this question beforeBeilul
Related : Why do two functions have the same address?Dethrone
The gold linker has: --icf [none,all,safe] Identical Code Folding. '--icf=safe' Folds ctors, dtors and functions whose pointers are definitely not taken.Hypallage
@Marc nod -- gold and MSVC both have that folding, as far as I know, but MSVC doesn't have the safe setting. Really, you could always fold, except when you ask for an address possibly spawn extra copies (or use a jump table). I'm wondering if MSVC is technically legal.Woodford
@Yakk gold has the safe setting because they know all is illegal ( research.google.com/pubs/pub36912.html ). But I don't want to participate in this legal dispute.Hypallage
@MarcGlisse: Kudos for finding that, just one observation: They never say relying on any two functions having different addresses is allowed. Quote: "ICF can be unsafe, however, as it can change the run-time behaviour of code that relies on each function having a unique address."Gusgusba
@MarcGlisse I folded your link to that gold paper into the original post.Woodford
@LightnessRacesinOrbit: Yes, you have seen it before: https://mcmap.net/q/22844/-is-an-implementation-allowed-to-site-two-identical-function-definitions-at-the-same-address-or-not-duplicate Though with fewer upvotes, answers and controversy. ;-)Gusgusba
@Deduplicator: Hah, there we goBeilul
Isn't this a dupe, then?Beilul
@lightness yep. Is there a way to migrate answers? I left this one unchecked as the standard seems ambiguous at this time. If we cannot migrate, do we dupe mine to yours, or yours to mine (causality breach!) in order to keep the interesting answers? (the checked answer on yours seems covered below).Woodford
@Yakk: Mods merge dupes under some circumstances. Whether these two are close enough for that I'm not sure. You've got some good answers here so I'll just dupe mine to yours.Beilul
@LightnessRacesinOrbit any flag for merging? From my experience with merges this one seems reasonable, the answer to your question definitely fits in very well here.Gastrotrich
Related to: Is Visual Studio 2013 optimizing correctly in the presence of /OPT:ICF?.Gastrotrich
@ShafikYaghmour somewhat -- that question has the functions being the same only because two addresses-of const data that are logically distinct are being folded into one: the standard is very clear that distinct objects have distinct addresses, so that is illegal. Only after that illegal fold do the two functions implementations become the same, and function-level ICF causes the two functions to have the same value. This question is "is the function-level fold legal".Woodford
@Yakk sure, it seems like they may be part of the same optimization issue, which is why I mentioned it. They are not identical issues.Gastrotrich
See my update, Jan Hubička agrees with @MarcGlisse and in hindsight I see this differently now and it makes sense that this is not conforming behavior.Gastrotrich
G
3

5.10 Equality operators [expr.eq]

1 The == (equal to) and the != (not equal to) operators group left-to-right. The operands shall have arithmetic, enumeration, pointer, or pointer to member type, or type std::nullptr_t. The operators == and != both yield true or false, i.e., a result of type bool. In each case below, the operands shall have the same type after the specified conversions have been applied.
2 If at least one of the operands is a pointer, pointer conversions (4.10) and qualification conversions (4.4) are performed on both operands to bring them to their composite pointer type (Clause 5). Comparing pointers is defined as follows: Two pointers compare equal if they are both null, both point to the same function, or both represent the same address (3.9.2), otherwise they compare unequal.

Let's take that last sentence bit by bit:

  1. Two null pointers compare equal.
    Good for your sanity.
  2. Two pointers to the same function compare equal.
    Anything else would be extremely surprising.
    It also means that only one out-of-line version of any inline-function may ever have its address taken, unless you want to make function-pointer comparisons prohibitively complicated and expensive.
  3. Both represent the same address.
    Now that one is what it's all about. Dropping this and reducing if and only if to a simple if would leave it to interpretation, but that's a clear mandate to make any two functions identical, as long as it does not otherwise change observable behavior of a conformant program.
Gusgusba answered 23/10, 2014 at 18:18 Comment(0)
G
33

It looks like defect report 1400: Function pointer equality deals with this issue and seems to me to say that it is okay to do this optimization but as comments indicate, there is disagreement. It says (emphasis mine):

According to 5.10 [expr.eq] paragraph 2, two function pointers only compare equal if they point to the same function. However, as an optimization, implementations are currently aliasing functions that have identical definitions. It is not clear whether the Standard needs to deal explicitly with this optimization or not.

and the response was:

The Standard is clear on the requirements, and implementations are free to optimize within the constraints of the “as-if” rule.

The question is asking about two issues:

  • Is it okay for these pointers to be considered equal
  • Is it okay to coalesce the functions

Based on comments I see two interpretations of the response:

  1. This optimization is ok, the standard gives the implementation this freedom under the as-if rule. The as-if rule is covered in section 1.9 and means the implementation only has to emulate the observable behavior with respect to the requirements of the standard. This is still my interpretation of the response.

  2. The issue at hand is completely ignored and the statement merely says no adjustment to the standard is required because clearly the as-if rule covers this, but the interpretation is left as an exercise to the reader. Although I acknowledge that, due to the terseness of the response, I cannot dismiss this view, it ends up being a totally unhelpful response. It also seems inconsistent with the responses in the other NAD issues, which as far as I can tell point out issues if they exist.

What the draft standard says

Since we know we are dealing with the as-if rule, we can start there and note that section 1.8 says:

Unless an object is a bit-field or a base class subobject of zero size, the address of that object is the address of the first byte it occupies. Two objects that are not bit-fields may have the same address if one is a subobject of the other, or if at least one is a base class subobject of zero size and they are of different types; otherwise, they shall have distinct addresses. [4]

and note 4 says:

Under the “as-if” rule an implementation is allowed to store two objects at the same machine address or not store an object at all if the program cannot observe the difference

but a note from that section says:

A function is not an object, regardless of whether or not it occupies storage in the way that objects do

Although the note is not normative, the requirements for an object laid out in paragraph 1 do not make sense in the context of a function, so they are consistent with this note. So we are explicitly restricted from aliasing objects, with some exceptions, but no such restriction applies to functions.

Next we have section 5.10 Equality operators which says (emphasis mine):

[...]Two pointers compare equal if they are both null, both point to the same function, or both represent the same address (3.9.2), otherwise they compare unequal.

which tells us two pointers are equal if they are:

  • Null pointers
  • Point to the same function
  • Represent the same address

The phrase "or both represent the same address" seems to give enough latitude to allow a compiler to alias two different functions, and does not require pointers to different functions to compare unequal.

Observations

Keith Thompson has made some great observations that I feel are worth adding to the answer since they get to the core issues involved. He says:

If a program prints the result of &foo == &bar, that's observable behavior; the optimization in question changes the observable behavior.

which I agree with, and if we could show that there is a requirement for the pointers to be unequal, that would indeed violate the as-if rule, but so far we cannot show that.

and:

[...]consider a program that defines empty functions and uses their addresses as unique values (think about SIG_DFL, SIG_ERR, and SIG_IGN in <signal.h> / <csignal>). Assigning them the same address would break such a program

As I noted in my comment the C standard requires these macros to generate distinct values, from 7.14 in C11:

[...]which expand to constant expressions with distinct values that have type compatible with the second argument to, and the return value of, the signal function, and whose values compare unequal to the address of any declarable function[...]

So although this case is covered perhaps there are other cases that would make this optimization dangerous.
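
To illustrate the kind of user code that could be affected, here is a minimal sketch of the "empty functions as unique values" pattern described above. The names `handler`, `use_default`, `ignore_event` and `classify` are hypothetical and only loosely modelled on SIG_DFL / SIG_IGN; this is not how any real signal implementation works.

```cpp
#include <string>

using handler = void (*)();

// Hypothetical sentinels: empty functions whose addresses act as unique tags.
void use_default() {}
void ignore_event() {}

// Classifies a handler purely by pointer identity. If a linker folds the two
// identical empty bodies to a single address, the second branch can never be
// taken, and every "ignore" request is treated as "default".
std::string classify(handler h) {
    if (h == &use_default)  return "default";
    if (h == &ignore_event) return "ignore";
    return "custom";
}
```

Whether `classify(&ignore_event)` yields "ignore" is exactly what depends on the folding behaviour under discussion.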

Update

Jan Hubička, a GCC developer, wrote a blog post, Link time and inter-procedural optimization improvements in GCC 5, in which code folding was one of many topics he covered.

I asked him to comment on whether folding identical functions to the same address was conforming behavior or not and he says it is not conforming behavior and indeed such an optimization would break gcc itself:

It is not conforming to turn two functions to have same address, so MSVC is quite aggressive here. Doing so, for example, breaks GCC itself because to my surprise address compare is done in the precompiled headers code. It works for many other projects, including Firefox.

In hindsight, after months more of reading defect reports and thinking about optimization issues, I am biased towards a more conservative reading of the committee's response. Taking the address of a function is observable behavior and therefore folding identical functions would violate the as-if rule.

Update 2

Also see this llvm-dev discussion: Zero length function pointer equality:

This is a well-known conformance-violating bug in link.exe; LLVM should not be making things worse by introducing a similar bug itself. Smarter linkers (for example, I think both lld and gold) will do identical function combining only if all but one of the function symbols is only used as the target of calls (and not to actually observe the address). And yes, this non-conforming behavior (rarely) breaks things in practice. See this research paper.

Gastrotrich answered 23/10, 2014 at 18:44 Comment(21)
close, but not quite: it is allowed for two different functions to share the same binary code (under as-if), but it may not be ok for two different function's addresses to be compared and be equal. One way to approach that is to require "stub" noops or jmps at the front of the function, so each gets a different address, but the same body (as I believe some non-MSVC COMDAT folding equivalents do)Woodford
@Yakk: There is no requirement for them to compare unequal. If there is, please quote it.Gusgusba
@Yakk perhaps I am missing a subtlety here, how do you understand the term alias with respect to function pointers?Gastrotrich
@Shafik I'm saying that defect report resolution seems to be talking about the optimization where the two functions share the same body, but when you take pointers to them you get different values. It does not clearly cover the case covered in my question.Woodford
@Gusgusba Reading that clause that way (that you can say two distinct functions have the same address) means there is no requirement that any two function pointers compare unequal. The compiler can turn fptra == fptrb and make it !fptra == !fptrb (ie, all non-null function pointers are equivalent). That seems strange.Woodford
@Yakk as with most defect reports the committee's response is indeed terse. I can see your position but I definitely disagree; let's see if anyone with more definitive knowledge comes by.Gastrotrich
@Yakk: That would run counter to the common definition of address, which holds sway if there is no different definition in the standard.Gusgusba
I find the committee response not merely terse, but opaque. A footnote in the standard says that "an implementation is free to disregard any requirement of this International Standard as long as the result is as if the requirement had been obeyed, as far as can be determined from the observable behavior of the program". If a program prints the result of &foo == &bar, that's observable behavior; the optimization in question changes the observable behavior. My reading of the response is that the pointers must compare unequal, but I suspect that's not what was meant.Scriabin
@KeithThompson that is a good point, I added your observation to my answer, please let me know if I misunderstood.Gastrotrich
@ShafikYaghmour: I have no problem with the update. In my opinion, since they're distinct functions in the abstract machine, they're distinct functions as far as the "==" operator on their addresses is concerned. I wouldn't oppose permitting the addresses to be equal, but IMHO that would require a change to the wording of the standard, not just a statement from the committee that the standard is already clear enough.Scriabin
@KeithThompson can you clarify: since they're distinct functions in the abstract machine, they're distinct functions as far as the "==" operator what part of the standard covers that or is that a conceptual idea that is implied but is not explicit? Johanes implied something similar but he deleted the comment and decided not to follow up.Gastrotrich
"The response says it is allowed by the as-if rule" -- No, it just plain does not. It says an optimisation is valid if allowed by the as-if rule, and invalid if not allowed by the as-if rule, without addressing whether the as-if rule allows that particular optimisation.Orlina
@hvd that is not an unfair interpretation (like I mentioned, it is terse). I just find it an incredibly silly and totally unhelpful response if indeed the optimization is not valid. Let me think about rephrasing my answer, I won't be back online immediately. It does not change my belief that it is valid; as I said with Keith's observation, it does indeed change observable behavior, but there is no requirement that the pointers compare unequal. If we could find that requirement, the optimization would clearly be invalid.Gastrotrich
@ShafikYaghmour: It's not explicit, but I see it as common sense. Obviously others disagree. The question is whether two distinctly defined functions can be "the same function". Can an optimizing compiler that shares code for two functions cause them to be "the same function", in a manner visible to user code? Can it do the same for distinct objects? Practically, consider a program that defines empty functions and uses their addresses as unique values (think about SIG_DFL, SIG_ERR, and SIG_IGN in <signal.h> / <csignal>). Assigning them the same address would break such a program.Scriabin
@KeithThompson interesting point, that gives me some food for thought. Although note that for that case the standard requires The macros defined are ... which expand to constant expressions with distinct values from section 7.14 of C11 and C99 and so aliasing those would definitely violate the as-if rule.Gastrotrich
@KeithThompson: It would seem your interpretation would allow a function to be merged with distinct-but-identical functions whose address is not taken, but every function whose address is taken may only be merged if their addresses compare as distinct (on an architecture where pointers have more bits than hardware addresses, it may be possible for two pointers to represent the same physical address but still appear as distinct). If the intention is that functions whose address is taken must yield distinct addresses, I wonder why it doesn't say so?Kilroy
@KeithThompson: Also note that an optimizing compiler can share code for two function bodies but still give them distinct addresses. One function's address is the start of the body, and one function's address points to a "NOP" directly before the function body. Of course it does complicate linking.Tumble
@MooingDuck: Another possibility would be that an implementation could use memory-mapping tricks so that multiple memory pages point to the same physical RAM containing the code. I don't know of any desktop systems that would do such a thing, but such hacks are hardly unknown in the embedded-systems world.Kilroy
@KeithThompson: Well, the question there is whether it's actually the program or the compiler that's broken, and I still tend to the former. The implementation obviously can have extra magic for those symbols you named... +Shafik: Regarding Jan Hubička's quote, he might just be mistaken. Now if he could prove it with chapter and verse... Anyway, I don't think MS for one will want that folding to be outlawed, it would either make them less conformant or less efficient, neither of which they will welcome.Gusgusba
After looking at CWG's records on issue 1400, I'm pretty confident that they meant that the aggressive version of the optimization is invalid.Sternick
@Sternick agreed and that is what I concluded at the end.Gastrotrich
P
11

Yes. From the standard (§5.10/1): "Two pointers of the same type compare equal if and only if they are both null, both point to the same function, or both represent the same address"

Once they have been instantiated, foo<int> and foo<double> are two different functions, so the above applies to them as well.

Phonon answered 23/10, 2014 at 17:34 Comment(22)
That doesn't prevent them from representing the same address but pointing to different functions.Headway
"both point to the same function, or both represent the same address" Sounds confusing... and circular. How can one define "equality" in terms of "same address"? I mean, what does "same" mean here? equal, isn't it? Also, it doesn't say that two different function must have different/unequal addresses as well.Dethrone
@MikeSeymour: How does it not? The statement clearly says that if they point to different functions then the pointers cannot compare equal.Beilul
@LightnessRacesinOrbit: No it doesn't. The use of "or" means they'll compare equal if they represent the same address but don't both point to the same function.Headway
@MikeSeymour: I strongly interpret that to mean the same address of an object. The phrase covers the three "modes" of pointer values. If so I would consider this an editorial defect. Otherwise it's pretty nonsensical.Beilul
Is it guaranteed that foo<int> and foo<double> are different functions if they don't depend on the template argument?Grenoble
(ptra==ptrb) iff ( ( ptra is null AND ptrb is null) OR (ptra and ptrb both refer to the same function) OR ( ptra and ptrb both refer to the same address). This implies that if ptra and ptrb refer to a different address but the same function, they must compare equal (whatever that means). What does address mean in the standard? X iff (A or B or C) is not the same as X iff B. However, if we can eliminate A or C as being possibilities, it is.Woodford
@Nawaz "the same address" is relevant for objects, where the object representation requires them to have an address. And it is weasel wording to avoid the issue where a one past the end of the array ends up pointing to another object. It's the "both point to the same function" which is relevant here.Phonon
@MikeSeymour Functions don't have addresses (in the standard).Phonon
@JamesKanze: I'd love to say that. Actually what I mean is that I'd love to be able to prove that.Beilul
@LightnessRacesinOrbit You're on the right track. In C++, only "objects" have addresses. You can have a pointer to a function, but it is not an address.Phonon
@JamesKanze: Except when they do. See for example 13.4, "Address of overloaded function".Headway
@LightnessRacesinOrbit You won't find a statement that functions don't have addresses. However, only objects occupy storage, and only things which occupy storage can have addresses. See too §5.3.1/3; the unary & operator on a function returns a "pointer", not its address (unlike the case for objects).Phonon
@Grenoble That's certainly the intent. Each instantiation creates a new function. (We discussed this in the committee. With a representative from Microsoft present.)Phonon
I am bit skeptical, it seems like the phrase or both represent the same address allows the optimization being seen. Perhaps that was not the intent but I could see how it could be read that way.Gastrotrich
@Yakk See §1.8/6, which defines the address of an object. There is nothing about the address of a function anywhere, at least that I can find; there are "pointers" to functions, but a pointer to a function is just that; it's not an address, but anything the compiler wants to allow it to find the function.Phonon
@JamesKanze: You don't have such a nice quote for what "address of a function" means, but that does not give you licence to just ignore it.Gusgusba
@MikeSeymour That's an interesting title; the word "address" never appears in the section. I don't think section names are normative (they don't specify anything), but the name does seem to be carelessly chosen, given the rest of the standard, including the contents of this section.Phonon
@Gusgusba The standard defines all of the terms it uses. As far as the standard is concerned, despite the careless wording of one section title (which uses the word address in its everyday sense, and not in the sense the standard uses it), functions simply don't have addresses.Phonon
You know, it makes perfect sense for all three options to be present; they are different and only together give all guarantees one would want (and not too few).Gusgusba
Comments are not for extended discussion; this conversation has been moved to chat.Housebound
@Deduplicator: The section title evidently should be "unary & member functions" because officially, the "unary & operator" in C++ is not named "address-of operator"... but the section title is in fact referring to the operator. "address of" in that section title is an indivisible phrase.Surname
S
11

So the problematic part is clearly the phrase "or both represent the same address (3.9.2)".

IMO this part is clearly there to define the semantics for object pointer types. And only for object pointer types.

The phrase references section 3.9.2, which means we should look there. 3.9.2 talks (among others) about the addresses that object pointers represent. It does not talk about the addresses that function pointers represent. Which, IMO, leaves just two possible interpretations:

1) The phrase simply does not apply to function pointers. Which leaves just the two null pointers and two pointers to the same function comparing equal, which is what probably most of us expected.

2) The phrase does apply. Since it's referring to 3.9.2, which says nothing about the addresses that function pointers represent, we may make any two function pointers compare equal. Which is very unexpected and of course renders comparing function pointers utterly useless.

So, while technically an argument could be made that (2) is a valid interpretation, IMO it's not a meaningful interpretation and thus should be disregarded. And since not everyone seems to agree on this, I also think that a clarification in the standard is needed.

Sympathize answered 24/10, 2014 at 18:14 Comment(0)