How can I reliably get an object's address when operator& is overloaded?

Consider the following program:

struct ghost
{
    // ghosts like to pretend that they don't exist
    ghost* operator&() const volatile { return 0; }
};

int main()
{
    ghost clyde;
    ghost* clydes_address = &clyde; // darn; that's not clyde's address :'( 
}

How do I get clyde's address?

I'm looking for a solution that will work equally well for all types of objects. A C++03 solution would be nice, but I'm interested in C++11 solutions too. If possible, let's avoid any implementation-specific behavior.

I am aware of C++11's std::addressof function template, but am not interested in using it here: I'd like to understand how a Standard Library implementor might implement this function template.

Antoine answered 27/6, 2011 at 14:39 Comment(16)
@jalf: That strategy is acceptable, but now that I've punched said individuals in the head, how do I work around their abominable code? :-) – Antoine
@jalf Uhm, sometimes you need to overload this operator, and return a proxy object. Though I can’t think of an example just now. – Anaemia
@Konrad: me neither. If you need that, I'd suggest that a better option might be to rethink your design, because overloading that operator just causes too many problems. :) – Ikeda
See also this answer. – Tusk
@Konrad: In roughly 20 years of C++ programming I have once attempted to overload that operator. That was at the very beginning of those twenty years. Oh, and I failed to make that usable. Consequently, the operator overloading FAQ entry says "The unary address-of operator should never be overloaded." You'll get a free beer the next time we meet if you can come up with a convincing example for overloading this operator. (I know you're leaving Berlin, so I can safely offer this :)) – Tusk
CComPtr<> and CComQIPtr<> have an overloaded operator& – Weber
@Simon: but the important question is: should they have an overloaded operator&? – Ikeda
Well, it allows pointers to them to be passed to functions that expect a pointer to the contained type... But indeed, I'd return a proxy object that is convertible to T ** and CComPtr<T> *. – Weber
Don't do it like this. It will trigger an operator char&(). – Antwanantwerp
@Simon Richter: I still remember spending a day or so debugging and fixing a problem triggered by this. GAAAH! Instead of operator&, there should be an interface ** OutPtr() / interface ** InOutPtr(); that would make it explicit in the call (with acceptable overhead). – Urmia
Here are two very similar questions https://mcmap.net/q/56786/-if-an-operator-is-overloaded-for-a-c-class-how-could-i-use-a-default-operator-instead/57428 and https://mcmap.net/q/56097/-most-portable-and-reliable-way-to-get-the-address-of-variable-in-c/57428 – Creatural
@curiousguy: Many interesting questions in life tend to be about unpractical things. That said, this question is certainly a practical one for anyone writing a C++ Standard Library implementation. – Antoine
@JamesMcNellis "That said, this question is certainly a practical one for anyone writing a C++ Standard Library implementation" for what? – Sutra
@curiousguy: std::addressof must be able to obtain the address of an object, even if the object is of a type that overloads arbitrary operators, including conversion operators and the unary &. Further, the Standard Library containers must be instantiable and usable with those perverse types as well (this requirement is new in C++11; it was not present in C++98/03). – Antoine
OTOH: "Numeric type requirements" [numeric.requirements] "it does not overload unary operator&." – Sutra
@SimonRichter how is CCom*** to be considered something that doesn't need its design rethought??? – Excellence

Update: in C++11, one may use std::addressof instead of boost::addressof.


Let us first copy the code from Boost, minus the compiler workaround bits:

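template<class T>
struct addr_impl_ref
{
  T & v_;

  // Wrapping the argument uses up the single user-defined conversion that a
  // conversion sequence may contain, so T's own conversion operators are ignored.
  inline addr_impl_ref( T & v ): v_( v ) {}
  inline operator T& () const { return v_; }

private:
  addr_impl_ref & operator=(const addr_impl_ref &);
};

template<class T>
struct addressof_impl
{
  // Objects: go through a char reference so that T's operator& is never used.
  static inline T * f( T & v, long ) {
    return reinterpret_cast<T*>(
        &const_cast<char&>(reinterpret_cast<const volatile char &>(v)));
  }

  // Functions: the function converts to a pointer, which is returned as is.
  static inline T * f( T * v, int ) { return v; }
};

template<class T>
T * addressof( T & v ) {
  // The dummy 0 (an int) favours f(T*, int) when both first arguments tie.
  return addressof_impl<T>::f( addr_impl_ref<T>( v ), 0 );
}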

What happens if we pass a reference to a function?

Note: addressof cannot be used with a pointer to function.

In C++, if void func(); is declared, then func is an lvalue designating a function taking no arguments and returning nothing. It can bind to a reference to function, and it can be trivially converted into a pointer to function. From @Konstantin: according to 13.3.3.2, T & and T * are indistinguishable for functions. The first is an Identity conversion and the second is a Function-to-Pointer conversion, both having "Exact Match" rank (13.3.3.1.1, table 9).

When the reference to function passes through addr_impl_ref, the choice of f is ambiguous on the first argument. The tie is broken by the dummy argument 0: it is an int, so it matches f(T*, int) exactly, whereas f(T&, long) would require an Integral Conversion to long.

Thus f(T*, int) is selected and we simply return the pointer.

What happens if we pass a type with a conversion operator?

If the conversion operator yields a T*, then we have an ambiguity: f(T&, long) requires an Integral Conversion for the second argument, while f(T*, int) requires the conversion operator to be called on the first (thanks to @litb).

That's where addr_impl_ref kicks in. The C++ Standard mandates that a conversion sequence may contain at most one user-defined conversion. By wrapping the object in addr_impl_ref, we already force one user-defined conversion (to T&), and thereby "disable" any conversion operator that the type itself comes with.

Thus the f(T&, long) overload is selected (and the Integral Conversion performed).

What happens for any other type?

The f(T&, long) overload is selected, because the argument cannot be converted to the T* parameter.

Note: judging from the remarks in the file regarding Borland compatibility, arrays do not decay to pointers, but are passed by reference.

What happens in this overload?

We want to avoid applying operator& to the type, as it may have been overloaded.

The Standard guarantees that reinterpret_cast may be used for this work (see @Matteo Italia's answer: 5.2.10/10).

Boost takes some care with the const and volatile qualifiers: it first casts to a const volatile char& (reinterpret_cast cannot cast away constness, so a plain char& would not work for const or volatile objects), then properly uses a const_cast to remove them.

  • Cast T& to char const volatile&
  • Strip the const and volatile
  • Apply the & operator to take the address
  • Cast back to a T*

The const/volatile juggling is a bit of black magic, but it does simplify the work (rather than providing four overloads). Note that T is deduced together with its qualifiers: if we pass a ghost const&, then T is ghost const and T* is ghost const*, so the qualifiers have not really been lost.

EDIT: the pointer overload is used for functions; I amended the above explanation accordingly. I still do not understand why it is necessary, though.

The following ideone output sums this up, somewhat.

Aboulia answered 27/6, 2011 at 15:27 Comment(14)
"What happens if we pass a pointer ?" part is incorrect. If we pass a pointer to some type U to the addressof function, the type 'T' is inferred to be 'U*' and addressof_impl will have two overloads: 'f(U*&, long)' and 'f(U**, int)'; obviously the first one will be selected. – Schiffman
@Konstantin: right, I had thought that the two f overloads were function templates, whereas they are regular member functions of a class template; thanks for pointing it out. (Now I just need to figure out what the overload is used for; any tip?) – Aboulia
This is a great, well-explained answer. I kind of figured there was a bit more to this than just "cast through char*." Thank you, Matthieu. – Antoine
@James: I have had much help from @Konstantin, who would strike my head with a stick any time I made a mistake :D – Aboulia
@Matthieu: Did I? :D Probably we are just interested in similar questions here, nothing personal. :) – Schiffman
Why would it need to work around types that have a conversion function? Would it not prefer the exact match over invoking any conversion function to T*? EDIT: Now I see. It would, but with the 0 argument it would end up in a criss-cross, so it would be ambiguous. – Antwanantwerp
@James: :D @litb: there are two conversions we wish to avoid. The conversion to T* leads to an ambiguity, and the conversion to T& may point to another object. The latter would really bite us, unnoticed (at compile-time). – Aboulia
@Matthieu, no, the conversion to T& can never happen because the argument is a T already. This is only to avoid the criss-cross. – Antwanantwerp
@James https://mcmap.net/q/144763/-why-is-this-ambiguity-here/… – Antwanantwerp
"then func is a reference to a function" Huh? There is no reference here! – Sutra
In C++11 we can now just use std::addressof – Pipit
@paulm: Right! Edited as the first line. – Aboulia
Could the int and long be switched? static inline T * f( T & v, int) { return reinterpret_cast<T*>( &const_cast<char&>(reinterpret_cast<const volatile char &>(v))); } static inline T * f( T * v, long) { return v; } – Daffodil
@zpeng: I am not quite sure, to be honest. Since a pointer to reference is invalid and a reference to pointer is valid, it seems to me it makes sense to favour the T* function and thus force a conversion before access to the T&... but maybe I am just paranoid because I cannot think of a counter-example right now. – Aboulia

Use std::addressof.

You can think of it as doing the following behind the scenes:

  1. Reinterpret the object as a reference-to-char
  2. Take the address of that (won’t call the overload)
  3. Cast the pointer back to a pointer of your type.

Existing implementations (including Boost.Addressof) do exactly that, just taking additional care of const and volatile qualification.

Anaemia answered 27/6, 2011 at 14:58 Comment(1)
I like this explanation better than the selected one, as it can be readily understood. – Footle

The trick behind boost::addressof and the implementation provided by @Luc Danton relies on the magic of the reinterpret_cast; the standard explicitly states at §5.2.10 ¶10 that

An lvalue expression of type T1 can be cast to the type “reference to T2” if an expression of type “pointer to T1” can be explicitly converted to the type “pointer to T2” using a reinterpret_cast. That is, a reference cast reinterpret_cast<T&>(x) has the same effect as the conversion *reinterpret_cast<T*>(&x) with the built-in & and * operators. The result is an lvalue that refers to the same object as the source lvalue, but with a different type.

Now, this allows us to convert an arbitrary object reference to a char & (with a cv qualification if the reference is cv-qualified), because any pointer can be converted to a (possibly cv-qualified) char *. Now that we have a char &, the operator overloading on the object is no longer relevant, and we can obtain the address with the builtin & operator.

The boost implementation adds a few steps to work with cv-qualified objects: the first reinterpret_cast is done to const volatile char &, otherwise a plain char & cast wouldn't work for const and/or volatile references (reinterpret_cast cannot remove const). Then the const and volatile are removed with const_cast, the address is taken with &, and a final reinterpret_cast to the "correct" type is done.

The const_cast is needed to remove the const/volatile that could have been added to non-const/volatile references, but it does not "harm" what was a const/volatile reference in the first place, because the final reinterpret_cast will re-add the cv-qualification if it was there in the first place (reinterpret_cast cannot remove the const but can add it).

As for the rest of the code in addressof.hpp, it seems that most of it is for workarounds. The static inline T * f( T * v, int ) seems to be needed only for the Borland compiler, but its presence introduces the need for addr_impl_ref, otherwise pointer types would be caught by this second overload.

Edit: the various overloads have different purposes, see @Matthieu M.'s excellent answer.

Well, I'm no longer sure of this either; I should further investigate that code, but now I'm cooking dinner :) , I'll have a look at it later.

Unarmed answered 27/6, 2011 at 15:7 Comment(9)
Matthieu M.'s explanation regarding passing a pointer to addressof is incorrect. Don't spoil your great answer with such edits :) – Schiffman
"Good appetite." Further investigation shows that the overload is called for references to functions: void func(); boost::addressof(func);. However, removing the overload does not prevent gcc 4.3.4 from compiling the code and producing the same output, so I still don't understand why it is necessary to have this overload. – Aboulia
@Matthieu: It looks to be a bug in gcc. According to 13.3.3.2 both T & and T * are indistinguishable for functions. The 1st one is an Identity conversion and the 2nd one is a Function-to-Pointer conversion, both having "Exact Match" rank (13.3.3.1.1 table 9). So it's necessary to have an additional argument. – Schiffman
@Matthieu: Just tried it with gcc 4.3.4 (ideone.com/2f34P) and got ambiguity as expected. Did you try overloaded member functions like in the addressof implementation, or free function templates? The latter (like ideone.com/vjCRs) will result in the 'T *' overload being chosen due to template argument deduction rules (14.8.2.1/2). – Schiffman
@Konstantin: I had not thought of using template functions. I did think that function pointer and function reference were indistinguishable, but did not dig up the Standard; thanks for the reference. – Aboulia
@KonstantinOznobihin "The 1st one is an Identity conversion and the 2nd one is Function-to-Pointer conversion" so the compiler should prefer the first one. – Sutra
@curiousguy: Why do you think it should? I've referenced specific C++ standard parts prescribing what the compiler should do, and all compilers I have access to (including but not limited to gcc 4.3.4, comeau-online, VC6.0-VC2010) report ambiguity just as I've described. Could you please elaborate your reasoning regarding this case? – Schiffman
@KonstantinOznobihin "Why do you think it should?" Because I did not check the issue completely. My bad. "I've referenced specific C++ standard parts prescribing what should compiler" Actually, you only mentioned part of the story. There is more than the "rank", and the table 9 you mentioned has more than one column. – Sutra
@curiousguy: Well, I think comments are just not suitable for a fully elaborated discussion of such stuff. If you like, you could ask a corresponding question here and I'm sure you'll get elaborate and detailed answers. Still, the most relevant part of the standard is 13.3.3.2, describing the ordering relation for standard conversion sequences. Everything else should be easily found using the references provided in that part. – Schiffman

I've seen an implementation of addressof do this:

char* start = &reinterpret_cast<char&>(clyde);
ghost* pointer_to_clyde = reinterpret_cast<ghost*>(start);

Don't ask me how conforming this is!

Anacoluthia answered 27/6, 2011 at 14:58 Comment(6)
Legal. char* is the listed exception to type aliasing rules. – Chappie
@DeadMG I'm not saying this is not conforming. I'm saying that you should not ask me :) – Anacoluthia
@DeadMG There is no aliasing problem here. The question is: is reinterpret_cast<char*> well-defined? – Sutra
@Sutra and the answer is yes, it's always allowed to cast any pointer type to [unsigned] char * and thereby read the object representation of the pointed-at object. This is another area where char has special privileges. – Sanative
@Sanative Just because a cast is "always allowed" doesn't mean you can do anything with the result of the cast. – Sutra
@Sutra I should've been more clear about things that are implicit in my comment, in order to avoid ambiguity or pedantry: What is specifically allowed is to dereference the resulting pointer, which is of course a prerequisite to read the object representation. In most other cases, dereferencing a reinterpret_casted pointer (unless it's since been cast back) is implementation-defined behaviour if we're lucky... or worse. – Sanative

Take a look at boost::addressof and its implementation.

Schiffman answered 27/6, 2011 at 14:42 Comment(7)
The Boost code, while interesting, does not explain how its technique works (nor does it explain why two overloads are needed). – Antoine
Do you mean the 'static inline T * f( T * v, int )' overload? It looks like it's needed for the Borland C workaround only. The approach used there is pretty straightforward. The only subtle (nonstandard) thing there is the conversion of 'T&' to 'char&'. Although the standard allows a cast from 'T*' to 'char*', there seem to be no such requirements for reference casting. Nevertheless, one might expect it to work exactly the same on most compilers. – Schiffman
@Konstantin: the overload is used because for a pointer, addressof returns the pointer itself. It's arguable whether that's what the user wanted or not, but it's how it is specified. – Aboulia
@Matthieu: are you sure? As far as I can tell, any type (including pointer types) is wrapped inside an addr_impl_ref, so the pointer overload should never be called... – Unarmed
@Matthieu: ok, now that you explained it in your answer it makes sense. – Unarmed
@Matteo, @Konstantin: I got it wrong (I thought, I don't know why, that the two overloads within addressof_impl had different, auto-deduced template parameters)... Looking further... – Aboulia
@KonstantinOznobihin this doesn't really answer the question, as all you say is where to look for the answer, not what the answer is. – Jolandajolanta

© 2022 - 2024 — McMap. All rights reserved.