Why is returning a reference to a function local value not a compile error?

The following code invokes undefined behaviour.

int& foo()
{
  int bar = 1234;
  return bar;
}

g++ issues a warning:

warning: reference to local variable ‘bar’ returned [-Wreturn-local-addr]

clang++ too:

warning: reference to stack memory associated with local variable 'bar' returned [-Wreturn-stack-address]

Why is this not a compile error (ignoring -Werror)?

Is there a case where returning a ref to a local var is valid?

EDIT As pointed out, the spec mandates this be compilable. So, why does the spec not prohibit such code?

Keciakeck answered 10/10, 2014 at 9:54 Comment(7)
It may be hard to figure out if the code in the function has many paths.Osteen
There are so many existing programs with this bug, that run fine anyway, that making it an error would be a breaking change :)Paperweight
AFAIK, there is no UB if you don't dereference the pointer.Straiten
@Basile: or use the returned value in any way in this case (since it's a reference rather than a pointer).Endlong
@BasileStarynkevitch What pointer?Aube
Why are you not compiling with -Werror?Tutuila
@LokiAstari, where did I say I wasn't?Keciakeck
A
40

I would say that requiring this to make the program ill-formed (that is, make this a compilation error) would complicate the standard considerably for little benefit. You'd have to exactly spell out in the standard when such cases shall be diagnosed, and all compilers would have to implement them.

If you specify too little, it will not be too useful. And compilers probably already check for this to emit warnings, and real programmers compile with -Wall_you_can_give_me -Werror anyway.

If you specify too much, it will be difficult (or impossible) for compilers to implement the standard.

Consider this class (for which you only have the header and a library):

class Foo
{
  int x;

public:
  int& getInteger();
};

And this code:

int& bar()
{
  Foo f;
  return f.getInteger();
}

Now, should the standard be written to make this ill-formed or not? Probably not: what if Foo is implemented like this?

#include "Foo.h"

int global;

int& Foo::getInteger()
{
  return global;
}

At the same time, it could be implemented like this:

#include "Foo.h"

int& Foo::getInteger()
{
  return x;
}

Which of course would give you a dangling reference.

My point is that the compiler cannot really know whether returning a reference is OK or not, except for a few trivial cases (returning a reference to a function-scope automatic variable or parameter of non-reference type). I don't think it's worth it to complicate the standard for that. Especially as most compilers already warn about this as a quality-of-implementation matter.
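
The same difficulty shows up even without a separate library. As a minimal sketch (the names `pick`, `caller` and `global_value` are made up for illustration and are not part of the example above), whether the returned reference dangles can depend on a run-time value:

```cpp
#include <cassert>

int global_value = 7;  // hypothetical global, for illustration only

// From the signature alone, a caller cannot tell which object the
// returned reference is bound to.
int& pick(int& local, bool use_global)
{
  return use_global ? global_value : local;
}

// Well-defined when use_global is true; hands back a dangling
// reference when it is false.
int& caller(bool use_global)
{
  int local = 42;
  return pick(local, use_global);
}
```

Calling `caller(true)` yields a reference to `global_value`; only `caller(false)` would return a reference to the dead `local`. A rule in the standard would have to say which of these the compiler must reject.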

Aube answered 10/10, 2014 at 10:12 Comment(7)
These examples really help explain the complexity involved. Thanks. It seems that some languages (D, Rust) have explicit lifetime specifiers that address this problem directly.Keciakeck
I suspect that determining whether the reference is to a local might actually be the halting problem. I'm not 100% sure though.Gamekeeper
@Mysticial: Determining whether the reference will actually be to a local would be the halting problem. On the other hand, I think one could define useful reference semantics which classified reference parameters as ephemeral, returnable, or persistable, would forbid passing local variables as arguments to persistable parameters, and would apply to the functions' return value the strongest restriction applicable to arguments passed to returnable parameters. Such a rule would disallow some constructs that in practice could never create dangling references, but would allow most useful scenarios.Coaler
@Coaler any proof that this is a halting problem? and some compilers(clang/gcc) emit warnings when returning a reference, are there some examples for which these warnings are false positives?Valdes
@HongxuChen: Write a function which accepts a reference and performs two computations in parallel. If the first computation finishes first, the function returns the passed-in reference. If the second computation finishes first, it returns a reference to a static object. If the first computation is never the first to finish, there would be no undefined behavior. Even if the case where neither computation ever finished would be Undefined Behavior and the compiler could assume that at least one computation would eventually finish, determining which one would win would be the Halting Problem.Coaler
@HongxuChen: If the aforementioned method is called by a method which passes it a reference to a local variable and then gives the return value to its caller, the outer method would only return a reference to its local variable if the inner method returned a reference to the passed-in parameter; determining whether that would occur would be equivalent to the Halting Problem.Coaler
@Coaler thanks, got it!. In that case, compilers can at least give a warning.Valdes
E
7

For the same reason C allows you to return a pointer to a memory block that's been freed.

It's valid according to the language specification. It's a horribly bad idea (and is nowhere close to being guaranteed to work) but it's still valid inasmuch as it's not forbidden.

If you're asking why the standard allows this, it's probably because, when references were introduced, that's the way they worked. Each iteration of the standard has certain guidelines to follow (such as minimising the possibility of "breaking changes", those that render existing well-formed programs invalid) and the standard is an agreement between user and implementer, with undoubtedly more implementers than users sitting on the committees :-)

It may be worth pushing that idea through as a potential change and seeing what ISO say but I suspect it would be considered one of those "breaking changes" and therefore very suspect.

Endlong answered 10/10, 2014 at 9:58 Comment(6)
So there's no technical reason? Pointers and references are different beasts. Pointers are much more flexible and allow lots of invalid usages. If you're dealing with pointers, then that's the game you play. However with references there are certain guarantees. I would expect the compiler to reject such code. Mostly I was hoping to learn some edge case I hadn't seen before that meant this may be useful in some circumstance.Keciakeck
@Drew, the compiler cannot reject the code because the standard says it's okay. gcc rightly warns you that what you're doing is iffy but, if it were to reject the code, it would not comply with the standard.Endlong
@Endlong I interpret the OP's question as "why doesn't the standard mandate this ill-formed?"Aube
@Angew, yes that's a better way of phrasing it. Thanks.Keciakeck
It's not valid according to the language specification to return a pointer to a memory block that has been freed. (Perhaps you meant "it's not ill-formed"?)Flout
Matt, it's valid in that you are allowed to do it, the same way a++ + ++a is valid. By valid I meant it's not illegal. The fact that it's UB is another issue. I'll clarify.Endlong
S
7

Also, because you may want to get the current stack pointer (whatever that means on your particular implementation).

This function:

 void* get_stack_pointer (void) { int x; return &x; }

is much more portable than this one:

 void* get_stack_pointer (void) {
    register void* sp asm ("%esp"); return sp; }

AFAIK, it is not undefined behavior if you don't dereference the resulting pointer.

As to why you may want to get the stack pointer: well, there are cases where you have a valid reason to get it: for instance the conservative Boehm garbage collector needs to scan the stack (so wants the stack pointer and the stack bottom).

And if you returned a C++ reference on which you would only take its address using the & unary operator, getting such an address is IIUC legal (it is IMHO the only licit operation you can do on it).

Another reason to get the stack pointer would be to get a non-NULL pointer address (which you could e.g. hash) different from any heap, local or static data. However, you could use (void*)1 or (void*)-1 for that purpose.

So the compiler is right in only warning against this.

I guess that a C++ compiler should accept

int& get_sp_ref(void) { int x; return x; }

void show_sp(void) { 
   std::cout << (&(get_sp_ref())) << std::endl; }
Straiten answered 10/10, 2014 at 10:39 Comment(7)
That may be true for pointers, but I'm asking about references.Keciakeck
References are largely just syntactic sugar for pointers. The equivalent to "dereferencing the pointer" for a reference is "lvalue-to-rvalue conversion", and it's doing that which is UB.Bursar
Removed the useless static_cast. Thanks.Straiten
The C++ example is still wrong, should be return x;, and you don't need all those parenthesesBursar
"AFAIK, it is not undefined behavior if you don't dereference the resulting pointer." -- Technically, what you have is fine and can be safely called, I think, but it's definitely undefined behaviour in C if you even merely store the result in a different variable, and I think C++ followed C in that. Your last example would definitely not be valid by those rules.Stairway
Actually, there's nothing in the C or C++ standards that even requires a stack so this code is also non-portable :-)Endlong
If you don't do anything with the resulting pointer, then this function does not achieve the goal "you might want to get the stack pointer".Flout
A
3

To expand on the earlier answers, the ISO C++ standard does not capture the distinction between warnings and errors to begin with; it simply uses the term 'diagnostic' when referring to what a compiler must emit upon seeing an ill-formed program. Quoting N3337, 1.4, paragraphs 1 and 2:

The set of diagnosable rules consists of all syntactic and semantic rules in this International Standard except for those rules containing an explicit notation that “no diagnostic is required” or which are described as resulting in “undefined behavior.”

Although this International Standard states only requirements on C++ implementations, those requirements are often easier to understand if they are phrased as requirements on programs, parts of programs, or execution of programs. Such requirements have the following meaning:

  • If a program contains no violations of the rules in this International Standard, a conforming implementation shall, within its resource limits, accept and correctly execute that program.

  • If a program contains a violation of any diagnosable rule or an occurrence of a construct described in this Standard as “conditionally-supported” when the implementation does not support that construct, a conforming implementation shall issue at least one diagnostic message.

  • If a program contains a violation of a rule for which no diagnostic is required, this International Standard places no requirement on implementations with respect to that program.
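
Because the standard leaves that choice to the implementation, whether this particular diagnostic is a warning or an error is a compiler setting. A sketch, assuming GCC or Clang and the warning names quoted in the question (`next_id` is a made-up function used only to show that legitimate code still compiles):

```cpp
#include <cassert>

// Promote the "returning reference to local" warning to an error for
// this translation unit. Assumes GCC or Clang; other compilers skip
// the pragmas because of the #if guards.
#if defined(__clang__)
#pragma clang diagnostic error "-Wreturn-stack-address"
#elif defined(__GNUC__)
#pragma GCC diagnostic error "-Wreturn-local-addr"
#endif

// Still accepted: the referenced object has static storage duration,
// so it outlives the call.
int& next_id()
{
  static int id = 0;
  ++id;
  return id;
}
```

With the pragma in effect, a function like the question's `foo` should fail to compile on those compilers. The command-line equivalents are `-Werror=return-local-addr` (GCC) and `-Werror=return-stack-address` (Clang).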

Actinomycosis answered 10/10, 2014 at 21:43 Comment(0)
F
1

Something not mentioned by other answers yet is that this code is OK if the function is never called.

The compiler isn't required to diagnose whether a function might ever be called or not. For example you might set up a program which looks for counterexamples to Fermat's Last Theorem, and calls this function if it finds one. It would be a mistake for the compiler to reject such a program.
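
A rough sketch of that idea, using the cubic case of the theorem and a bounded search (the names `dangling` and `search` and the bound are illustrative assumptions). Compilers are expected to warn about `dangling`, but the program as a whole has defined behaviour because the call is never reached:

```cpp
#include <cassert>

// Same shape as the question's foo: calling it and using the result
// would be undefined behaviour.
int& dangling()
{
  int bar = 1234;
  return bar;
}

// Looks for a^3 + b^3 == c^3 with 1 <= a, b, c < limit and returns
// the number of counterexamples found.
int search(unsigned long long limit)
{
  int found = 0;
  for (unsigned long long a = 1; a < limit; ++a)
    for (unsigned long long b = 1; b < limit; ++b)
      for (unsigned long long c = 1; c < limit; ++c)
        if (a * a * a + b * b * b == c * c * c) {
          ++found;
          dangling() = 0;  // never reached: no such a, b, c exist
        }
  return found;
}
```

Here `search(30)` returns 0 and `dangling` is never invoked, so rejecting the translation unit would reject a program that behaves correctly.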

Flout answered 11/10, 2014 at 10:0 Comment(2)
By that reasoning, should this function compile so long as I never call it? void neverCalled() { !"£$%^&*()_+; }Keciakeck
@DrewNoakes That's a syntax error. The compiler has to diagnose any code which is ill-formed (according to the standard, of course)Flout
B
0

Returning a reference to a local variable is a bad idea. However, some people may write code that requires it, so the compiler should only warn about it and not treat structurally valid code as erroneous.

Angew already posted a sample where the apparently local variable is actually a global. However, there is another (IMHO better) sample.

Object& GetSmth()
{
    Object* obj = new Object();
    return *obj;
}

In this case the reference obtained through the local pointer is valid, because the object itself lives on the heap, and the caller should deallocate the memory after use.
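
A minimal sketch of what the caller then has to do, assuming a trivial `Object` type and a hypothetical `Release` helper (neither is defined in the sample above):

```cpp
#include <cassert>

struct Object { int value = 5; };  // hypothetical type for illustration

Object& GetSmth()
{
    Object* obj = new Object();
    return *obj;  // the pointer is local; the Object it points to is not
}

// The caller has to turn the reference back into a pointer to free it.
void Release(Object& o)
{
    delete &o;
}
```

A caller would write `Object& o = GetSmth();`, use `o`, and finish with `Release(o);`. Forgetting that last step leaks the object, which is part of why the style is discouraged.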


IMPORTANT NOTE: I don't encourage or recommend this coding style. It is bad: it is usually hard to understand what is going on, and it leads to problems such as memory leaks or crashes. It is just a sample showing why this particular situation cannot be treated as an error.

Benedicite answered 13/10, 2014 at 12:11 Comment(0)
