Can a local variable's memory be accessed outside its scope?
G

21

1163

I have the following code.

#include <iostream>

int * foo()
{
    int a = 5;
    return &a;
}

int main()
{
    int* p = foo();
    std::cout << *p;
    *p = 8;
    std::cout << *p;
}

And the code is just running with no runtime exceptions!

The output was 58

How can it be? Isn't the memory of a local variable inaccessible outside its function?

Giacometti answered 22/6, 2011 at 14:6 Comment(18)
this won't even compile as is; if you fix the nonforming business, gcc will still warn address of local variable ‘a’ returned; valgrind shows Invalid write of size 4 [...] Address 0xbefd7114 is just below the stack ptrPibroch
In some platforms/compilers (especially old compilers for DOS) you can even write through NULL pointer and everything seems OK until you overwrite something important (like the code being executed). :)Stringfellow
@Serge that's because most OS-es these days have a write-protected zero-page however not all of them do!Weltpolitik
@Serge: Back in my youth I once worked on some kinda tricky zero-ring code that ran on the Netware operating system that involved cleverly moving around the stack pointer in a way not exactly sanctioned by the operating system. I'd know when I'd made a mistake because often the stack would end up overlapping the screen memory and I could just watch the bytes get written right onto the display. You can't get away with that sort of thing these days.Grader
Ah man this makes me miss my C++ / DCOM / VB days. We had a home-grown red-black tree that had invalid pointer access issues. I had the distinct pleasure of debugging it.Shoestring
@Jasper Bekkers: "that's because most OS-es these days have a write-protected zero-page however not all of them do!" Yea. I know.Stringfellow
@Xeo - I think you misunderstood me... I know it is unsafe, that's for sure! I thought it would be impossible. I guess I should get used to the freedom that C++ gives the developer..Registered
lol. I needed to read the question and some answers before I even understood where the problem is. Is that actually a question about variable's access scope? You don't even use 'a' outside your function. And that is all there is to it. Throwing around some memory references is a totally different topic from variable scope.Lentissimo
@Tomalak please provide a dupe link and I'm happy to vote for close. We can ask a moderator to merge with the question that this one is a dupe of.Gerbold
Dupe answer doesn't mean dupe question. A lot of the dupe questions that people proposed here are completely different questions that happen to refer to the same underlying symptom... but the questioner has no way of knowing that, so they should remain open. I closed an older dupe and merged it into this question, which should stay open because it has a very good answer.Emissary
@Joel: If the answer here is good, it should be merged into older questions, of which this is a dupe, not the other way around. And this question is indeed a dupe of the other questions proposed here and then some (even though some of the proposed are a better fit than others). Note that I think Eric's answer is good. (In fact, I flagged this question for merging the answers into one of the older questions in order to salvage the older questions.)Chicken
@Joel dupe means (quote) "This question covers exactly the same ground as earlier questions on this topic;", not "This question covers exactly the same ground as a newer question on this topic;". Either your merge or the "close" popup has it backwards.Gerbold
But this way people don't manually have to click the forward link... so it may have been a good idea. But still the merge was backwards. Trying to justify by saying it was the right way around won't work.Gerbold
Weird question w/ so much love, and I used to think that C developers MUST understand how the hardware works; the stack allocation has been the same forever.Marketa
@Maxpm, zero page on 8086 (and 0000:0000 too) has its usages - interrupt vectors, etc, so addressing it was quite normal. Back in the day viruses (and anti-viruses) used to overwrite quite a bit of.Marketa
So memory is overwritten. Otherwise you would get '55'Framboise
I mean it is not overwritten after I exit the function foo. And I can output it even though the local variable was destroyed.Lophobranch
@Lophobranch Undefined behaviour is undefined. You shouldn't use it, and it's not productive to reason about it. Of course the compiler doesn't waste cycles zeroing out memory that belonged to something that is out of scope. You still can't write code that uses something outside of its scope, as defined by the language. If you don't, your code is invalid, whether or not it happens to produce the 'expected' result.Nadiya
G
5008

How can it be? Isn't the memory of a local variable inaccessible outside its function?

You rent a hotel room. You put a book in the top drawer of the bedside table and go to sleep. You check out the next morning, but "forget" to give back your key. You steal the key!

A week later, you return to the hotel, do not check in, sneak into your old room with your stolen key, and look in the drawer. Your book is still there. Astonishing!

How can that be? Aren't the contents of a hotel room drawer inaccessible if you haven't rented the room?

Well, obviously that scenario can happen in the real world no problem. There is no mysterious force that causes your book to disappear when you are no longer authorized to be in the room. Nor is there a mysterious force that prevents you from entering a room with a stolen key.

The hotel management is not required to remove your book. You didn't make a contract with them that said that if you leave stuff behind, they'll shred it for you. If you illegally re-enter your room with a stolen key to get it back, the hotel security staff is not required to catch you sneaking in. You didn't make a contract with them that said "if I try to sneak back into my room later, you are required to stop me." Rather, you signed a contract with them that said "I promise not to sneak back into my room later", a contract which you broke.

In this situation anything can happen. The book can be there—you got lucky. Someone else's book can be there and yours could be in the hotel's furnace. Someone could be there right when you come in, tearing your book to pieces. The hotel could have removed the table and book entirely and replaced it with a wardrobe. The entire hotel could be just about to be torn down and replaced with a football stadium, and you are going to die in an explosion while you are sneaking around.

You don't know what is going to happen; when you checked out of the hotel and stole a key to illegally use later, you gave up the right to live in a predictable, safe world because you chose to break the rules of the system.

C++ is not a safe language. It will cheerfully allow you to break the rules of the system. If you try to do something illegal and foolish like going back into a room you're not authorized to be in and rummaging through a desk that might not even be there anymore, C++ is not going to stop you. Safer languages than C++ solve this problem by restricting your power—by having much stricter control over keys, for example.


Compilers are in the business of generating code which manages the storage of the data manipulated by that program. There are lots of different ways of generating code to manage memory, but over time two basic techniques have become entrenched.

The first is to have some sort of "long lived" storage area where the "lifetime" of each byte in the storage—that is, the period of time when it is validly associated with some program variable—cannot be easily predicted ahead of time. The compiler generates calls into a "heap manager" that knows how to dynamically allocate storage when it is needed and reclaim it when it is no longer needed.

The second method is to have a “short-lived” storage area where the lifetime of each byte is well known. Here, the lifetimes follow a “nesting” pattern. The longest-lived of these short-lived variables will be allocated before any other short-lived variables, and will be freed last. Shorter-lived variables will be allocated after the longest-lived ones, and will be freed before them. The lifetime of these shorter-lived variables is “nested” within the lifetime of longer-lived ones.

Local variables follow the latter pattern; when a method is entered, its local variables come alive. When that method calls another method, the new method's local variables come alive. They'll be dead before the first method's local variables are dead. The relative order of the beginnings and endings of lifetimes of storages associated with local variables can be worked out ahead of time.

For this reason, local variables are usually generated as storage on a "stack" data structure, because a stack has the property that the first thing pushed on it is going to be the last thing popped off.

It's like the hotel decides to only rent out rooms sequentially, and you can't check out until everyone with a room number higher than you has checked out.

So let's think about the stack. In many operating systems you get one stack per thread and the stack is allocated to be a certain fixed size. When you call a method, stuff is pushed onto the stack. If you then pass a pointer to the stack back out of your method, as the original poster does here, that's just a pointer to the middle of some entirely valid million-byte memory block. In our analogy, you check out of the hotel; when you do, you just checked out of the highest-numbered occupied room. If no one else checks in after you, and you go back to your room illegally, all your stuff is guaranteed to still be there in this particular hotel.

We use stacks for temporary stores because they are really cheap and easy. An implementation of C++ is not required to use a stack for storage of locals; it could use the heap. It doesn't, because that would make the program slower.

An implementation of C++ is not required to leave the garbage you left on the stack untouched so that you can come back for it later illegally; it is perfectly legal for the compiler to generate code that turns back to zero everything in the "room" that you just vacated. It doesn't because again, that would be expensive.

An implementation of C++ is not required to ensure that when the stack logically shrinks, the addresses that used to be valid are still mapped into memory. The implementation is allowed to tell the operating system "we're done using this page of stack now. Until I say otherwise, issue an exception that destroys the process if anyone touches the previously-valid stack page". Again, implementations do not actually do that because it is slow and unnecessary.

Instead, implementations let you make mistakes and get away with it. Most of the time. Until one day something truly awful goes wrong and the process explodes.

This is problematic. There are a lot of rules and it is very easy to break them accidentally. I certainly have many times. And worse, the problem often only surfaces when memory is detected to be corrupt billions of nanoseconds after the corruption happened, when it is very hard to figure out who messed it up.

More memory-safe languages solve this problem by restricting your power. In "normal" C# there simply is no way to take the address of a local and return it or store it for later. You can take the address of a local, but the language is cleverly designed so that it is impossible to use it after the lifetime of the local ends. In order to take the address of a local and pass it back, you have to put the compiler in a special "unsafe" mode, and put the word "unsafe" in your program, to call attention to the fact that you are probably doing something dangerous that could be breaking the rules.

For further reading:

  • What if C# did allow returning references? Coincidentally that is the subject of today's blog post:

    Ref returns and ref locals

  • Why do we use stacks to manage memory? Are value types in C# always stored on the stack? How does virtual memory work? And many more topics in how the C# memory manager works. Many of these articles are also germane to C++ programmers:

    Memory management

Grader answered 22/6, 2011 at 20:1 Comment(53)
If the hotel were about to be replaced by a football stadium, wouldn't you notice the lack of people? Or the monstrous army of giant bulldozers outside?Ise
@muntoo: Unfortunately it's not like the operating system sounds a warning siren before it decommits or deallocates a page of virtual memory. If you're mucking around with that memory when you don't own it anymore the operating system is perfectly within its rights to take down the entire process when you touch a deallocated page. Boom!Grader
I like the analogy, but nearly all hotels use programmable key cards that get locked out at a specified time, or when a new key is issued for that room, whichever comes first. And I would imagine the very few hotels that do not use such a system would be very insistent that you return your key at checkout.Supersede
That's a great analogy, but bashing C++ at the end is not OK. C++ doesn't impose too many restrictions, but that lack of restrictions normally pays back in measurable performance gains.Texture
@Kyle: Only safe hotels do that. The unsafe hotels get measurable profit gains from not having to waste time on programming keys.Gati
@Texture I don't think he is bashing C++ at the end. C++ is not safe, and as you say, this is a good thing in a lot of situations. Likewise safer languages are less powerful, but can be easier to use. They're just different.Grozny
@cyberguijarro: That C++ is not memory safe is simply a fact. It's not "bashing" anything. Had I said, for example, "C++ is a horrid mishmash of under-specified, overly-complex features piled on top of a brittle, dangerous memory model and I am thankful every day I no longer work in it for my own sanity", that would be bashing C++. Pointing out that it's not memory safe is explaining why the original poster is seeing this issue; it's answering the question, not editorializing.Grader
@Eric: C# (really, .NET) isn't "safe" either in that respect. I can combine Math.Random, IntPtr, and Marshal.Copy and cause total chaos (no unsafe keyword nor /unsafe compiler switch needed). Safety comes from adhering to the contract, not from language design (although a language can and should make coding in a style that adheres to the contract as easy as possible, and provide warning when the contract is violated as much as possible.)Mcphee
Nice explanation Eric. Quick question! Which language would you say is a safer language?!Advisedly
@Bitmap: LOGO is quite safe.Mcphee
@Ben well duh, obviously there are ways to become unsafe, which include library functions marked as such (so permissions kick in if required). If someone did a LOGO implementation with a library function allowing intptr moral equivalents then it would stop being safe by your metric too.Dinsdale
Strictly speaking the analogy should mention that the receptionist at the hotel was quite happy for you to take the key with you. "Oh, do you mind if I take this key with me?" "Go ahead. Why would I care? I only work here". It doesn't become illegal until you try to use it.Temblor
@PhilNash: It breaks down a little there, as the key is usually the property of the hotel.Evitaevitable
@Ben: @Dinsdale is right; that there are library functions that do horrible things if you misuse them is a property of those library functions, not the C# language. C# the language is both memory safe and type safe provided that you don't have "unsafe" code blocks in there. If you do, then it is every bit as memory-unsafe as C++. The point is to isolate areas of memory unsafety to areas that can be easily identified and thoroughly reviewed.Grader
@Kyle Cronin your point only furthers the analogy. Back when C++ was invented, programmable card keys for hotels were less common or even nonexistent. Newer hotels have naturally adopted safer practices, as have newer languages. Even older hotels have been retrofitted with new locks, as has C++ (smart pointers anybody?)Dildo
C++ not being memory safe makes it pragmatic. Some tricks can be used, and those hacks would not be there if C++ was too safe.Cyndycynera
@Thaddee: First off, there are plenty of pragmatic languages that are memory safe. However, the problem with C++ is not that it is unsafe. The problem is that it is so easy to accidentally do something massively unsafe and not realize that you're doing so until you crash the end-user's machine. Memory-unsafe languages often are quite useful, I agree, but there should be a way of isolating that unsafeness to specifically those "tricky, hacky" bits of code that really need it.Grader
Eric: This question might be getting traffic because it was the top post on Hacker News: news.ycombinator.com/item?id=2686580. Regardless, 1100 upvotes in 24 hours?! That must be a record, by far.Sufferable
Please, please at least consider writing a book one day. I would buy it even if it was just a collection of revised and expanded blog posts, and I'm sure so would a lot of people. But a book with your original thoughts on various programming-related matters would be a great read. I know that it's incredible hard to find the time for it, but please consider writing one.Assorted
@Dyppl: Thanks for the kind words. Having written a couple of books already I am well aware of how much work it is! I have considered turning the blog into a book and I might at some point if I can find both the time and a willing publisher.Grader
Actually there are three basic techniques for managing memory in C and C++. There are the two that you mentioned plus static memory where the variables have process lifetime. And if you don't mind getting very technical there is also register file storage, but this is apparently ignored by current compilers.Metrical
So what exactly is the frequency of finding that book every time I sneak into the same room? Also, on what factors does this frequency depend on?Keli
@Keli answer your question with science. Get a few hundred c compilers and try a few hundred different configurations of each and soon you will have excellent empirical data. Anything else is guessing.Grader
I have written a lot of code in a lot of different languages. My least favorite language has been C++. I have no issues with unsafe languages - what I have issues with is a language so poorly designed (and then hacked upon to cover up those design flaws) that it is FAR TOO EASY to create code that segfaults. I'm dealing with an issue right now that was SUPPOSED to fix memory leaks. Now it segfaults. Fun. -_-Bricklayer
Does the same also apply to local variables inside the same function, that are declared in a different scope? I'm asking since in my experience both GCC and MSVC do a) not warn (even with -Wextra) about using a pointer to variable in a different scope, and b) create assembly that suggests they track the usage of every variable through pointers even beyond the variable's scope. example: void foo(void){ int i, *px; for (i=0;i<10;i+=*px) {int x=i+1; px=&x;} printf("*px=%i\n", *px); }Polynomial
@timo you are required to never use an address to a local whose lifetime has ended. If you do and it happens to work, well, again, the runtime is not required to fail when you break the rules. Unsafe code is marked unsafe for a reason.Grader
@EricLippert I think this is best analogy that I've ever seen related this topic but I have one confusion you wrote that "yours could be in the hotel's furnace" it means my value could be there in other location in the system or something else you try to explain ?Sudatorium
@VikasVerma: Some memory managers deliberately shred memory when it is no longer usable. The debug version of the Microsoft C runtime, for example, sets unused memory to 0xCC because (1) it is very easy to see in the debugger memory window that a particular block of memory is now no longer valid, and (2) that is the "break into the debugger" instruction code; if the shredded memory ever gets executed then the debugger will be activated.Grader
I actually disagree with this as 'answer' after being sent the link to it, in that it didn't answer the clear cause of confusion for the poster. He clearly thinks that, similar to an object, a function contains its own local storage and therefore doesn't exist after the function is "destroyed". Only it's not actually destroyed. Unlike classes, they aren't a container for the variables used in them, they're items stored on the stack or in a register. However, this answer is a great analogy on how manipulated access of the stack can (and can't) work. Technical answer is more important, though.Verdi
@Deji: Your psychic powers are much stronger than mine; I have no idea what the original poster was thinking.Grader
@ErricLippert It's not psychic powers, but more familiarity with the confusion, the example he's using and the actual question he asked. He asked if the memory was inaccessible, which means he likely thinks the memory doesn't exist any more. Both are untrue, the memory is accessible and it does exist. The reason to which is the difference in the way these are stored, which is why I think the 'answer' ought to focus on that on a technical level.Verdi
@VikasVerma It means that your drawer is being reconstructed.Splenetic
@EricLippert: Thanks for the great answer. What is your opinion about C++11 & C++14 way of free store management using smart pointers? Can I now say that modern C++ is safe language because there is no need to use delete operatorTeaching
@meet: I am not an expert by any means on what has been added to C++ 11 and 14, though in talking with people who are experts, it sounds to me like there is a lot of good stuff in there. More generally I am happy to see that the C++ committee is willing to be both bold and active as they move the language towards something more modern and less error-prone.Grader
@EricLippert: Ok. But question is why C++ won't stop me if I do something foolish? Wouldn't it be very nice if compiler gives me error when I attempt to take address of local variable? Why C++ provides so much freedom to the programmer? Or these are the problems C++ inherited from C? Your help will be appreciated.Teaching
@meet: you should ask these questions of someone who is an expert in the design of C++; I would not care to speculate as to the motives of the C++ language designers. I would note that "stop the user from doing something foolish" does not appear to have been too high on the list of traits considered admirable by the designers of C.Grader
@Meet Remember that C++ is a general-purpose language. Total memory control is needed for a wide range of applications (cracking tools).Lekishalela
@EricLippert: C++ doesn't define this behaviour, true. But I think that if you are running on x86, the architecture guarantees that you can safely write and read up to 128 bytes above the stack pointer (esp), without risking that the memory changes. Now, if the compiler doesn't compile any instructions that actively modify that memory (which is probably the case, since it will just increase esp and jump back when leaving the function), I think you could technically say that on x86, this is defined behaviour. Is this true? Any thoughts on that?Tuggle
@MartijnCourteaux: Who says that the compiler is required to use esp to determine the locations of local variables? If a particular compiler vendor defines the behaviour for a particular implementation, then the behaviour is implementation defined.Grader
@EricLippert: I don't see where you are going. Could it be true that on a specific architecture/compiler combo, this results in consistent behaviour, even if the local variables now sit above the stackpointer (&var < esp)? For example: i tried x86_64 with gcc -O0, and it produces code that I think will give consistent results.Tuggle
@MartijnCourteaux: Undefined behavior can do anything. Consistent behavior is a subset of anything, so yes, that is possible. Where I am going is: you are asking whether behaviour that is defined by a particular implementation is a kind of implementation-defined behavior. Yes, it is.Grader
In the last paragraph before your update you say something to the effect of "C++ is not a safe language[...] Safer languages like C++[....]" Did you mean to say C# is safer?Gangling
It would be nice if the answer at least once mentioned word undefined behaviourCineraria
@GiorgiMoniava: Comment noted. Consider writing an answer you like better; that way the whole site is improved.Grader
@EricLippert Your answer is already pretty good, I don't attempt to criticize it, just wrote my opinion. Thanks. It won't be easy/realistic to write something better than current answers here.Cineraria
I have to agree with @Assorted that I would like to read a book written by you. Along with some of the blog posts/answers written by the jOOQ crew, or Josh Bloch/Goetz, your answers provide a really detailed and easily understandable material on behind the scenes/under the hood details regarding programming languagesMelon
"C++ is not a safe language". And chainsaws aren't safe either but, as long as you use them properly, they're so much better than the alternative :-) Unless the alternative is Python, of course.Brosine
@Teaching "But question is why C++ won't stop me if I do something foolish? Wouldn't it be very nice if compiler gives me error when I attempt to take address of local variable? Why C++ provides so much freedom to the programmer?" Let the designer of C++ answer that: stroustrup.com/bs_faq.html#unsafeFellow
I am reading the book C++: The complete Reference 4th. ed. by Herbert Schildt. On p. 345 the author writes: "One thing you must be careful about when returning references is that the object being referred to does not go out of scope after the function terminates". Is this author wrong?Phalange
@morpheus: Herb Schildt books are famously wrong on every page. You'd be better off throwing that book in the trash; you will learn all kinds of wrong things from it. See lysator.liu.se/c/schildt.html. I was unfortunately the editor of one of his C# books and it was godawful.Grader
@paxdiablo, I would rephrase "C++ is not a safe language" as "C++ doesn't hold your hand at every step. It assumes you are mature and sensible enough to know what you are doing.". This is a choice we make, if less holding hands means the program will be more efficient, faster, then that's great. If it allows me to do tricks that other languages don't let me do, then that's great. And if someone wants the language to protect them against their own stupidity, then they can pick a different language. I believe Ada was made specifically for that purpose.Nevis
Just to be clear, @QwertYuiop, that "not a safe language" was something in the answer that I was quoting, not my own opinion. You may have already realised that, but I wanted to be certain. I was stating that languages are only unsafe if you don't know how to use them properly. That's what a good toolsmith does, learns how to use the tools.Brosine
@paxdiablo, I agree, and whilst I can't consider myself an expert, I'm comfortable enough that I consider the "unsafe" bit a feature rather than a problem. One particular "unsafe" technique that I like, is to pass around void pointers and use them to access objects. Yes you lose type checking, but you chose to. Python is worse in that variables are typeless by default: You can pass a string to something that expects a number, and it won't even complain but just goes ahead and performs the work anyway, just not the way you intended, e.g. it happily "adds" 1 and 1 to make 11.Nevis
C
284

You are simply reading and writing to memory that used to hold a. Now that you're outside of foo, it's just a pointer to some random memory area. It just so happens that in your example, that memory area does exist and nothing else is using it at the moment.

You don't break anything by continuing to use it, and nothing else has overwritten it yet. Therefore, the 5 is still there. In a real program, that memory would be reused almost immediately and you'd break something by doing this (though the symptoms may not appear until much later!).

When you return from foo, the program gives up its claim on that memory, and it can be reused for something else (typically the stack frame of the next function call). If you're lucky and it never does get reused, and nothing catches you using it again, then you'll get away with the lie. Chances are, though, you'll end up writing over whatever else ends up at that address.

Now if you're wondering why the compiler doesn't complain, it's probably because foo got eliminated by optimization. It usually will warn you about this sort of thing. C assumes you know what you're doing though, and technically you haven't violated scope here (there's no reference to a itself outside of foo), only memory access rules, which only triggers a warning rather than an error.

In short: this won't usually work, but sometimes will by chance.

Canister answered 23/6, 2011 at 5:43 Comment(0)
C
160

Because the storage space wasn't stomped on just yet. Don't count on that behavior.

Comparative answered 19/5, 2010 at 2:33 Comment(3)
Man, that was the longest wait for a comment since, "What is truth? said jesting Pilate." Maybe it was a Gideon's Bible in that hotel drawer. And what happened to them, anyway? Notice they are no longer present, in London at least. I guess that under the Equalities legislation, you would need a library of religious tracts.Burgenland
I could have sworn that I wrote that long ago, but it popped up recently and found my response wasn't there. Now I have to go figure out your allusions above as I expect I'll be amused when I do >.<Comparative
Haha. Francis Bacon, one of Britain's greatest essayists, whom some people suspect wrote Shakespeare's plays, because they can't accept that a grammar school kid from the country, son of a glover, could be a genius. Such is the English class system. Jesus said, 'I am the Truth'. oregonstate.edu/instruct/phl302/texts/bacon/bacon_essays.htmlBurgenland
R
95

A little addition to all the answers:

If you do something like this:

#include <stdio.h>
#include <stdlib.h>

int * foo(){
    int a = 5;
    return &a;
}
void boo(){
    int a = 7;

}
int main(){
    int * p = foo();
    boo();
    printf("%d\n", *p);
}

The output may well be 7, although the behavior is undefined and nothing guarantees it.

That is because after returning from foo() the stack space it used is freed and then reused by boo().

If you disassemble the executable, you will see it clearly.

Rodgerrodgers answered 22/6, 2011 at 14:6 Comment(8)
Simple, but great example to understand the underlying stack theory.Just one test addition, declaring "int a = 5;" in foo() as "static int a = 5;" can be used to understand the scope and life time of a static variable.Tiffanytiffi
-1 "for will probably be 7". The compiler might enregister a in boo. It might remove it because it's unnecessary. There is a good chance that *p will not be 5, but that doesn't mean that there is any particularly good reason why it will probably be 7.Hamnet
It is called undefined behavior!Balkin
why and how boo reuses the foo stack ? aren't function stacks separated from each other, also I get garbage running this code on Visual Studio 2015Ominous
@Ominous it's almost a year old, but no, "function stacks" are not separated from each other. A CONTEXT has a stack. That context uses its stack to enter main, then descends into foo(), exits, then descends into boo(). foo() and boo() both enter with the stack pointer at the same location. This isn't, however, behavior that should be relied upon. Other 'stuff' (like interrupts, or the OS) can use the stack between the calls of foo() and boo(), modifying its contents...Spinney
with gcc version 8.3.0 (Debian 8.3.0-6) I get Segmentation faultCounts
The stack will get reused. If no function call happens in between, you might still get the same value, provided the kernel has not paged out that stack page (if the address lies in a different page). For example: #include <stdlib.h> int * foo(){ int a = 5; return &a; } int main(){ int i = 0; int k = 0; int n = 0; int * p = foo(); while ( i <10000000) { while (k <10000000) { n= n+1; } n = n-1; } printf("%d\n",*p); //it may print 5, as stack contents are not zeroed out during unwind }Nth
Will probably be 7 in 32 bit and Debug mode, maybe?Construe
W
71

In C++, you can access any address, but it doesn't mean you should. The address you are accessing is no longer valid. It works because nothing else scrambled the memory after foo returned, but it could crash under many circumstances. Try analyzing your program with Valgrind, or even just compiling it optimized, and see...

Whitewing answered 22/6, 2011 at 14:15 Comment(1)
You probably mean you can attempt to access any address. Because most of the operating systems today will not let any program access any address; there are tons of safeguards to protect the address space. This is why there will not be another LOADLIN.EXE out there.Charwoman
E
68

You never throw a C++ exception by accessing invalid memory. You are just giving an example of the general idea of referencing an arbitrary memory location. I could do the same like this:

unsigned int q = 123456;

*(double*)(q) = 1.2;

Here I am simply treating 123456 as the address of a double and writing to it. Any number of things could happen:

  1. q might in fact genuinely be a valid address of a double, e.g. double p; q = &p;.
  2. q might point somewhere inside allocated memory and I just overwrite 8 bytes in there.
  3. q points outside allocated memory and the operating system's memory manager sends a segmentation fault signal to my program, causing the runtime to terminate it.
  4. You win the lottery.

The way you set it up, it is somewhat more plausible that the returned address points into a valid area of memory, as it will probably just be a little further down the stack, but it is still an invalid location that you cannot access in a deterministic fashion.

Nobody will automatically check the semantic validity of memory addresses like that for you during normal program execution. However, a memory debugger such as Valgrind will happily do this, so you should run your program through it and witness the errors.

Emblazon answered 22/6, 2011 at 14:15 Comment(2)
I'm just going to write a program now that keeps on running this program so that 4) I win the lotteryCrofoot
Constant 0xDEADBEEF is more juicy (and uneven) and is guaranteed some visible action on most systems.Phalanger
S
29

Did you compile your program with the optimiser enabled? The foo() function is quite simple and might have been inlined or replaced in the resulting code.

But I agree with Mark B that the resulting behavior is undefined.

Subedit answered 22/6, 2011 at 14:12 Comment(3)
That's my bet. Optimizer dumped the function call.Banded
That is not necessary. Since no new function is called after foo(), the function's local stack frame is simply not yet overwritten. Add another function invocation after foo(), and the 5 will be changed...Viddah
I ran the program with GCC 4.8, replacing cout with printf (and including stdio). Rightfully warns "warning: address of local variable ‘a’ returned [-Wreturn-local-addr]". Outputs 58 with no optimization and 08 with -O3. Strangely P does have an address, even though its value is 0. I expected NULL (0) as address.Meiny
N
24

Your problem has nothing to do with scope. In the code you show, the function main does not see the names in the function foo, so you can't access a in foo directly with this name outside foo.

What you are really asking is why the program doesn't signal an error when it references illegal memory. This is because the C++ standard does not require any check at the boundary between legal and illegal memory. Referencing something in a popped stack frame sometimes causes an error and sometimes does not. It depends. Don't count on this behavior. Assume it will always result in an error when you program, but assume it will never signal an error when you debug.

Northbound answered 23/6, 2011 at 4:45 Comment(2)
I recall from an old copy of Turbo C Programming for the IBM, which I used to play around with some way back when, how directly manipulating the graphics memory, and the layout of the IBM's text mode video memory, was described in great detail. Of course then, the system that the code ran on clearly defined what writing to those addresses meant, so as long as you didn't worry about portability to other systems, everything was fine. IIRC, pointers to void were a common theme in that book.Jewess
@Rodgerrodgers Kjörling: Sure! People like to do some dirty work once in a while ;)Northbound
P
22

Pay attention to all warnings. Do not only solve errors.

GCC shows this warning:

warning: address of local variable 'a' returned

This is the power of C++. You should care about memory. With the -Werror flag, this warning becomes an error, and you have to fix it before the program will build.

Phalanger answered 22/6, 2011 at 14:6 Comment(1)
This is the most practical answer. Think of default compiler flags as 'compatibility mode'. Don't use this mode unless dealing with legacy code. Instead turn on warnings. (-Werror -Wall -Wextra is a good start.) Further, add run-time checking with -fsanitize=address,undefined if you're not sure your program is correct.Pertussis
K
20

It works because the stack has not been altered (yet) since a was put there. Call a few other functions (which are also calling other functions) before accessing a again and you will probably not be so lucky anymore... ;-)

Keep answered 23/6, 2011 at 15:31 Comment(0)
A
18

You are just returning a memory address. It's allowed, but it's probably an error.

Yes, if you try to dereference that memory address you will have undefined behavior.

#include <iostream>

int * ref () {

    int tmp = 100;
    return &tmp;
}

int main () {

    int * a = ref();
    // Up to this point the results are defined
    // You can even print the address returned
    // but yes, it is probably a bug

    std::cout << *a << std::endl; // Undefined results
}
Arguseyed answered 19/5, 2010 at 2:33 Comment(2)
I disagree: There is a problem before the cout. *a points to unallocated (freed) memory. Even if you don't dereference it, it is still dangerous (and likely bogus).Lingulate
@ereOn: I clarified more what I meant by problem, but no it is not dangerous in terms of valid c++ code. But it is dangerous in terms of likely the user made a mistake and will do something bad. Maybe for example you are trying to see how the stack grows, and you only care about the address value and will never dereference it.Arguseyed
M
18

This behavior is undefined, as Alex pointed out. In fact, most compilers will warn against doing this, because it's an easy way to get crashes.

For an example of the kind of spooky behavior you are likely to get, try this sample:

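#include <iostream>

int *a()
{
   int x = 5;
   return &x;   // Dangling as soon as a() returns
}

void b( int *c )
{
   int y = 29;
   *c = 123;    // Writes through the dangling pointer (undefined behavior)
   std::cout << "y=" << y << std::endl;
}

int main()
{
   b( a() );
   return 0;
}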

This prints out "y=123", but your results may vary (really!). Your pointer is clobbering other, unrelated local variables.

Martymartyn answered 24/6, 2011 at 22:4 Comment(0)
E
17

That's classic undefined behaviour that's been discussed here not two days ago -- search around the site for a bit. In a nutshell, you were lucky, but anything could have happened and your code is making invalid access to memory.

Emblazon answered 24/6, 2011 at 21:57 Comment(0)
P
16

You actually invoked undefined behaviour.

Returning the address of a local variable compiles, but because locals are destroyed when the function returns, the result of accessing them afterwards is undefined.

So you did not modify a but rather the memory location where a once was. This difference is very similar to the difference between crashing and not crashing.

Preempt answered 24/6, 2011 at 21:57 Comment(0)
H
14

In typical compiler implementations, you can think of the code as "print out the value of the memory block at the address that used to be occupied by a". Also, if you add a call to another function that contains a local int, there's a good chance that the value of a (or rather, of the memory that a used to occupy) changes. This happens because the stack will be overwritten with a new frame containing different data.

However, this is undefined behaviour and you should not rely on it to work!

Hip answered 22/6, 2011 at 14:18 Comment(3)
"print out the value of the memory block with address that used to be occupied by a" isn't quite right. This makes it sound like his code has some well-defined meaning, which is not the case. You are right that this is probably how most compilers would implement it, though.Snowshoe
@BrennanVincent: While the storage was occupied by a, the pointer held the address of a. Although the Standard does not require that implementations define the behavior of addresses after the lifetime of their target has ended, it also recognizes that on some platforms UB is processed in a documented manner characteristic of the environment. While the address of a local variable won't generally be of much use after it has gone out of scope, some other kinds of addresses may still be meaningful after the lifetime of their respective targets.Katy
@BrennanVincent: For example, while the Standard may not require that implementations allow a pointer passed to realloc to be compared against the return value, nor allow pointers to addresses within the old block to be adjusted to point to the new one, some implementations do so, and code which exploits such a feature may be more efficient than code which has to avoid any action--even comparisons--involving pointers to the allocation that was given to realloc.Katy
G
14

It can, because a is a variable allocated temporarily for the lifetime of its scope (foo function). After you return from foo the memory is free and can be overwritten.

What you're doing is described as undefined behavior. The result cannot be predicted.

Generator answered 24/6, 2011 at 21:57 Comment(0)
M
13

The seemingly correct (?) console output can change dramatically if you use ::printf instead of cout.

You can play around with the code below in a debugger (tested on x86, 32-bit, Visual Studio):

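#include <stdio.h>
#include <string.h>

char* foo()
{
  char buf[10];
  ::strcpy(buf, "TEST");
  return buf;    // Returns the address of a local array (dangling after return)
}

int main()
{
  char* s = foo();    // Place a breakpoint here and then check the 's' variable
  ::printf("%s\n", s);
}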
Magdalenamagdalene answered 22/6, 2011 at 14:6 Comment(1)
That will not compile (also indicated by the syntax highlighting). There is a non-ASCII double quote: Phalanger
P
7

It's the 'dirty' way of using memory addresses. When you return an address (pointer) you don't know whether it belongs to local scope of a function. It's just an address.

Now that you invoked the 'foo' function, that address (memory location) of 'a' was already allocated there in the (safely, for now at least) addressable memory of your application (process).

After the 'foo' function returned, the address of 'a' can be considered 'dirty', but it's there, not cleaned up, nor disturbed/modified by expressions in other part of program (in this specific case at least).

A C/C++ compiler doesn't stop you from such 'dirty' access (it might warn you though, if you care). You can safely use (update) any memory location that is in the data segment of your program instance (process) unless you protect the address by some means.

Phalanger answered 22/6, 2011 at 14:6 Comment(0)
W
6

After a function returns, its identifiers are destroyed, but the values they named are not erased from memory; we simply can no longer locate those values by name. The location still contains the value stored by the previous function until something else overwrites it.

So, here function foo() is returning the address of a and a is destroyed after returning its address. And you can access the modified value through that returned address.

Let me take a real world example:

Suppose a man hides money at a location and tells you the location. After some time, the man who told you the location dies. But you still have access to that hidden money.

Warrantor answered 22/6, 2011 at 14:6 Comment(0)
V
0

Quick answer: What you created here is called a dangling pointer. When you exit the scope of a function, everything inside it is destroyed, so your pointer no longer points to a valid object. Accessing it causes undefined behavior.

In this case you got lucky: the program acted the way you thought it would. But with undefined behavior it often won't. So in general, avoid doing what you did.

Vinylidene answered 22/6, 2011 at 14:6 Comment(0)
A
0

Your code is very risky. You are creating a local variable (which is considered destroyed after function ends) and you return the address of memory of that variable after it is destroyed.

That means the memory address could be valid or not, and your code will be vulnerable to possible memory address issues (for example, a segmentation fault).

This means that you are doing a very bad thing, because you are storing in a pointer a memory address that cannot be trusted at all.

Consider this example, instead, and test it:

#include <iostream>

int * foo()
{
    int *x = new int;   // Allocated on the heap; outlives foo()
    *x = 5;
    return x;
}

int main()
{
    int* p = foo();
    std::cout << *p << "\n"; // Better to put a newline in the output, IMO
    *p = 8;
    std::cout << *p;
    delete p;
    return 0;
}

Unlike your example, in this example:

  • memory for an int is allocated on the heap inside the function
  • that memory address remains valid after the function returns (nobody has deleted it)
  • the memory address can be trusted (the block is not considered free, so it will not be overwritten until it is deleted)
  • the memory must be deleted once it is no longer needed (see the delete at the end of the program)
Aiken answered 22/6, 2011 at 14:6 Comment(7)
Did you add something not already covered by the existing answers? And please don't use raw pointers/new.Evitaevitable
The asker used raw pointers. I wrote an example which mirrors theirs exactly, so that they can see the difference between an untrustworthy pointer and a trustworthy one. Actually there is another answer similar to mine, but it uses strcpy which, IMHO, could be less clear to a novice coder than my example that uses new.Aiken
They didn't use new. You're teaching them to use new. But you shouldn't use new.Evitaevitable
So in your opinion it is better to pass the address of a local variable which is destroyed in a function than to actually allocate memory? This makes no sense. Understanding the concept of allocating and deallocating memory is important, IMHO, especially if you are asking about pointers (the asker didn't use new, but used pointers).Aiken
When did I say that? No, it is better to use smart pointers to properly indicate ownership of the referenced resource. Don't use new in 2019 (unless you're writing library code) and don't teach newcomers to do so either! Cheers.Evitaevitable
Smart pointers surely are better than new, I agree with you. But I used new since it is simpler than smart pointers for a novice to use and to understand. I think that complexity should be scaled. The first thing is to understand what allocation means before reaching the next step (which could be... how could I use allocation in a better way? -> smart pointers, and other tools in std::) - I admit, however, that I am a very old-school C++ autodidact :D so -> my fault :PAiken
Object management with smart pointers is what should be taught. new and delete is an advanced topic that can be taught later. :)Evitaevitable
