What is the lifetime of a string literal returned by a function?
G

5

37

Consider this code:

const char* someFun() {
    // ... some stuff
    return "Some text!!";
}

int main()
{
   { // Block: A
      const char* retStr = someFun();
      // use retStr
   }
}

In the function someFun(), where is "Some text!!" stored (I think it may be in some static area of ROM), and what are its scope and lifetime?

Will the memory pointed to by retStr stay occupied throughout the program, or will it be released once block A exits?

Grubb answered 5/4, 2010 at 17:32 Comment(1)
You may also take a look at this question: https://mcmap.net/q/419515/-scope-of-string-literals Pirn
D
54

The C++ Standard does not say where string literals should be stored. It does however guarantee that their lifetime is the lifetime of the program. Your code is therefore valid.
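A minimal sketch of what that guarantee means in practice, written in C (it should also compile as C++). The names literal_text and lifetime_demo are hypothetical and not part of the question; the point is only that the pointer stays usable after the block that first received it has ended.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for someFun() from the question. */
const char *literal_text(void)
{
    return "Some text!!";   /* the literal has static storage duration */
}

/* Returns 1 if the text is still readable after the inner block ends. */
int lifetime_demo(void)
{
    const char *kept;
    {   /* Block A */
        const char *retStr = literal_text();
        kept = retStr;      /* copy the address out of the block */
    }
    /* The name retStr is gone here, but the array it pointed to is not. */
    assert(kept != NULL);
    return strcmp(kept, "Some text!!") == 0;
}
```

Calling lifetime_demo() from main() should return 1, since only the pointer variable goes out of scope, not the characters it points to.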

Deign answered 5/4, 2010 at 17:37 Comment(3)
Could you reference the (draft) standard?Othaothe
Not official, but this may help - en.cppreference.com/w/cpp/language/…Xavierxaviera
@Othaothe From the n3096 draft, 6.4.5/6 String Literals: "The multibyte character sequence is then used to initialize an array of static storage duration" (emphasis by me).Linden
R
37

The "Some text!!" does not have a scope. Scope is a property of a named entity. More precisely, it is a property of the name itself. "Some text!!" is a nameless object - a string literal. It has no name, and therefore any discussions about its "scope" make no sense whatsoever. It has no scope.

What you seem to be asking about is not scope. It is lifetime or storage duration of "Some text!!". String literals in C/C++ have static storage duration, meaning that they live "forever", i.e. as long as the program runs. So, the memory occupied by "Some text!!" is never released.

Just keep in mind (as a side note) that string literals are non-modifiable objects. Attempting to write into that memory results in undefined behavior.
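To illustrate the side note, here is a small sketch (the function name modify_copy_demo is made up for this example). If you need text you can change, initialize your own array from the literal and modify that copy instead of the literal itself.

```c
#include <assert.h>
#include <string.h>

/* Copies a literal into a writable array, edits the copy, and reports
   whether the edit landed. Returns 1 on success. */
int modify_copy_demo(void)
{
    const char *literal = "Some text!!";  /* points at the literal itself   */
    char copy[] = "Some text!!";          /* a separate, writable array     */

    /* literal[0] = 's';  is rejected by the compiler because of const;
       forcing the write through a cast would be undefined behavior. */
    copy[0] = 's';                        /* fine: this array is our own    */

    assert(literal[0] == 'S');            /* the literal is untouched       */
    return strcmp(copy, "some text!!") == 0;
}
```

Note that copy is an ordinary local array, so unlike the literal it does stop existing when the function returns.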

Roadrunner answered 5/4, 2010 at 17:47 Comment(0)
D
5

The string will be stored statically in a special section of the program binary (usually read-only on modern operating systems). Its memory is not allocated individually for the string, only for the whole section when it is loaded into memory, and it will not be deallocated.

Decarlo answered 5/4, 2010 at 17:36 Comment(10)
That's not necessarily true. What if the binary format you're linking to doesn't support the notion of "read-only sections"? (e.g. most basic COM files)Kalakalaazar
Mammoths don't have read-only sections either. They are of historical interest only.Decarlo
Even in a COM file there will be some part (a section of the file), or several, for storing constants. They will not be marked as read-only in segments or in page descriptors, but the idea is the same.Decarlo
That was just an extreme example where it's not possible to put the string in a "read only section" (since there are no sections). The point is that, since this can be impossible, the standard doesn't impose such a requirement, and therefore a complying compiler/linker might not do it, even when it is possible.Kalakalaazar
Regarding COM files you are absolutely wrong: COM files are real-mode "memory snapshots", and even that memory area in which logically the linker put all the constants isn't read-only in any way. Real-mode doesn't have any memory protection features of that sort.Kalakalaazar
OK, edited the answer. Even gcc can put string constants into writable memory.Decarlo
@conio, from the wiki (sorry): "COM file can either be very simple, using a single segment, or arbitrarily complex, providing its own memory management system". It will be loaded as a snapshot, but it can even switch to protected mode.Decarlo
That's irrelevant. Windows was also at one time launched from a COM file, win.com, but a COM file in itself has no notion of memory protection. AFTER the binary is loaded and running, its code can do anything, but the binary format itself has no notion of memory protection. You see, the code inside the file can cause a transition to protected mode, but a COM file in itself doesn't do that. Your argument is analogous to saying that since I can board an airplane with a skateboard it means that skateboards can fly...Kalakalaazar
@conio, ELF itself does not have memory protection either (it is a byte stream). Memory protection is applied by the ELF loader, either in the OS or in the dynamic loader (and the file can be loaded from disk into memory/page cache before it is mapped with the right protection). A C program compiled to COM will run AFTER the CRT, and the CRT MAY add some protection. COM files can contain info for the linker, so code from a COM file will run only after it is loaded by a linker (external to that COM file).Decarlo
Actually, COM files don't contain information for the linker-loader. They are plain memory snapshots. That's written in the wikipedia article you quoted earlier: It is very simple; it has no header, and contains no metadata, only code and data. EXE files on the other hand contain sections that can be marked as read-only, and the Windows loader (which runs in protected mode and is aware of its capabilities) marks the pages corresponding to the EXE section appropriately.Kalakalaazar
A
0

I posted an answer on a similar question (now deleted), so I will use that example here, too:

The short answer is that the string literal "message2" will exist in memory as long as the process does, in the .rodata section (assuming we are talking about an ELF file).

We return a pointer to a string constant, but as we will see later, there is no separate memory defined anywhere that stores this const char * pointer, and there is no need for it, as the address of the string is calculated in the code and returned in the register $rax each time the function is called.

But let's take a look at what happens in the code with the GNU Debugger (GDB):


We put a breakpoint in our function returning a pointer to a constant string, and we see assembly code and a process map:


The code gets this string in the following instruction:

0x000055555555514a <+8>:    lea    0xeb3(%rip),%rax        # 0x555555556004

This instruction calculates the address of message2. We see here what position independent code (PIC) means.

The address of the message2 string is not hardcoded as an absolute value but is calculated relative to the instruction pointer: the hardcoded offset 0xeb3 is added to the address of the next instruction (0x555555555151 + 0xeb3 = 0x555555556004), and the result is put in the register rax.

Relative addressing (current address +/- offset) means a process will always get the right address of message2, no matter where in memory it is loaded.
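The arithmetic behind that lea can be checked with a few lines of C. This is only a sketch: the function names are hypothetical, and the instruction length of 7 bytes is an assumption inferred from the listing (next instruction at 0x555555555151 minus the lea at 0x55555555514a).

```c
#include <assert.h>
#include <stdint.h>

/* Address a RIP-relative operand resolves to: the address of the *next*
   instruction plus the displacement encoded in the instruction. */
uint64_t rip_relative_target(uint64_t insn_addr, uint64_t insn_len,
                             int64_t disp)
{
    return insn_addr + insn_len + (uint64_t)disp;
}

/* Re-checks the numbers shown in the GDB listing. */
void check_listing_values(void)
{
    /* lea 0xeb3(%rip),%rax at 0x55555555514a, assumed to be 7 bytes long */
    assert(rip_relative_target(0x55555555514aULL, 7, 0xeb3)
           == 0x555555556004ULL);
}
```

If the whole program were loaded at a different base, insn_addr would shift by the same amount as the string, so the same displacement would still land on message2.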

So here we see that the const char * you asked about does not actually exist in memory, because the address is calculated "on the fly" and returned in $rax:

We have the address in $rax:

(gdb) i r $rax
rax   0x555555556004      93824992239620

And it holds the address of message2:

(gdb) x/s 0x555555556004
0x555555556004: "message2"

Now let's see where the address 0x555555556004 is in the process address map:

0x555555556000     0x555555557000     0x1000     0x2000  r--p   /home/drazen/proba/main

So this mapping is neither executable nor writable, just readable and private (r--p), where the p marks a private (copy-on-write) mapping rather than a shared one.

When we check with readelf it shows that it is in the .rodata section of the ELF file:

drazen@HP-ProBook-640G1:~/proba$ readelf  -x .rodata main

Hex dump of section '.rodata':
0x00002000 01000200 6d657373 61676532 00       ....message2.

So the answer is that this string is not stored in the code section .text of the ELF file but in the read-only data section .rodata, and yes, it will exist as long as the process exists in memory.
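As a rough cross-check, the readelf dump and the process map line can be tied together numerically. The sketch below uses hypothetical function names and assumes that the address readelf prints for .rodata (0x2000) coincides with the file offset of the mapping shown in the process map (also 0x2000), which appears to be the case here.

```c
#include <assert.h>
#include <stdint.h>

/* Runtime address of a byte in a file-backed mapping, given where the
   mapping starts in memory and which file offset it starts at. */
uint64_t runtime_address(uint64_t map_start, uint64_t map_file_offset,
                         uint64_t byte_file_offset)
{
    return map_start + (byte_file_offset - map_file_offset);
}

/* Re-checks the numbers from the readelf dump and the map line. */
void check_rodata_values(void)
{
    /* The dump starts at 0x2000; four bytes (01 00 02 00) precede the text. */
    uint64_t text_offset = 0x2000 + 4;
    assert(runtime_address(0x555555556000ULL, 0x2000, text_offset)
           == 0x555555556004ULL);
}
```

The result matches the value GDB showed in rax, which is consistent with the string living in the r--p mapping.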

And just to add one small detail, this constant string will be returned to the main() function by reference, of course (address), but not on the stack; rather in the register rax:

(gdb) i r
rax   0x555555556004      93824992239620
rbx   0x0 
Agatha answered 27/11, 2023 at 12:34 Comment(0)
S
-4

Will the memory pointed by retStr be occupied throughout the program or be released once the block A exits?

It will not be released, but the name retStr will no longer be available (block scope).

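#include <stdio.h>

int main(void)
{
   const char *ptr;
   {
      const char* retStr = "Scope";
      ptr = retStr;   // copy the address; the literal outlives the block
   }

   printf("%s\n", ptr); // prints "Scope"

   // printf("%s\n", retStr); // would not compile: "retStr undeclared"
   return 0;
}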
Spy answered 5/4, 2010 at 17:57 Comment(3)
It will not be released; only the symbol retStr will be unavailable.Grubb
Incorrect. The memory that retStr points to after the execution is static memory. It is allocated when the application starts and is only released (effectively) when the application terminates.Eristic
@all: my mistake, i was thinking about retStr. Will change the answer.Spy
