I read that string literals are always stored in read only memory and it makes sense as to why.
The storage location of string literals is implementation-defined. If compilers decide to emit a large string literal, it will usually be located in a read-only section of static memory, such as .rodata.
However, whether this is even necessary is up to the compiler. Compilers are allowed to optimize your code according to the as-if rule, so if the behavior of the program is the same with the literal being stored elsewhere, or nowhere at all, that is also allowed.
Example 1
int sum() {
char arr[] = "ab";
return arr[0] + arr[1];
}
With the following assembly output:
sum():
mov eax, 195
ret
In this case, because everything is a compile-time constant, there is no string literal or array at all. The compiler optimized it away and turned our code into return 195; by summing up the two ASCII characters a and b.
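The folding can also be checked by the compiler itself. This is only a minimal sketch: it assumes C++14 or later and an ASCII-compatible character set (where a is 97 and b is 98), and the name sum_folded is made up for illustration.

```cpp
// Same computation as sum(), but constexpr so the compiler must be
// able to evaluate it without any runtime array or string literal.
constexpr int sum_folded() {
    char arr[] = "ab";        // local array initialized from the literal
    return arr[0] + arr[1];   // 'a' + 'b', both promoted to int
}

// Assumption: ASCII-compatible character set, so 97 + 98 == 195.
static_assert(sum_folded() == 195, "'a' + 'b' should fold to 195");
```

If the static_assert compiles, the value is known at compile time, which is what allows the optimizer to emit mov eax, 195.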
Example 2
void consume(const char*);
void short_string() {
char arr[] = "short str";
consume(arr);
}
short_string():
sub rsp, 24
movabs rax, 8391086215229565043
mov qword ptr [rsp + 8], rax
mov word ptr [rsp + 16], 114
lea rdi, [rsp + 8]
call consume(char const*)@PLT
add rsp, 24
ret
Once again, no code was emitted that would keep the string in read-only memory, but it also wasn't optimized away completely. The compiler sees that the string short str is very short, so it treats its ASCII bytes as numbers: the first eight bytes become the 64-bit immediate 8391086215229565043, and the remaining r plus the null terminator become the 16-bit value 114. Both are moved directly onto the stack with mov instructions, and consume() is called with a pointer to that stack memory.
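To see where the immediates in the assembly above come from, the bytes can be packed by hand. The following is a sketch under two assumptions: an ASCII-compatible character set, and the little-endian byte order used by x86-64. The helper pack_le64 is hypothetical and not something the compiler exposes.

```cpp
#include <cstdint>

// Hypothetical helper: combine the first 8 bytes of s the way a
// little-endian 64-bit store lays them out (byte 0 is least significant).
constexpr std::uint64_t pack_le64(const char* s) {
    std::uint64_t value = 0;
    for (int i = 0; i < 8; ++i) {
        value |= static_cast<std::uint64_t>(static_cast<unsigned char>(s[i])) << (8 * i);
    }
    return value;
}

// "short st" packs into the 64-bit immediate of the movabs instruction.
static_assert(pack_le64("short str") == 8391086215229565043ULL,
              "first eight bytes should match the movabs immediate");

// The remaining 'r' and '\0' pack into the 16-bit immediate 114.
static_assert(('r' | ('\0' << 8)) == 114,
              "last two bytes should match the word store");
```

The shifts are written out explicitly instead of using memcpy into an integer, so the check gives the same result regardless of the byte order of the machine compiling it.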
Example 3
void long_string() {
char arr[] = "Lorem ipsum dolor [...] est laborum.";
consume(arr);
}
long_string():
push rbx
sub rsp, 448
lea rsi, [rip + .L__const.long_string().arr]
mov rbx, rsp
mov edx, 446
mov rdi, rbx
call memcpy@PLT
mov rdi, rbx
call consume(char const*)@PLT
add rsp, 448
pop rbx
ret
.L__const.long_string().arr:
.asciz "Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum."
Our string is now much too long to be treated as a number or two. The entire string will now be emitted into static memory, most likely .rodata after linking. It is still helpful for it to exist, because we can use memcpy to copy it from static memory onto the stack when initializing arr.
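Written out by hand, the generated code behaves roughly like the sketch below. It is an illustration, not the compiler's actual lowering: kInit, g_seen_length, long_string_by_hand and the body of consume() are all hypothetical, and a shorter string stands in for the elided Lorem ipsum text.

```cpp
#include <cstddef>
#include <cstring>

// Stand-in for the read-only copy the compiler emitted
// (.L__const.long_string().arr in the assembly above).
static const char kInit[] = "Lorem ipsum dolor sit amet";

// Stub for consume(): records the length so the sketch is self-contained.
static std::size_t g_seen_length = 0;
void consume(const char* s) { g_seen_length = std::strlen(s); }

void long_string_by_hand() {
    char arr[sizeof kInit];                  // stack storage for the array
    std::memcpy(arr, kInit, sizeof kInit);   // copy, including the null terminator
    arr[0] = 'l';                            // fine: arr is our own writable copy
    consume(arr);                            // receives a pointer to stack memory
}
```

The static copy is never written to. Only the stack array is modified, which is why the compiler is free to place the original bytes in a read-only section.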
Conclusion
If you're worried about compilers doing something wasteful here, don't be. Modern compilers are very good at optimizing and deciding which symbols go where, and if they emit a string literal, this is usually because it must exist for some other code to work, or because it makes initialization of an array easier.
See live examples with Compiler Explorer
[...] -fPIC) would be to store the string in instructions' "immediate" (payload) values. But that doesn't really differ too much from storing it in read-only memory… In fact it's almost no different, some low-level technical stuff (i-cache, d-cache) aside. – Optics
[...] char const[N] for some positive N. The compiler is free to do with it as it pleases. All that C++ says is, "don't write to this object". – Thrawn
"I read that string literals are always stored in read only memory": there is no such requirement; it is the compiler author's choice. There are scenarios where a string literal can be overwritten since its initial value is not used at runtime, like this global mutable variable: char foo[10] = "FooFoo"; – Stout