C optimisation of string literals

I've just been inspecting the following in gdb:

char *a[] = {"one","two","three","four"};
char *b[] = {"one","two","three","four"};
char *c[] = {"two","three","four","five"};
char *d[] = {"one","three","four","six"};

...and I get the following:

(gdb) p a
$17 = {0x80961a4 "one", 0x80961a8 "two", 0x80961ac "three", 0x80961b2 "four"}
(gdb) p b
$18 = {0x80961a4 "one", 0x80961a8 "two", 0x80961ac "three", 0x80961b2 "four"}
(gdb) p c
$19 = {0x80961a8 "two", 0x80961ac "three", 0x80961b2 "four", 0x80961b7 "five"}
(gdb) p d
$20 = {0x80961a4 "one", 0x80961ac "three", 0x80961b2 "four", 0x80961bc "six"}

I'm really surprised that the string pointers are the same for equivalent words. I would have thought each string would have been allocated its own memory on the stack regardless of whether it was the same as a string in another array.

Is this an example of some sort of compiler optimisation or is it standard behaviour for string declaration of this kind?

Authentic answered 9/7, 2012 at 16:56 Comment(6)
Where did the "stack" even come from in this question? If you declared a, b, c and d as local variables, you have to say so in your question. - Swinney
Yes, they're local variables of automatic storage duration declared within a function, and therefore on the stack. - Authentic
Yes. It's an example of compiler optimisation. - Polysepalous
Related: Where are string constants stored by GCC and from where these pointers are mapped? - Slumlord
"I would have thought each string would have been allocated its own memory on the stack" - "on the stack"? With static storage duration? How? - Caddoan
How? That's just me misunderstanding how C stores string literals. The link above from @Slumlord is very useful in explaining what is actually going on. - Authentic

It's called "string pooling". It's optional in Microsoft compilers; GCC pools identical literals by default. If you switch off string pooling in MSVC, the "same" strings in the different arrays are duplicated and have different memory addresses, so they take up an extra (unnecessary) 50 or so bytes of your static data.

EDIT: gcc prior to v4.0 had an option, -fwritable-strings, whose effect was twofold: it allowed string literals to be overwritten, and it disabled string pooling. So, in your code, setting this flag would allow the somewhat dangerous code

/* Overwrite the first string in a, so that it reads 'xne'.  Does not */ 
/* affect the instances of the string "one" in b or d */
*a[0] = 'x';
Balbur answered 9/7, 2012 at 17:6 Comment(2)
In GCC (4.7 at least) a switch to disable pooling is -fno-merge-constants. - Pfeffer
@Pfeffer note that gcc no longer supports -fwritable-strings; it would be ideal to add both of these notes to your answer. - Extortionate

(I assume that your a, b, c and d are declared as local variables, which is the reason for your stack-related expectations.)

String literals in C have static storage duration. They are never allocated "on the stack". They are always allocated in global/static memory and live "forever", i.e. as long as the program runs.

Your a, b, c and d arrays were allocated on the stack. The pointers stored in these arrays point to static memory. Under these circumstances, there's nothing unusual about pointers for identical words being identical.

Whether a compiler will merge identical literals into one depends on the compiler. Some compilers even have an option that controls this behavior. Modifying a string literal is undefined behavior, so literals should be treated as read-only (which is why it is a better idea to use const char * type for your arrays). As a result it doesn't make much difference whether they are merged or not, until you begin to rely on actual pointer values.

P.S. Just out of curiosity: even if these string literals were allocated on the stack, why would you expect identical literals to be "instantiated" more than once?

Swinney answered 9/7, 2012 at 16:58 Comment(2)
Great stuff, that's helped my understanding a lot. I hadn't fully understood string literals and their associated storage duration; I was incorrectly thinking of the strings as just being local (automatic) variables on the stack. - Authentic
Nothing that I'm aware of says that two (or more) references to the same string literal must resolve to the same memory location. The compiler could (and some do) allocate storage for every string literal, even if some are "duplicates". See "string pooling" mentioned by @Josh. - Creeper
