Why are compound literals in C modifiable?

2

18

One usually associates 'unmodifiable' with the term 'literal':

char* str = "Hello World!";
*str = 'B';  // Bus Error!

However when using compound literals, I quickly discovered they are completely modifiable (and looking at the generated machine code, you see they are pushed on the stack):

char* str = (char[]){"Hello World"};
*str = 'B';  // A-Okay!

I'm compiling with clang-703.0.29. Shouldn't those two examples generate the exact same machine code? Is a compound literal really a literal, if it's modifiable?

EDIT: An even shorter example would be:

"Hello World"[0] = 'B';  // Bus Error!
(char[]){"Hello World"}[0] = 'B';  // Okay!
Neighboring answered 17/4, 2016 at 11:55 Comment(6)
I'm not even sure it's UB. I've never really looked at the official language standard, but 6.5.2.5, point 12, says that (char[]){"abc"} is designed to be modifiable.Neighboring
Note that the example above doesn't actually showcase the lvalue-literal behaviour (a clearer example would be something like (int){1} = 2;) - the primary difference between your two snippets is that in the first you have a true string literal, while in the second you initialize a local array with a string literal - you'd have the same behaviour if you just modified the first to read char str[] = ....Charyl
@Leushenko Do I? In the compound literal case, I initialize str with a pointer to its first character. When I say char str[] = …, I initialize a non-modifiable char literal and copy its content into the array str on the stack.Neighboring
@hgiesel: You are right, it in fact is an "anonymous object". You can use the const qualifier to explicitly tell the compiler your intent. Note that in C it is the programmer's responsibility not to break this contract. Even for string literals, there is no guarantee a write will not work (and C explicitly allows this as an implementation-extension). So not getting an error does not mean it is defined behaviour. If in doubt, please read the standard.Ide
@Olaf The only question I have now is whether, when I say char str[] = (char[]){"Hello"};, I actually initialize it twice on the stack.Neighboring
@hgiesel: No. 1) The C language does not even enforce using a stack (nor a heap, btw.) and there are implementations which don't. 2) There is an object allocated somewhere, plus the string literal to initialise it. But why should there be the same object allocated twice? Your example using a string is not a good one, as you can use the literal directly, but nevertheless a valid one.Ide
P
17

A compound literal is an lvalue and values of its elements are modifiable. In case of

char* str = (char[]){"Hello World"};
*str = 'B';  // A-Okay!  

you are modifying a compound literal, which is legal.

C11-§6.5.2.5/4:

If the type name specifies an array of unknown size, the size is determined by the initializer list as specified in 6.7.9, and the type of the compound literal is that of the completed array type. Otherwise (when the type name specifies an object type), the type of the compound literal is that specified by the type name. In either case, the result is an lvalue.

As can be seen, the compound literal here is an lvalue whose type is a complete array type with no const qualifier, so its elements are modifiable, unlike those of a string literal.
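As a rough sketch of the point above, the following helper writes through a pointer to a compound literal and copies the result out before the literal's lifetime ends. The function name modify_compound_literal and the out parameter are made up for illustration; they are not part of the question or of any library.

```c
#include <string.h>

/* Hypothetical helper: modifies a compound literal through a pointer and
   copies the result into out, which must have room for at least 12 bytes. */
void modify_compound_literal(char *out)
{
    /* Unnamed array of 12 chars with automatic storage duration. */
    char *str = (char[]){"Hello World"};

    *str = 'B';          /* legal: the element type is plain (non-const) char */

    strcpy(out, str);    /* copy while the unnamed array is still alive */
}
```

Calling it with a 12-byte buffer should leave "Bello World" in that buffer. Doing the same write through a pointer to the string literal "Hello World" would be undefined behavior.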

The standard also mentions that

§6.5.2.5/7:

String literals, and compound literals with const-qualified types, need not designate distinct objects.

Further it says:

11 EXAMPLE 4 A read-only compound literal can be specified through constructions like:

(const float []){1e0, 1e1, 1e2, 1e3, 1e4, 1e5, 1e6}   

12 EXAMPLE 5 The following three expressions have different meanings:

"/tmp/fileXXXXXX"
(char []){"/tmp/fileXXXXXX"}
(const char []){"/tmp/fileXXXXXX"}

The first always has static storage duration and has type array of char, but need not be modifiable; the last two have automatic storage duration when they occur within the body of a function, and the first of these two is modifiable.

13 EXAMPLE 6 Like string literals, const-qualified compound literals can be placed into read-only memory and can even be shared. For example,

(const char []){"abc"} == "abc"

might yield 1 if the literals’ storage is shared.
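The quoted examples can be tried out with a small sketch like the one below. Both function names are hypothetical. Whether the two literals share storage is up to the implementation, so the first function may return either value on a conforming compiler.

```c
#include <stdbool.h>

/* Hypothetical helper: reports whether this implementation happens to place
   a const-qualified compound literal and a string literal at the same address. */
bool const_literal_shares_storage(void)
{
    const char *a = (const char[]){"abc"};
    const char *b = "abc";
    return a == b;    /* true or false; both outcomes are conforming */
}

/* Reading through a const-qualified compound literal is always fine;
   writing through it would require casting away const. */
float third_power_of_ten(void)
{
    const float *powers = (const float[]){1e0, 1e1, 1e2, 1e3};
    return powers[2];
}
```

Because the element type is const-qualified, an assignment such as powers[2] = 0 is rejected at compile time, which is what makes the literal "read-only" in the sense of EXAMPLE 4.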

Pt answered 17/4, 2016 at 11:59 Comment(3)
Note that this question is tagged with C99Conversable
@sunqingyao: Yes, but for this case the rule is almost the same as in C11.Pt
I find it irksome that (so far as I can tell) there's no syntax for const-static compound literals, since string literals are hardly the only kind of static data which will often have a single point of use.Edwin
B
4

The compound literal syntax is a shorthand expression equivalent to a local declaration with an initializer, followed by a reference to the unnamed object thus declared:

char *str = (char[]){ "Hello World" };

is equivalent to:

char __unnamed__[] = { "Hello World" };
char *str = __unnamed__;

The __unnamed__ object has automatic storage duration and is defined as modifiable, so it can be modified via the pointer str, which is initialized to point to it.
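To make the equivalence concrete, here is a minimal sketch that performs the same write through both forms and compares the results. The function name forms_behave_alike and the variable names are illustrative only.

```c
#include <string.h>

/* Returns 1 if the compound-literal form and the named-array form hold
   identical contents after the same write, and are separate objects. */
int forms_behave_alike(void)
{
    /* Form 1: pointer to an unnamed array created by a compound literal. */
    char *via_literal = (char[]){"Hello World"};

    /* Form 2: the spelled-out equivalent with a named local array. */
    char named[] = {"Hello World"};
    char *via_named = named;

    via_literal[0] = 'B';
    via_named[0] = 'B';

    return strcmp(via_literal, via_named) == 0
        && via_literal != via_named;    /* two distinct modifiable arrays */
}
```

Both arrays are ordinary automatic objects initialized from the string literal, so both writes are well defined and the function returns 1.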

In the case of char *str = "Hello World!"; the object pointed to by str is not supposed to be modified. In fact attempting to modify it has undefined behavior.

The C Standard could have defined such string literals as having type const char[] instead of char[], but this would generate many warnings and errors in legacy code.

Yet it is advisable to pass a flag to the compiler to make such string literals implicitly const and to make the whole project const correct, i.e., declaring all pointer arguments that are not used to modify their object as const. For gcc and clang, the command line option is -Wwrite-strings. I also strongly advise enabling many more warnings and making them fatal with -Wall -W -Werror.
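As an illustration of what const-correct code looks like under that option, here is a small sketch. The function names count_char and count_l_in_greeting are invented for this example.

```c
#include <stddef.h>

/* Only reads the string, so the parameter is declared const char *. */
size_t count_char(const char *s, char c)
{
    size_t n = 0;
    for (; *s != '\0'; ++s)
        if (*s == c)
            ++n;
    return n;
}

size_t count_l_in_greeting(void)
{
    /* Pointing a const char * at a string literal is clean under -Wwrite-strings. */
    const char *greeting = "Hello World";

    /* char *bad = "Hello World";   this form draws a warning under -Wwrite-strings */

    return count_char(greeting, 'l');
}
```

With the pointer declared const, an accidental write such as *greeting = 'B' is diagnosed at compile time instead of failing at run time.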

Behlau answered 23/11, 2019 at 10:12 Comment(2)
Unfortunately for the code I am writing, no, "compound literal syntax is a short hand expression equivalent to a local declaration" is not true. For instance, you can do <type> foo[<n>]; and get <n> * sizeof(<type>) bytes of uninitialised stack memory in the current scope, but there is no way to have uninitialised compound literals which only allocate, (<type> [<n>]) {} is non standard and most compilers take it to be {0}. If there is a way, let me know. This would have been so useful.Maundy
@user426: this is a different question. BTW, your quote from my answer is incomplete: I wrote The compound literal syntax is a short hand expression equivalent to a local declaration with an initializer [...]. If you want to allocate some uninitialized stack space, you can use alloca() on systems where it is available.Behlau
