Struct vs string literals? Read only vs read-write? [duplicate]
N

3

8

Does the C99 standard permit writing to compound literals (structs)? It does not seem to permit writing to string literals. I ask because C Programming: A Modern Approach, 2nd Edition says the following on page 406:

Q. Allowing a pointer to a compound literal would seem to make it possible to modify the literal. Is that the case?

A. Yes. Compound literals are lvalues that can be modified.

But I don't quite get how that works, or how it squares with string literals, which you certainly can't modify.

#include <stdio.h>

char *foo = "foo bar";
struct bar { char *a; int g; };
struct bar *baz = &(struct bar){.a = "foo bar", .g = 5};

int main(void) {
  // Segfaults
  // (baz->a)[0] = 'X';
  // printf( "%s", baz->a );

  // Segfaults
  // foo[0] = 'a';
  // printf("%s", foo);

  baz->g = 9;
  printf("%d", baz->g);

  return 0;
}

As the commented-out lines show, writing through baz->a causes a segfault, but writing to baz->g does not. Why does one cause a segfault and not the other? How are struct literals different from string literals? Why would struct literals not also be placed in a read-only section of memory, and is the behavior defined or undefined in each case (standards question)?

Nolannolana answered 23/8, 2018 at 22:41 Comment(2)
Good question, seems nobody actually knows the answer. Same question here: Array as compound literal. Then as now, nobody could actually prove that compound literals can be written to. That is, prove that they are modifiable lvalues and not just lvalues.Polyandry
How is this question primarily opinion based? Voted to reopen.Brash
S
7

First things first: your struct literal has a pointer member initialized to point at a string literal. The members of the struct itself are writeable, including the pointer member. It is only the content of the string literal that is not writeable.
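To make the distinction concrete, here is a minimal sketch. The helper name retarget_and_edit and the caller-supplied buffer are assumptions made for illustration and are not part of the question's code. It copies the string literal's contents into writable storage, repoints the member, and only then edits a character.

```c
#include <string.h>

struct bar { char *a; int g; };

/* Hypothetical helper: copies the text b->a points at into caller-owned
   storage (buf, of size n >= 1), repoints b->a at that storage, and then
   edits the first character through the struct. */
static void retarget_and_edit(struct bar *b, char *buf, size_t n) {
    strncpy(buf, b->a, n - 1);  /* reading the string literal is fine */
    buf[n - 1] = '\0';          /* strncpy may not terminate; do it here */
    b->a = buf;                 /* fine: the pointer member is writeable */
    b->a[0] = 'X';              /* fine: buf is an ordinary char array */
}
```

Writing b->a[0] before the pointer is redirected would still modify the string literal, which is the undefined behavior seen in the question.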

String literals have been part of the language since the beginning, while struct literals (officially known as compound literals) are a relatively recent addition, as of C99. By that time many implementations existed that placed string literals in read-only memory, especially on embedded systems with tiny amounts of RAM. At that point the designers of the standard had a choice of requiring string literals to be moved to a writeable location, allowing struct literals to be read-only, or leaving things as-is. None of the three solutions was ideal, so it looks like they took the path of least resistance and left everything the way it was.

Does the C99 standard permit writing to compound literals (structs)?

The C99 standard does not explicitly prohibit writing to data objects initialized with compound literals. This is different from string literals, whose modification the standard declares to be undefined behavior.

Sladen answered 23/8, 2018 at 23:18 Comment(3)
Cool I'll mark this as accepted if no one has anything else to add because even this does answer my question and I believe is a valuable contribution on the network: "Struct literals came along later and are different." It seems as if some texts are using "literals" to be a function of a syntax, and others of implementation (juxtaposed with a character array). Thanks.Nolannolana
C does not call &(struct bar){.a = "foo bar", .g = 5} a struct literal. Instead it is called a compound literal.Canoness
How does this answer the question "Does the C99 standard permit writing to compound literals (structs)?"Polyandry
T
3

The standard gives essentially the same characteristics to string literals and to compound literals with a const-qualified type used outside the body of a function.

Lifetime

  • String literals: Always static.

    §6.4.5p6 In translation phase 7, a byte or code of value zero is appended to each multibyte character sequence that results from a string literal or literals. The multibyte character sequence is then used to initialize an array of static storage duration and length just sufficient to contain the sequence.

  • Compound literals: Automatic if used inside a function body, otherwise static.

    §6.5.2.5p5 The value of the compound literal is that of an unnamed object initialized by the initializer list. If the compound literal occurs outside the body of a function, the object has static storage duration; otherwise, it has automatic storage duration associated with the enclosing block.

Possibly shared

  • Both string literals and const-qualified compound literals might be shared. You should be prepared for the possibility but cannot rely on it happening.

    §6.4.5p7 It is unspecified whether [the arrays created for the string literals] are distinct provided their elements have the appropriate values.

    §6.5.2.5p7 String literals, and compound literals with const-qualified types, need not designate distinct objects.
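One practical consequence of possible sharing is that address comparisons between literals tell you nothing reliable. A small sketch, with same_text as a hypothetical helper name, compares contents instead:

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: compares the characters, never the addresses.
   Whether two "abc" literals (or two identical const-qualified compound
   literals) share one object is unspecified, so p == q may be true or
   false for equal text depending on the compiler and its options. */
static bool same_text(const char *p, const char *q) {
    return strcmp(p, q) == 0;
}
```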

Mutability

  • Modifying either a string literal or a const-qualified compound literal is undefined behaviour. Indeed attempting to modify any const-qualified object is undefined behaviour, although the wording of the standard is probably subject to hair-splitting.

    §6.4.5p7 If the program attempts to modify [the array containing a string literal], the behavior is undefined.

    §6.7.3p6 If an attempt is made to modify an object defined with a const-qualified type through use of an lvalue with non-const-qualified type, the behavior is undefined.

  • A non-const-qualified compound literal can be freely modified. I don't have a quote for this, but the fact that modification is not explicitly prohibited seems to me to be definitive. It's not necessary to explicitly say that mutable objects may be mutated.
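As an illustration of the last point, the following sketch mutates a non-const compound literal through a pointer. The struct point type and the shifted_origin function are hypothetical names chosen for the example.

```c
struct point { int x, y; };

/* Hypothetical helper: creates an unnamed struct point with automatic
   storage, modifies it in place, and returns a copy by value. */
static struct point shifted_origin(int dx, int dy) {
    struct point *p = &(struct point){ .x = 0, .y = 0 };  /* not const-qualified */
    p->x += dx;   /* modifying the unnamed object is well-defined */
    p->y += dy;
    return *p;    /* copy out before the enclosing block ends */
}
```

Returning the struct by value avoids handing out a pointer to the unnamed object, whose lifetime ends with the enclosing block.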

The fact that the lifetime of compound literals inside function bodies is automatic can lead to subtle bugs:

/* This is fine */
const char* foo(void) {
  return "abcde";
}

/* This is not OK */
const int* oops(void) {
  return (const int[]){1, 2, 3, 4, 5};
}
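Two possible ways to avoid the dangling pointer are sketched here, under the assumption that the caller either wants a read-only table or can supply its own storage. The names table and fill are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Option 1: a named static array has static storage duration, so the
   returned pointer stays valid after the function returns. */
static const int *table(void) {
    static const int values[] = {1, 2, 3, 4, 5};
    return values;
}

/* Option 2: the caller owns the destination. The compound literal is
   only read while it is still alive, inside this function. */
static size_t fill(int *out, size_t n) {
    const int *src = (const int[]){1, 2, 3, 4, 5};
    size_t count = n < 5 ? n : 5;
    memcpy(out, src, count * sizeof *out);
    return count;   /* number of elements written */
}
```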
Trivium answered 24/8, 2018 at 1:29 Comment(2)
The only thing that matters is if a compound literal is a modifiable lvalue or not. I can find no text stating this in the standard - a compound literal is a lvalue but it is unclear if it is modifiable. So none of this answers if compound literals are read/write.Polyandry
@lundin: I think the standard is clear. If the compound literal has const-qualified type it is not a modifiable lvalue (6.3.2.1) so it cannot be used with a mutation operator, and neither can any of its members. An attempt to get around this by casting away the const would be UB (6.7.3p6). If it's not a const-qualified type, then it's a non-const object with either static or automatic lifetime; the fact that the object was created by a compound literal is not relevant, and the object is mutable. I added the note about lifetimes because I see that bug a lot and GCC doesn't seem to warn.Trivium
B
0

Does the C99 standard permit writing to compound literals (structs)?

If by "writing to a compound literal" you mean modifying the elements of a compound literal, then yes, it does, provided the compound literal is not read-only (const-qualified).

C99-6.5.2.5:

If the type name specifies an array of unknown size, the size is determined by the initializer list as specified in 6.7.8, and the type of the compound literal is that of the completed array type. Otherwise (when the type name specifies an object type), the type of the compound literal is that specified by the type name. In either case, the result is an lvalue.

This means that compound literals are lvalues, like arrays, and that the elements of a compound literal can be modified just as you can modify any other object of aggregate type. For example:

// 1
((int []) {1,2,3})[0] = 100;  // OK

// 2
(char[]){"Hello World"}[0] = 'Y';  // OK. This is not a string literal!

// 3
char* str = (char[]){"Hello World"};
*str = 'Y';  // OK. Writing to a compound literal via pointer. 

// 4
(const float []){1e0, 1e1, 1e2}[0] = 1e7;  // ERROR. Read only compound literal

In your code you are trying to write through a compound literal member that points to a string literal, and the string literal is not modifiable. If that member is instead initialized to point at a compound literal, the characters it points to can be modified.

struct bar *baz = &(struct bar){.a = (char[]){"foo bar"}, .g = 5};

This snippet will now work:

// No longer segfaults
(baz->a)[0] = 'X';
printf( "%s", baz->a );

Furthermore, the standard gives an example in the same section mentioned above that differentiates between a string literal, a compound literal, and a read-only compound literal:

13 EXAMPLE 5 The following three expressions have different meanings:

"/tmp/fileXXXXXX"
(char []){"/tmp/fileXXXXXX"}
(const char []){"/tmp/fileXXXXXX"}

The first always has static storage duration and has type array of char, but need not be modifiable; the last two have automatic storage duration when they occur within the body of a function, and the first of these two is modifiable.

Brash answered 24/8, 2018 at 10:58 Comment(0)
