Flexible array members can lead to undefined behavior?
Asked Answered
M

3

35
  1. By using flexible array members (FAMs) within structure types, are we exposing our programs to the possibility of undefined behavior?

  2. Is it possible for a program to use FAMs and still be a strictly conforming program?

  3. Is the offset of the flexible array member required to be at the end of the struct?

The questions apply to both C99 (TC3) and C11 (TC1).

#include <stdio.h>
#include <stdlib.h>
#include <stddef.h>

int main(void) {
    struct s {
        size_t len;
        char pad;
        int array[];
    };

    struct s *s = malloc(sizeof *s + sizeof *s->array);

    printf("sizeof *s: %zu\n", sizeof *s);
    printf("offsetof(struct s, array): %zu\n", offsetof(struct s, array));

    s->array[0] = 0;
    s->len = 1;

    printf("%d\n", s->array[0]);

    free(s);
    return 0;
}

Output:

sizeof *s: 16
offsetof(struct s, array): 12
0
Mile answered 25/6, 2017 at 11:1 Comment(6)
Wasn't the C11 TC1 the document that changed a pair of yyyymm values into 201112? So C11 TC1 has no effect whatsoever on FAM support, etc.Ferromagnetism
@JonathanLeffler For avoiding confusion, I added confusion. By specifying C99 TC0 I was referring to C99 prior to any TCs, by specifying C99 TC2 I was referring to C99 after TC2, and by specifying C11 TC1 I was referring to C11 after TC1. If you have a suggestion as to how to reflect these in a less confusing way, do let me know. As of now I'm adding parenthesesMile
Q1 is a non-question. If you write a bug in your program, FAMs or otherwise, it may have undefined behaviour. Can you explain what you are asking in Q1 (and how it differs from your Q2)?Incensory
@Incensory The first question is specifically about UB upon using structs with FAMs (are we at any risk of UB upon using it?), while the second question is about whether it's even possible for a program to use FAMs and still be a strictly conforming program (are we always failing strict conformance?). My answers below address both of the concerns related to UB and strict conformance. I suppose that it would be helpful to explain what UB and what strict conformance mean and how the criteria differs, but it seems that the more verbose the text, the less patience the readers have for reading it.Mile
I don't really follow, e.g. you could write out of bounds of allocated space with a FAM and that would be UB, so there is risk of UB thereIncensory
@Incensory I think that my code sample is straight-forward and my answers are sufficiently verbose. I don't think that by making the questions overly explicit it would change things for the better. But if you have a suggestion for how to improve the questions, then by all means- do let me know.Mile
M
26

The Short Answer

  1. Yes. Common conventions of using FAMs expose our programs to the possibility of undefined behavior. Having said that, I'm unaware of any existing conforming implementation that would misbehave.

  2. Possible, but unlikely. Even if we don't actually reach undefined behavior, we are still likely to fail strict conformance.

  3. No. The offset of the FAM is not required to be at the end of the struct, it may overlay any trailing padding bytes.

The answers apply to both C99 (TC3) and C11 (TC1).


The Long Answer

FAMs were first introduced in C99 (TC0) (Dec 1999), and their original specification required the offset of the FAM to be at the end of the struct. The original specification was well-defined and as such couldn't lead to undefined behavior or be an issue with regards to strict conformance.

C99 (TC0) §6.7.2.1 p16 (Dec 1999)

[This document is the official standard, it is copyrighted and not freely available]

The problem was that common C99 implementations, such as GCC, didn't follow the requirement of the standard, and allowed the FAM to overlay any trailing padding bytes. Their approach was considered to be more efficient, and since for them to follow the requirement of the standard- would result with breaking backwards compatibility, the committee chose to change the specification, and as of C99 TC2 (Nov 2004) the standard no longer required the offset of the FAM to be at the end of the struct.

C99 (TC2) §6.7.2.1 p16 (Nov 2004)

[...] the size of the structure is as if the flexible array member were omitted except that it may have more trailing padding than the omission would imply.

The new specification removed the statement that required the offset of the FAM to be at the end of the struct, and it introduced a very unfortunate consequence, because the standard gives the implementation the liberty not to keep the values of any padding bytes within structures or unions in a consistent state. More specifically:

C99 (TC3) §6.2.6.1 p6

When a value is stored in an object of structure or union type, including in a member object, the bytes of the object representation that correspond to any padding bytes take unspecified values.

This means that if any of our FAM elements correspond to (or overlay) any trailing padding bytes, upon storing to a member of the struct- they (may) take unspecified values. We don't even need to ponder whether this applies to a value stored to the FAM itself, even the strict interpretation that this only applies to members other than the FAM, is damaging enough.

#include <stdio.h>
#include <stdlib.h>
#include <stddef.h>

int main(void) {
    struct s {
        size_t len;
        char pad;
        int array[];
    };

    struct s *s = malloc(sizeof *s + sizeof *s->array);

    if (sizeof *s > offsetof(struct s, array)) {
        s->array[0] = 123;
        s->len = 1; /* any padding bytes take unspecified values */

        printf("%d\n", s->array[0]); /* indeterminate value */
    }

    free(s);
    return 0;
}

Once we store to a member of the struct, the padding bytes take unspecified bytes, and therefore any assumption made about the values of the FAM elements that correspond to any trailing padding bytes, is now false. Which means that any assumption leads to us failing strict conformance.

Undefined behavior

Although the values of the padding bytes are "unspecified values", the same can't be said about the type being affected by them, because an object representation which is based on unspecified values can generate a trap representation. So the only standard term which describes these two possibilities would be "indeterminate value". If the type of the FAM happens to have trap representations, then accessing it is not just a concern of an unspecified value, but undefined behavior.

But wait, there's more. If we agree that the only standard term to describe such value is as being an "indeterminate value", then even if the type of the FAM happens not to have trap representations, we've reached undefined behavior, since the official interpretation of the C standards committee is that passing indeterminate values to standard library functions is undefined behavior.

Mile answered 25/6, 2017 at 11:1 Comment(16)
It doesn't look like the passage you are quoting has anything to do with the issue you expand on later. C99 TC2 §6.7.2.1 p16 implies that given struct A {char x; int y[];}; , it is permitted for sizeof(struct A) to be e.g. 4 whereas omission of y would make sizeof(struct A) equal 1. y would not overlap with the added trailing padding. OTOH given struct B {int x; short y; char z[];}; the member z may overlap with the trailing padding struct B {int x; short y;} would have anyway, with or without C99 TC2 §6.7.2.1 p16. Anyway C99 is superseded by C11 and is no longer a standard.Boar
The conclusion that at the bytes at the offset of the FAM would still be considered padding needs a reference IMHO. My understanding is that these bytes are padding in a struct that omits the FAM, but not in the struct with the FAM.Secondrate
@FelixPalmen I don't see any normative language but the example in p25 (as of n1570) suggests so: The assignment: *s1 = *s2; only copies the member n; if any of the array elements are within the first sizeof (struct s) bytes of the structure, they might be copied or simply overwritten with indeterminate values.Boar
@n.m The official C99 standard (Dec 1999) explicitly required it, I didn't quote it because it's not freely available. C99 TC2 removed the requirement and added the statement I quoted. The line following the quote: "The new specification removed the statement that required the offset of the FAM to be at the end of the struct".Mile
@FelixPalmen I'm not sure if I understand your understanding. Are you trying to say that your understanding is that a struct with a FAM is required not to have trailing padding bytes? If that's the case, the standard has to explicitly require it. If there's no such an explicit requirement then it's not required and it becomes an implementation-detail.Mile
What exactly did it require? Technical corrigenda are freely available (3 PDFs) and I can see nothing in them that pertains to flexible members. Can you point out the exact sections amended by TC1..TC3 that introduce changes in the flexible members feature?Boar
@n.m The original specification explicitly stated that the offset of the FAM will be equal to the sizeof of the struct. TCs are freely available and that's why I quoted them. My answer contains an exact quote, I'm not sure why you couldn't find it in C99 TC2 ... paragraph 20. You're welcome to come to the chat of the C roomMile
I have found it now, sorry about the confusion.Boar
@DrorK. no, but "trailing" would mean after the FAM. Maybe there's just a relevant citation missing, but right now I don't see why the bytes "occupied" by a first FAM element should be considered padding -- they're of course padding in the same struct omitting the FAM, yes ... n.m.'s citation only tells me that assignment always works as if there was no FAM. I don't think that's the same.Secondrate
As things stand, I'm unconvinced by the argument presented. It's going to take me time to work up a rebuttal; I started, but I've not finished.Ferromagnetism
@FelixPalmen I'm afraid that it seems to me that you're under the assumption that the standard has a requirement that either doesn't exist, or I simply fail to see it. You are more than welcome to discuss it in the C chat room!Mile
@JonathanLeffler My answer touches many aspects of the C standard, but you don't have to tackle all of it at once, you're welcome to tackle the core of it which stands on pretty much 2 paragraphs- and you too are welcome to discuss it in the C chat room.Mile
(And for the lazy, here is the link to the C chat room)Waxler
To All: I'm considering marking my own answer as the accepted answer, since it seems that my defect report will be submitted soon. If you expect to post your answer anytime soon, I can wait. Otherwise I think it would make more sense not to keep the question open, and wait for the official decision of the committee as to my suggested TC. Thanks a lot for your help and feedback, very much appreciated!Mile
@DrorK. Is there a link to the final TC regarding your issue? I don't see any real conclusion to this either here or in the linked chat, now three years ago.Freaky
So to be clear, the padding bytes that encompass the first few bytes into the flexible array member could cause undefined behavior? This seems like a fault in the standard.Caisson
F
22

This is a long answer dealing extensively with a tricky topic.

TL;DR

I disagree with the analysis by Dror K.

The key problem is a misunderstanding of what the §6.2.1 ¶6 in the C99 and C11 standards means, and inappropriately applying it to a simple integer assignment such as:

fam_ptr->nonfam_member = 23;

This assignment is not allowed to change any padding bytes in the structure pointed at by fam_ptr. Consequently, the analysis based on the presumption that this can change padding bytes in the structure is erroneous.

Background

In principle, I'm not dreadfully concerned about the C99 standard and its corrigenda; they're not the current standard. However, the evolution of the specification of flexible array members is informative.

The C99 standard — ISO/IEC 9899:1999 — had 3 technical corrigenda:

  • TC1 released on 2001-09-01 (7 pages),
  • TC2 released on 2004-11-15 (15 pages),
  • TC3 released on 2007-11-15 (10 pages).

It was TC3, for example, that stated that gets() was obsolescent and deprecated, leading to it being removed from the C11 standard.

The C11 standard — ISO/IEC 9899:2011 — has one technical corrigendum, but that simply sets the value of two macros accidentally left in the form 201ymmL — the values required for __STDC_VERSION__ and __STDC_LIB_EXT1__ were corrected to the value 201112L. (You can see the TC1 — formally "ISO/IEC 9899:2011/Cor.1:2012(en) Information technology — Programming languages — C TECHNICAL CORRIGENDUM 1" — at https://www.iso.org/obp/ui/#iso:std:iso-iec:9899:ed-3:v1:cor:1:v1:en. I've not worked out how you get a download of it, but it is so simple that it really doesn't matter very much.

C99 standard on flexible array members

ISO/IEC 9899:1999 (before TC2) §6.7.2.1 ¶16:

As a special case, the last element of a structure with more than one named member may have an incomplete array type; this is called a flexible array member. With two exceptions, the flexible array member is ignored. First, the size of the structure shall be equal to the offset of the last element of an otherwise identical structure that replaces the flexible array member with an array of unspecified length.106) Second, when a . (or ->) operator has a left operand that is (a pointer to) a structure with a flexible array member and the right operand names that member, it behaves as if that member were replaced with the longest array (with the same element type) that would not make the structure larger than the object being accessed; the offset of the array shall remain that of the flexible array member, even if this would differ from that of the replacement array. If this array would have no elements, it behaves as if it had one element but the behavior is undefined if any attempt is made to access that element or to generate a pointer one past it.

126) The length is unspecified to allow for the fact that implementations may give array members different alignments according to their lengths.

(This footnote is removed in the rewrite.) The original C99 standard included one example:

¶17 EXAMPLE Assuming that all array members are aligned the same, after the declarations:

struct s { int n; double d[]; };
struct ss { int n; double d[1]; };

the three expressions:

sizeof (struct s)
offsetof(struct s, d)
offsetof(struct ss, d)

have the same value. The structure struct s has a flexible array member d.

¶18 If sizeof (double) is 8, then after the following code is executed:

struct s *s1;
struct s *s2;
s1 = malloc(sizeof (struct s) + 64);
s2 = malloc(sizeof (struct s) + 46);

and assuming that the calls to malloc succeed, the objects pointed to by s1 and s2 behave as if the identifiers had been declared as:

struct { int n; double d[8]; } *s1;
struct { int n; double d[5]; } *s2;

¶19 Following the further successful assignments:

s1 = malloc(sizeof (struct s) + 10);
s2 = malloc(sizeof (struct s) + 6);

they then behave as if the declarations were:

struct { int n; double d[1]; } *s1, *s2;

and:

double *dp;
dp = &(s1->d[0]); // valid
*dp = 42;         // valid
dp = &(s2->d[0]); // valid
*dp = 42;         // undefined behavior

¶20 The assignment:

*s1 = *s2;

only copies the member n and not any of the array elements. Similarly:

struct s t1 = { 0 };          // valid
struct s t2 = { 2 };          // valid
struct ss tt = { 1, { 4.2 }}; // valid
struct s t3 = { 1, { 4.2 }};  // invalid: there is nothing for the 4.2 to initialize
t1.n = 4;                     // valid
t1.d[0] = 4.2;                // undefined behavior

Some of this example material was removed in C11. The change was not noted (and did not need to be noted) in TC2 because the examples are not normative. But the rewritten material in C11 is informative when studied.

N983 paper identifying a problem with flexible array members

N983 from the WG14 Pre-Santa Cruz-2002 mailing is, I believe, the initial statement of a defect report. It states that some C compilers (citing three) manage to put a FAM before the padding at the end of a structure. The final defect report was DR 282.

As I understand it, this report led to the change in TC2, though I have not traced all the steps in the process. It appears that the DR is no longer available separately.

TC2 used the wording found in the C11 standard in the normative material.

C11 standard on flexible array members

So, what does the C11 standard have to say about flexible array members?

§6.7.2.1 Structure and union specifiers

¶3 A structure or union shall not contain a member with incomplete or function type (hence, a structure shall not contain an instance of itself, but may contain a pointer to an instance of itself), except that the last member of a structure with more than one named member may have incomplete array type; such a structure (and any union containing, possibly recursively, a member that is such a structure) shall not be a member of a structure or an element of an array.

This firmly positions the FAM at the end of the structure — 'the last member' is by definition at the end of the structure, and this is confirmed by:

¶15 Within a structure object, the non-bit-field members and the units in which bit-fields reside have addresses that increase in the order in which they are declared.

¶17 There may be unnamed padding at the end of a structure or union.

¶18 As a special case, the last element of a structure with more than one named member may have an incomplete array type; this is called a flexible array member. In most situations, the flexible array member is ignored. In particular, the size of the structure is as if the flexible array member were omitted except that it may have more trailing padding than the omission would imply. However, when a . (or ->) operator has a left operand that is (a pointer to) a structure with a flexible array member and the right operand names that member, it behaves as if that member were replaced with the longest array (with the same element type) that would not make the structure larger than the object being accessed; the offset of the array shall remain that of the flexible array member, even if this would differ from that of the replacement array. If this array would have no elements, it behaves as if it had one element but the behavior is undefined if any attempt is made to access that element or to generate a pointer one past it.

This paragraph contains the change in ¶20 of ISO/IEC 9899:1999/Cor.2:2004(E) — the TC2 for C99;

The data at the end of the main part of a structure containing a flexible array member is regular trailing padding that can occur with any structure type. Such padding can't be accessed legitimately, but can be passed to library functions etc via pointers to the structure without incurring undefined behaviour.

The C11 standard contains three examples, but the first and third are related to anonymous structures and unions rather than the mechanics of flexible array members. Remember, examples are not 'normative', but they are illustrative.

¶20 EXAMPLE 2 After the declaration:

struct s { int n; double d[]; };

the structure struct s has a flexible array member d. A typical way to use this is:

int m = /* some value */;
struct s *p = malloc(sizeof (struct s) + sizeof (double [m]));

and assuming that the call to malloc succeeds, the object pointed to by p behaves, for most purposes, as if p had been declared as:

struct { int n; double d[m]; } *p;

(there are circumstances in which this equivalence is broken; in particular, the offsets of member d might not be the same).

¶21 Following the above declaration:

struct s t1 = { 0 };          // valid
struct s t2 = { 1, { 4.2 }};  // invalid
t1.n = 4;                     // valid
t1.d[0] = 4.2;                // might be undefined behavior

The initialization of t2 is invalid (and violates a constraint) because struct s is treated as if it did not contain member d. The assignment to t1.d[0] is probably undefined behavior, but it is possible that

sizeof (struct s) >= offsetof(struct s, d) + sizeof (double)

in which case the assignment would be legitimate. Nevertheless, it cannot appear in strictly conforming code.

¶22 After the further declaration:

struct ss { int n; };

the expressions:

sizeof (struct s) >= sizeof (struct ss)
sizeof (struct s) >= offsetof(struct s, d)

are always equal to 1.

¶23 If sizeof (double) is 8, then after the following code is executed:

struct s *s1;
struct s *s2;
s1 = malloc(sizeof (struct s) + 64);
s2 = malloc(sizeof (struct s) + 46);

and assuming that the calls to malloc succeed, the objects pointed to by s1 and s2 behave, for most purposes, as if the identifiers had been declared as:

struct { int n; double d[8]; } *s1;
struct { int n; double d[5]; } *s2;

¶24 Following the further successful assignments:

s1 = malloc(sizeof (struct s) + 10);
s2 = malloc(sizeof (struct s) + 6);

they then behave as if the declarations were:

struct { int n; double d[1]; } *s1, *s2;

and:

double *dp;
dp = &(s1->d[0]); // valid
*dp = 42;         // valid
dp = &(s2->d[0]); // valid
*dp = 42;         // undefined behavior

¶25 The assignment:

*s1 = *s2;

only copies the member n; if any of the array elements are within the first sizeof (struct s) bytes of the structure, they might be copied or simply overwritten with indeterminate values.

Note that this changed between C99 and C11.

Another part of the standard describes this copying behaviour:

§6.2.6 Representation of types §6.2.6.1 General

¶6 When a value is stored in an object of structure or union type, including in a member object, the bytes of the object representation that correspond to any padding bytes take unspecified values.51) The value of a structure or union object is never a trap representation, even though the value of a member of the structure or union object may be a trap representation.

51) Thus, for example, structure assignment need not copy any padding bits.

Illustrating the problematic FAM structure

In the C chat room, I wrote some information of which this is a paraphrase:

Consider:

struct fam1 { double d; char c; char fam[]; };

Assuming double requires 8-byte alignment (or 4-byte; it doesn't matter too much, but I'll stick with 8), then struct non_fam1a { double d; char c; }; would have 7 padding bytes after c and a size of 16. Further, struct non_fam1b { double d; char c; char nonfam[4]; }; would have 3 bytes padding after the nonfam array, and a size of 16.

The suggestion is that the start of fam in struct fam1 can be at offset 9, even though sizeof(struct fam1) is 16. So that the bytes after c are not padding (necessarily).

So, for a small enough FAM, the size of the struct plus FAM might still be less than size of struct fam.

The prototypical allocation is:

struct fam1 *fam = malloc(sizeof(struct fam1) + array_size * sizeof(char));

when the FAM is of type char (as in struct fam1). That's a (gross) over-estimate when the offset of fam is less than sizeof(struct fam1).

Dror K. pointed out:

There are macros out there for calculating the 'precise' required storage based on FAM offsets that are less than the size of the structure. Such as this one: https://gustedt.wordpress.com/2011/03/14/flexible-array-member/

Addressing the question

The question asks:

  1. By using flexible array members (FAMs) within structure types, are we exposing our programs to the possibility of undefined behavior?
  2. Is it possible for a program to use FAMs and still be a strictly conforming program?
  3. Is the offset of the flexible array member required to be at the end of the struct?

The questions apply to both C99 (TC3) and C11 (TC1).

I believe that if you code correctly, the answers are "No", "Yes", "No and Yes, depending …".

Question 1

I am assuming that the intent of question 1 is "must your program inevitably be exposed to undefined behaviour if you use any FAM anywhere?" To state what I think is obvious: there are lots of ways of exposing a program to undefined behaviour (and some of those ways involve structures with flexible array members).

I do not think that simply using a FAM means that the program automatically has (invokes, is exposed to) undefined behaviour.

Question 2

Section §4 Conformance defines:

¶5 A strictly conforming program shall use only those features of the language and library specified in this International Standard.3) It shall not produce output dependent on any unspecified, undefined, or implementation-defined behavior, and shall not exceed any minimum implementation limit.

3) A strictly conforming program can use conditional features (see 6.10.8.3) provided the use is guarded by an appropriate conditional inclusion preprocessing directive using the related macro. …

¶7 A conforming program is one that is acceptable to a conforming implementation.5).

5) Strictly conforming programs are intended to be maximally portable among conforming implementations. Conforming programs may depend upon nonportable features of a conforming implementation.

I don't think there are any features of standard C which, if used in the way that the standard intends, makes the program not strictly conforming. If there are any such, they are related to locale-dependent behaviour. The behaviour of FAM code is not inherently locale-dependent.

I do not think that the use of a FAM inherently means that the program is not strictly conforming.

Question 3

I think question 3 is ambiguous between:

  • 3A: Is the offset of the flexible array member required to be equal to the size of the structure containing the flexible array member?
  • 3B: Is the offset of the flexible array member required to be greater than the offset of any prior member of the structure?

The answer to 3A is "No" (witness the C11 example at ¶25, quoted above).

The answer to 3B is "Yes" (witness §6.7.2.1 ¶15, quoted above).

Dissenting from Dror's answer

I need to quote the C standard and Dror's answer. I'll use [DK] to indicate the start of a quote from Dror's answer, and unmarked quotations are from the C standard.

As of 2017-07-01 18:00 -08:00, the short answer by Dror K said:

[DK]

  1. Yes. Common conventions of using FAMs expose our programs to the possibility of undefined behavior. Having said that, I'm unaware of any existing conforming implementation that would misbehave.

I'm not convinced that simply using a FAM means that the program automatically has undefined behaviour.

[DK]

  1. Possible, but unlikely. Even if we don't actually reach undefined behavior, we are still likely to fail strict conformance.

I'm not convinced that the use of a FAM automatically renders a program not strictly conforming.

[DK]

  1. No. The offset of the FAM is not required to be at the end of the struct, it may overlay any trailing padding bytes.

This is the answer to my interpretation 3A, and I agree with this.

The long answer contains interpretation of the short answers above.

[DK]

The problem was that common C99 implementations, such as GCC, didn't follow the requirement of the standard, and allowed the FAM to overlay any trailing padding bytes. Their approach was considered to be more efficient, and since for them to follow the requirement of the standard- would result with breaking backwards compatibility, the committee chose to change the specification, and as of C99 TC2 (Nov 2004) the standard no longer required the offset of the FAM to be at the end of the struct.

I agree with this analysis.

[DK]

The new specification removed the statement that required the offset of the FAM to be at the end of the struct, and it introduced a very unfortunate consequence, because the standard gives the implementation the liberty not to keep the values of any padding bytes within structures or unions in a consistent state.

I agree that the new specification removed the requirement that the FAM be stored at an offset greater than or equal to the size of the structure.

I don't agree that there is a problem with the padding bytes.

The standard explicitly says that structure assignment for a structure containing a FAM effectively ignores the FAM (§6.7.2.1 ¶18). It must copy the non-FAM members. It is explicitly stated that padding bytes need not be copied at all (§6.2.6.1 ¶6 and footnote 51). And the Example 2 explicitly states (non-normatively §6.7.2.1 ¶25) that if the FAM overlaps the space defined by the structure, the data from the part of the FAM that overlaps with the end of the structure might or might not be copied.

[DK]

This means that if any of our FAM elements correspond to (or overlay) any trailing padding bytes, upon storing to a member of the struct- they (may) take unspecified values. We don't even need to ponder whether this applies to a value stored to the FAM itself, even the strict interpretation that this only applies to members other than the FAM, is damaging enough.

I don't see this as a problem. Any expectation that you can copy a structure containing a FAM using structure assignment and have the FAM array copied is is inherently flawed — the copy leaves the FAM data logically uncopied. Any program that depends on the FAM data within the scope of the structure is broken; that is a property of the (flawed) program, not the standard.

[DK]

#include <stdio.h>
#include <stdlib.h>
#include <stddef.h>

int main(void) {
    struct s {
        size_t len;
        char pad;
        int array[];
    };

    struct s *s = malloc(sizeof *s + sizeof *s->array);

    if (sizeof *s > offsetof(struct s, array)) {
        s->array[0] = 123;
        s->len = 1; /* any padding bytes take unspecified values */

        printf("%d\n", s->array[0]); /* indeterminate value */
    }

    free(s);
    return 0;
}

Ideally, of course, the code would set the named member pad to a determinate value, but that doesn't cause actually cause a problem since it is never accessed.

I emphatically disagree that the value of s->array[0] in the printf() is indeterminate; its value is 123.

The prior standard quote is (it is the same §6.2.6.1 ¶6 in both C99 and C11, though the footnote number is 42 in C99 and 51 in C11):

When a value is stored in an object of structure or union type, including in a member object, the bytes of the object representation that correspond to any padding bytes take unspecified values.

Note that s->len is not an assignment to an object of structure or union type; it is an assignment to an object of type size_t. I think this may be the main source of confusion here.

If the code included:

struct s *t = malloc(sizeof(*t) + sizeof(t->array[0]));
*t = *s;
printf("t->array[0] = %d\n", t->array[0]);

then the value printed is indeed indeterminate. However, that's because copying a structure with a FAM is not guaranteed to copy the FAM. More nearly correct code would be (assuming you add #include <string.h>, of course):

struct s *t = malloc(sizeof(*t) + sizeof(t->array[0]));
*t = *s;
memmmove(t->array, s->array, sizeof(t->array[0]));
printf("t->array[0] = %d\n", t->array[0]);

Now the value printed is determinate (it is 123). Note that the condition on if (sizeof *s > offsetof(struct s, array)) is immaterial to my analysis.

Since the rest of the long answer (mainly the section identified by the heading 'undefined behavior') is based on a false inference about the possibility of the padding bytes of a structure changing when assigning to an integer member of a structure, the rest of the discussion does not need further analysis.

[DK]

Once we store to a member of the struct, the padding bytes take unspecified bytes, and therefore any assumption made about the values of the FAM elements that correspond to any trailing padding bytes, is now false. Which means that any assumption leads to us failing strict conformance.

This is based on a false premise; the conclusion is false.

Ferromagnetism answered 2/7, 2017 at 12:11 Comment(15)
6.2.6.1/6, which you quoted, plainly states; "bytes of the object representation that correspond to any padding bytes take unspecified value" . But your "tl;dr" contradicts this (with no justification that I can see) by claiming that the padding bytes can't change in this situation.Incensory
@M.M: The context of §6.2.6.1 ¶6 is a structure assignment or union assignment; not an assignment to a structure member or union member. There's a difference between copying the whole structure or union and setting a single member of the structure or union. That's why I think there's a logic flaw in Dror's argument.Ferromagnetism
@JonathanLeffler I'm not sure how you can interpret it differently: "When a value is stored in an object of structure or union type, including in a member object [...]"Mile
a->b = c; is storing a value in a member object of aIncensory
@JonathanLeffler You quoted me but the words you used to describe what I said- are not the same: "I agree that the new specification removed the requirement that the FAM be stored at an offset greater than or equal to the size of the structure." ... I said: "The new specification removed the statement that required the offset of the FAM to be at the end of the struct"Mile
OK; then we're going to have to find a language lawyer. The statement starts "when a value is stored in an object of structure or union type". An integer is not an object of structure or union type? Do we agree on that? And do we agree that a->b = c; in this context is an integer assignment?Ferromagnetism
BTW the rationale for this rule is so that if you have, e.g. struct { char ch; int x; } a;, then ++(a->ch) can be implemented by a word-size increment instructionIncensory
@M.M: which rationale? Where?Ferromagnetism
@JonathanLeffler you still seem to be overlooking including in a member object. I don't know how you can say that an assignment statement that assigns to a member object does not store a value in that member object...Incensory
I'm waiting for agreement on the first part, before I dissect the continuation.Ferromagnetism
The continuation is "including in a member object" means "a structure or union member that is inside another structure or union". The first part dominates — this is about structure assigment, not about integer assignment (non-structure, non-union assignment).Ferromagnetism
It doesn't mean that. It says "a member object". An int is an object. It does not say "a member object that is of structure or union type".Incensory
Let us continue this discussion in chat.Ferromagnetism
@JonathanLeffler, I agree with Dror's interpretation, for the simple reason that the paragraph in itself makes not much sense if you restrict it to members that are themself structures or unions. The intent of that paragraph is to allow leeway for compilers to handle padding bytes as they want when you change an element of a structure.Barbershop
@JensGustedt: Given struct foo {int x; short y;};, there are some platforms where using an int-sized store to update y might be faster than updating y without disturbing what's beyond. The Standard doesn't forbid implementations from behaving in such fashion, but that doesn't imply that non-defective code must be prepared for that possibility even when it's only intended to run on platforms which can update short values as efficiently as int.Advancement
A
7

If one allows for a strictly conforming program to make use of Implementation-Defined behavior in cases where it would "work" with all legitimate behaviors (notwithstanding that almost any kind of useful output would depend upon Implementation-Defined details such as the execution character set), use of flexible array members within a strictly-conforming program should be possible provided that the program does not care whether the offset of the flexible array member coincides with the length of the structure.

Arrays are not regarded as having any padding internally, so any padding that gets added because of the FAM would precede it. If there is enough space either within or beyond a structure to accommodate members in an FAM, those members are part of the FAM. For example, given:

struct { long long x; char y; short z[]; } foo;

the size of "foo" may be padded out beyond the start of z because of alignment, but any such padding will be usable as part of z. Writing y might disturb padding that precedes z, but should not disturb any part of z itself.

Advancement answered 26/6, 2017 at 19:30 Comment(3)
I'm sorry but your understanding is incorrect. The standard explicitly states "shall not produce output dependent", that doesn't mean that it's not allowed "to use". Simply using 'sizeof' and/or 'offsetof' doesn't break strict conformance. I'm afraid in the current state your answer should be down-voted.Mile
@DrorK.: Better?Advancement
You're more than welcome to discuss this matter in the C chat roomMile

© 2022 - 2024 — McMap. All rights reserved.