Is using flexible array members in C bad practice?
C

7

88

I recently read that using flexible array members in C was poor software engineering practice. However, that statement was not backed by any argument. Is this an accepted fact?

(Flexible array members are a C feature introduced in C99 whereby one can declare the last member of a struct to be an array of unspecified size. For example:)

struct header {
    size_t len;
    unsigned char data[];
};
Col answered 29/10, 2008 at 14:21 Comment(0)
L
35

It is an accepted "fact" that using goto is poor software engineering practice. That doesn't make it true. There are times when goto is useful, particularly when handling cleanup and when porting from assembler.
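As a rough illustration of the cleanup case mentioned above, here is a minimal sketch. The helper name read_prefix and its signature are invented for this example; the labels release resources in the reverse order of acquisition.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper (not from the answer): reads up to cap bytes of the
   file at path into a freshly allocated buffer stored in *out.
   Returns the number of bytes read, or -1 on failure. */
long read_prefix(const char *path, size_t cap, unsigned char **out)
{
    long result = -1;
    unsigned char *buf = NULL;
    size_t got;
    FILE *fp = fopen(path, "rb");

    if (fp == NULL)
        goto done;              /* nothing to release yet */

    buf = malloc(cap ? cap : 1);
    if (buf == NULL)
        goto close_file;        /* release the file only */

    got = fread(buf, 1, cap, fp);
    if (ferror(fp))
        goto free_buf;          /* release the buffer, then the file */

    *out = buf;                 /* success: the caller now owns buf */
    result = (long)got;
    goto close_file;

free_buf:
    free(buf);
close_file:
    fclose(fp);
done:
    return result;
}
```

Each failure path jumps to the label that undoes exactly what has been acquired so far, which avoids duplicating the release code at every early exit.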

Flexible array members strike me as having one main use, off the top of my head, which is mapping legacy data formats like window template formats on RiscOS. They would have been supremely useful for this about 15 years ago, and I'm sure there are still people out there dealing with such things who would find them useful.
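A sketch of what mapping such a format might look like, assuming a made-up record layout (a 16-bit count followed by that many bytes) and host byte order. The names struct record and record_load are hypothetical and do not come from RiscOS or any real format.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical record: a 16-bit count followed by that many bytes. */
struct record {
    uint16_t count;
    uint8_t  items[];   /* flexible array member */
};

/* Copies one record out of a raw byte buffer into properly aligned heap
   storage. Returns NULL if the buffer is too short or allocation fails. */
struct record *record_load(const unsigned char *raw, size_t raw_len)
{
    uint16_t count;
    struct record *r;

    if (raw_len < sizeof count)
        return NULL;
    memcpy(&count, raw, sizeof count);      /* host byte order assumed */
    if (raw_len - sizeof count < count)
        return NULL;                        /* payload is truncated */

    r = malloc(sizeof *r + count);
    if (r == NULL)
        return NULL;
    r->count = count;
    memcpy(r->items, raw + sizeof count, count);
    return r;
}
```

Copying into malloc'ed storage, rather than casting the raw buffer to a struct pointer, sidesteps alignment problems with the source bytes. The caller releases the record with free().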

If using flexible array members is bad practice, then I suggest that we all go tell the authors of the C99 spec this. I suspect they might have a different answer.

Lavallee answered 29/10, 2008 at 14:36 Comment(10)
goto is also useful when we want to implement a recursive implementation of an algorithm using a non recursive implementation in those cases where recursion could raise an additional overhead on the compiler.Uitlander
@Uitlander You should probably be using while, then.Elysha
Network programming is another use: the header is a struct, and the packet (or whatever it is called at the layer you are in) is the flexible array. When calling the next layer, you strip off the header and pass on the packet. Do this for each layer in the network stack. (You cast the data received from the lower layer to the struct for the layer you are in.)Supraliminal
@Uitlander goto is not for loops.Shooin
"There are times when goto is useful" See, this is why I sometimes shudder while thinking some kid who's just learning to program will resort to StackOverflow for learning best practices.Weiler
@Weiler I'm not sure I see your point. Are you disagreeing that it's useful? No one is suggesting here that this is best practice. Porting legacy code and best practice rarely go hand-in-hand. Many of the solutions I find on SO to, say, iOS problems, are hacks and certainly not best practice - but often they are the only solution to the problem.Lavallee
goto is never useful. Sprinkling additional gotos in legacy code is especially bad.Weiler
Flexible array members are used for variable-length arrays: sizeof(struct header) is added to sizeof(unsigned char) * n and the sum is passed to malloc(), where n is the desired length of data. Exercise caution, however; API functions have no idea how much memory you've allocated, and will readily segfault if you tell them your array is bigger than it actually is.Puttier
If you are writing by-hand serialization code, flexible array members come in handy.Moray
while and for had a lot of semantics under those very terse statements. I would like to see a single yet simple loop statement that works with goto (in place of "break" and "continue") as a replacement for while and for. Complex loops will be documented with well-named labels inside a loop block; for everything simpler, you just loop if (predicate_expression) block, loop quantifier block, or loop block. And let's not forget the obvious: goto jump tables (aka switches with user-defined semantics)! Very useful for directed graph logic programming!Polluted
V
29

No, using flexible array members in C is not bad practice.

This language feature was first standardized in ISO C99, 6.7.2.1 (16). In the following revision, ISO C11, it is specified in Section 6.7.2.1 (18).

You can use them like this:

struct Header {
    size_t n;
    long v[];
};
typedef struct Header Header;
size_t n = 123; // can dynamically change during program execution
// ...
Header *h = malloc(sizeof(Header) + sizeof(long[n]));
h->n = n;

Alternatively, you can allocate like this:

Header *h = malloc(sizeof *h + n * sizeof h->v[0]);
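Putting the pieces together, a minimal sketch of a constructor might look like the following. It assumes the count member is named n, and the helper header_new is invented here; the overflow check is an addition that the snippets above leave out.

```c
#include <stdint.h>
#include <stdlib.h>

struct Header {
    size_t n;
    long v[];   /* flexible array member, must be last */
};
typedef struct Header Header;

/* Hypothetical helper: allocates a Header with room for n longs, all set
   to zero. Returns NULL if the size would overflow or malloc fails. */
Header *header_new(size_t n)
{
    Header *h;
    size_t i;

    if (n > (SIZE_MAX - sizeof(Header)) / sizeof(long))
        return NULL;                      /* size computation would wrap */

    h = malloc(sizeof *h + n * sizeof h->v[0]);
    if (h == NULL)
        return NULL;

    h->n = n;
    for (i = 0; i < n; i++)
        h->v[i] = 0;
    return h;
}
```

A single free(h) releases both the header and the array, since they live in the same block.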

Note that sizeof(Header) includes any padding bytes, so the following allocation is incorrect and may lead to a buffer overflow:

Header *h = malloc(sizeof(size_t) + sizeof(long[n])); // invalid!
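The padding issue is easier to see with a struct whose first member is smaller than the array's alignment. The sketch below uses a hypothetical struct Tagged (not from the answer) and compares the size computed from the array's real offset with the naive sum of member sizes.

```c
#include <stddef.h>

/* Hypothetical struct: the compiler typically inserts padding after tag
   so that v is suitably aligned for long. */
struct Tagged {
    char tag;
    long v[];
};

/* Bytes needed for a Tagged holding n longs, based on where v really starts. */
size_t tagged_size(size_t n)
{
    return offsetof(struct Tagged, v) + n * sizeof(long);
}

/* Naive sum of member sizes; this ignores any padding before v. */
size_t tagged_size_naive(size_t n)
{
    return sizeof(char) + n * sizeof(long);
}
```

On common platforms tagged_size(n) is larger than tagged_size_naive(n), and allocating only the naive amount would leave the last elements of v outside the block.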

A struct with a flexible array member halves the number of allocations: instead of 2 allocations per struct object you need just 1. That means less effort and less memory taken up by the allocator's bookkeeping overhead. Furthermore, you save the storage for one additional pointer. Thus, if you have to allocate a large number of such struct instances, you measurably improve the runtime and memory usage of your program (by a constant factor).
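To make the comparison concrete, here is a sketch of both layouts side by side. The type and function names (PtrVec, FlexVec and their helpers) are made up for illustration, and overflow checks are omitted for brevity.

```c
#include <stdlib.h>

/* Pointer-based layout: header and payload live in two separate blocks. */
struct PtrVec {
    size_t n;
    long *v;
};

/* Flexible-array layout: header and payload share one block. */
struct FlexVec {
    size_t n;
    long v[];
};

struct PtrVec *ptrvec_new(size_t n)
{
    struct PtrVec *p = malloc(sizeof *p);            /* allocation 1 */
    if (p == NULL)
        return NULL;
    p->v = calloc(n ? n : 1, sizeof *p->v);          /* allocation 2 */
    if (p->v == NULL) {
        free(p);
        return NULL;
    }
    p->n = n;
    return p;
}

void ptrvec_free(struct PtrVec *p)
{
    if (p != NULL) {
        free(p->v);   /* two blocks, so two calls to free */
        free(p);
    }
}

struct FlexVec *flexvec_new(size_t n)
{
    /* One zero-filled block holds both the header and the n elements. */
    struct FlexVec *f = calloc(1, sizeof *f + n * sizeof f->v[0]);
    if (f != NULL)
        f->n = n;
    return f;
}
```

The flexible-array version needs no dedicated destructor: a plain free(f) is enough.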

In contrast, using non-standardized constructs in place of flexible array members (e.g. long v[0]; or long v[1];) is bad practice: long v[0]; is a compiler extension rather than ISO C, and indexing past the end of long v[1]; is undefined behaviour. Like any undefined behaviour, this should be avoided.

Since ISO C99 was released in 1999, more than 20 years ago, striving for ISO C89 compatibility is a weak argument.

Voluptuary answered 16/9, 2017 at 8:37 Comment(0)
C
17

PLEASE READ CAREFULLY THE COMMENTS BELOW THIS ANSWER

As C standardization moves forward, there is no reason to use [1] anymore.

The reason I would give for not doing it is that it's not worth it to tie your code to C99 just to use this feature.

The point is that you can always use the following idiom:

struct header {
  size_t len;
  unsigned char data[1];
};

That is fully portable. Then you can take the 1 into account when allocating the memory for n elements in the array data :

ptr = malloc(sizeof(struct header) + (n-1));

If you already have C99 as a requirement to build your code for any other reason, or you are targeting a specific compiler, I see no harm.

Candiecandied answered 29/10, 2008 at 14:36 Comment(37)
The last line should be ptr = malloc(sizeof(header) + n); where n is the length of the string and you use the 1 as terminating \0.Hireling
Thanks. I left the n-1 since it might not be used as a string.Candiecandied
You wouldn't care about the sign if it was for sure a string. Regarding this, n-1 is correct.Chiropodist
The 'following idiom' is not fully portable, which is why flexible array members were added to the C99 standard.Setaceous
I can say that using this approach does generate big problems. For example, if you are using Secure CRT functions and you try to do a strcpy(data, sometext) you will get buffer underrun errors at runtime.Poddy
@Poddy not sure what you mean. The problem you're talking about is related to using strcpy() instead of strncpy(); the question is about creating arrays that can grow.Candiecandied
@Jonathan. Sorry I don't get why this is not portable, could you clarify better?Candiecandied
Jonathan's point is echoed by the committee (or at least some members), but it contradicts the facts. When considered together, several other parts of the standard require the old [1] trick to work just as well as [] (aside from possibly wasting a few extra bytes of storage).Diphtheria
@Remo.D: minor point: the n-1 does not accurately accounts for the extra allocation, because of alignment: on most 32-bit machines, sizeof(struct header) will be 8 (to remain multiple of 4, since it has a 32-bit field which prefers/requires such alignment). The "better" version is: malloc(offsetof(struct header, data) + n)Thresher
@Tom: The minus one is there to account for the 1 already present in the declaration data[1]. Of course malloc() can (and usually will) allocate more memory to comply with the alignment rules or do whatever it needs to do to manage the memory block.Candiecandied
In C99 using unsigned char data[1] isn't portable because ((header*)ptr)->data + 2 -- even if enough space was allocated -- creates a pointer that points outside the length-1 array object (and not the sentinel one past the end). But per C99 6.5.6p8, "If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined" (emphasis added). Flexible arrays (6.7.2.1p16) act like an array filling the allocated space to not hit UB here.Daubery
The situation appears to be similar in the C89 draft here, too: port70.net/~nsz/c/c89/c89-draft.html#3.3.6 (Daubery
@R.. I do not have an actual counter-example to the claim that [1] must work as well as [] but the first example in this post comes close: GCC (and not even particularly recent versions of it) assume that accesses to an array member remain within the array. It is by courtesy that GCC doesn't assume q->tab[2] is an unreachable expression in the second example. blog.frama-c.com/index.php?post/2013/07/31/…Descend
@PascalCuoq: In the case of char types, the pointer q->tab decays to is also a pointer to a part of the representation of the entire object.Diphtheria
This idiom is not recommended by CERT Secure Coding Standards: "MEMxx-C. Understand how flexible array members are to be used" - "The problem with using this approach is that the behavior is undefined when accessing other than the first element of data"Megaspore
WARNING: Using [1] has been shown to cause GCC to generate incorrect code: lkml.org/lkml/2015/2/18/407Amazing
@PascalCuoq: I find it interesting how many needless problems could have been avoided simply by having compilers regard a zero-sized array declaration within a struct as forcing an alignment to the array type and a zero-byte allocation. While I despise the way compilers aggressively using UB for assertions, I would think a compiler should be entitled to say that if struct S contains int foo[1], a compiler should be allowed to replace foo[i] with foo[0]; if zero-element arrays were allowed, accessing array elements beyond [size-1u] was UB, such substitution would be legal but...Sohn
...code which needed arrays to end with a variable amount of data could still do so. Incidentally, I also think there should have been (still should be) a syntax to declare a variable of a type that ends in a size-zero array and specify an amount of extra space for that array [proposed syntax, if S has foo[0] as its last element: struct S foo[3]+[5]; would declare an array of three structures, each of which had space for five elements in its foo array.] IMHO, that would be much nicer than anything presently allowed with flexible array members.Sohn
@Sohn that's the start of a very slippery slopeKowalczyk
@M.M: What's the start of a slippery slope--the replacement of foo[i] with foo[0], or the idea that zero-sized objects should be allowed provided programs don't perform arithmetic on pointers to such objects [just as they're forbidden from performing arithmetic on void*]? Having a compiler replace foo[i] with foo[0] would be downright benign compared with hyper-modern optimizations.Sohn
@Sohn Allowing the code to declare an array of a certain size, and then access out of bounds of the declared size. FAM doesn't count as starting the slope IMO, since the lack of a declared size at all marks it as "special".Kowalczyk
@M.M: Are you disapproving of the practice of having programmers declare a structure as ending with dat[1] but then dereferencing data beyond that, disapproving of allowing such accesses with a zero-element array, or both? I don't like the former, but historically it was rendered necessary by the prohibition against zero-element arrays; I'd say that having compilers not bother yielding an error for zero-element arrays but requiring that arrays be accessed in the range 0 to size-(size_t)1 would have been simpler and cleaner from both a coding and compiler perspective.Sohn
@M.M: Actually, one thing I've long wished for in C would be a syntax for indicating that a structure member should not allocate space, but be forced to a certain offset relative to another structure member or the end of the structure. If such a thing were supported for bitfields, it could make them portable (e.g. uint32_t first_two_fields; int field1 = first_two_fields.0:23; int field2 = first_two_fields.23:9; would mean that field1 would occupy the lower 23 bits of first_two_fields and field2 would occupy the upper 9 bits). That would have allowed for useful optimizations...Sohn
...in cases where a variable-sized portion of the array will be small enough to allow a simpler addressing mode than would be necessary if array subscripts could be of any size.Sohn
@Sohn disapproving of both. In C89 they were hacks that were justifiable in some cases. In C99 there is Flexible Array Member which was introduced precisely to give a well-defined tool that renders all of those hacks unnecessary. Old code should be migrated. The F.A.M. has the advantage of being an incomplete type, so it is impossible to accidentally apply sizeof to itKowalczyk
@M.M: Under C99 or C11, as far as I can tell, it's impossible to declare an object of static or automatic duration which is type-compatible with a structure containing a Flexible Array Member. I agree the FAM is better than the size-1 and size-0 struct hacks, but wish it had been specified better. Among other things, given struct x {uint32_t x; uint8_t y; uint16_t z[];} I would have specified that the offset of z should be the same as for any fixed size (and all fixed sizes should imply the same offset), and sizeof (struct x) should yield the offset of z.Sohn
@Sohn I think sizeof(struct x) does yield the offset. There may be padding before z according to the standard, but IMO that is not an issue, as (a) there may be padding almost anywhere and in practice compilers only use padding where required for alignment, and (b) it's easy enough to write code that checks for padding, and/or does not rely on padding's presenceKowalczyk
@M.M: In cases where the structure would have an alignment of 4, and the "natural" offset for x would be 6 (which is not a multiple of 4), does the Standard indicate that neither the size nor the offset should be rounded up to 8?Sohn
@Sohn AFAIK there is no such requirement; there cannot be an array of structs with flexible array member. gcc does actually give a larger result for sizeof than the offsetof in a case I tried... I agree that this is lame, however maybe the gcc developers had backwards compatibility of some sort in mind. gcc does seem to implement array of f.a.m. structs as an extensionKowalczyk
@M.M: I'm not positive, but I believe the Standard (stupidly IMHO) requires that the length of the incomplete structure be rounded up to its alignment, even though that messes up efforts to compute the size of a structure with some number of elements in the flexible array. I suspect the Standard would allow an implementation to add padding before the FAM to make its offset match the struct length even when a normal array's offset would not have been influenced in such fashion, but that would interfere with what would otherwise be the most sensible way of creating...Sohn
@Sohn I don't see any such requirement; it just says that there might be trailing padding (not that there must be)Kowalczyk
...a static or automatic item containing an FAM when using dialects of C which use a relaxed version of C99's type rules [i.e. declare a structure which is identical except that it has a fixed-sized array, and then alias the pointer]. Actually, what might have been best would have been to define a syntax for "struct foo(x) {int blah; char y; short dat[x];}" and then say that a "struct foo(3)" will be a "struct foo" where the final array has a size of 3. A platform could then say that all "struct foo(N)" will have the same offset for "dat" and will be alias-compatible, even if...Sohn
...neither assumption would hold for independent structures with different fixed array sizes. I have no problem with the idea that when programming constructs exist to do things that programmers need to do, programmers should use such structures rather than nasty hacks. On the other hand, I do have a problem with the idea that a language should forbid hacks to do things which need to be done, and for which the language provides no non-hacky alternative.Sohn
WARNING: using [1] will result in bound violation reports when using gcc -mmpx -fcheck-pointer-bounds.Lura
@Candiecandied Why after 11 years have you still not revised or deleted this answer? As other comments have pointed out, using foo[1] instead of foo[] isn't just wrong in that it isn't portable, it is dangerously wrong, as it causes undefined behavior -- making compilers silently generate incorrect/unintended bytecode.Zenobia
@Will. Because I believe that, together with all these comments, it is still instructional. I added a note to remind people to read the comments carefully.Candiecandied
@Candiecandied -- This answer is currently wrong and advocates for undefined behavior. You have added a note to read the comments, but this is insufficient. Answers should be complete in and of themselves; an answer should not rely on comments because comments are ephemeral. I am a little bit surprised that a comment thread this long hasn't already been moved to chat. Please revise the substance of your answer so that it is correct.Obscene
L
13

You meant...

struct header
{
 size_t len;
 unsigned char data[];
}; 

In C, that's a common idiom. I think many compilers also accept:

  unsigned char data[0];

Yes, it's dangerous, but then again, it's really no more dangerous than normal C arrays - i.e., VERY dangerous ;-) . Use it with care and only in circumstances where you truly need an array of unknown size. Make sure you malloc and free the memory correctly, using something like:-

  struct header *foo = malloc(sizeof(struct header) + N * sizeof(foo->data[0]));
  foo->len = N;

An alternative is to make data just be a pointer to the elements. You can then realloc() data to the correct size as required.

  struct header
    {
     size_t len;
     unsigned char *data;
    }; 
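A sketch of how the pointer-based alternative might be resized with realloc(). The helper header_resize is hypothetical, and it assumes data is either NULL or a pointer previously obtained from malloc/realloc.

```c
#include <stdlib.h>

struct header {
    size_t len;
    unsigned char *data;
};

/* Hypothetical helper: resizes h->data to new_len bytes.
   Returns 0 on success, or -1 on failure, in which case h is unchanged. */
int header_resize(struct header *h, size_t new_len)
{
    /* Keep the result in a temporary so the old block is not leaked
       if realloc fails. */
    unsigned char *p = realloc(h->data, new_len ? new_len : 1);
    if (p == NULL)
        return -1;
    h->data = p;
    h->len = new_len;
    return 0;
}
```

Unlike the flexible-array layout, the struct itself stays at the same address when the data grows, at the cost of a second allocation and an extra pointer dereference.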

Of course, if you were asking about C++, either of these would be bad practice. Then you'd typically use STL vectors instead.

Lemley answered 29/10, 2008 at 14:36 Comment(6)
provided that you are coding on a system where STL is supported!Lavallee
C++ but no STL... That's not a pleasant thought!Lemley
Name one compiler that accepts zero-length arrays. (If the answer was GCC, now name another.) It is not sanctioned by the C standard.Setaceous
I've worked in a C++ but no STL environment - we had our own containers which provided the commonly used functionality without the full generality of the STL iterator system. They were easier to understand and had good performance. However, this was in 2001.Frozen
@JonathanLeffler Accepted by GCC and Clang, which covers two out of the three main compilers in use today. (MSVC is the other big one, and that's only really relevant on one — admittedly very common — platform.)Improvise
@JonathanLeffler: Many compilers accepted the construct before the Standard broke it, since processing the construct was not only easier and more useful than processing C99-style flexible array members, but it was easier than gratuitously rejecting such useful constructs.Sohn
B
6

I've seen something like this in C Interfaces and Implementations:

struct header {
    size_t len;
    unsigned char *data;
};

struct header *p;
p = malloc(sizeof(*p) + len + 1);
p->data = (unsigned char *)(p + 1);  // memory after p is mine!

Note: data need not be last member.

Burkhalter answered 1/6, 2010 at 9:30 Comment(4)
Indeed this has the advantage that data need not be the last member, but it also incurs an extra dereference every time data is used. Flexible arrays replace that dereference with a constant offset from the main struct pointer, which is free on some particularly common machines and cheap elsewhere.Diphtheria
@R.. Although, considering the target address is necessarily the byte directly after the pointer, it is approximately 100% guaranteed to already be in L1 cache, giving the entire dereference something like half a cycle of overhead. However, the point stands that flexible arrays are a better idea here.Aleris
With unsigned char *, p->data = (unsigned char*) (p + 1 ) is OK. Yet with double complex *, p->data = (double complex *) (p + 1 ) may cause alignment problems.Baltoslavic
This answer is technically irrelevant, as it does something different (it lays out the data differently in memory). While the pattern it describes is often useful, that doesn't mean that it can be a replacement for the other.Improvise
G
5

As a side note, for C89 compatibility, such a structure should be allocated like this:

struct header *my_header
  = malloc(offsetof(struct header, data) + n * sizeof my_header->data[0]);

Or with macros:

#define FLEXIBLE_SIZE SIZE_MAX /* or whatever maximum length for an array */
#define SIZEOF_FLEXIBLE(type, member, length) \
  ( offsetof(type, member) + (length) * sizeof ((type *)0)->member[0] )

struct header {
  size_t len;
  unsigned char data[FLEXIBLE_SIZE];
};

...

size_t n = 123;
struct header *my_header = malloc(SIZEOF_FLEXIBLE(struct header, data, n));

Setting FLEXIBLE_SIZE to SIZE_MAX almost ensures this will fail:

struct header *my_header = malloc(sizeof *my_header);
Gause answered 20/5, 2009 at 14:26 Comment(2)
Overly complex and there's no benefit over using [1] for C89 compatibility, if it's even needed...Diphtheria
Optimising compilers can correctly assume that an index into an array of length 1 must be zero. Kaboom!Anhedral
M
5

There are some downsides related to how structs are sometimes used, and it can be dangerous if you don't think through the implications.

For your example, if you start a function:

void test(void) {
  struct header header;
  unsigned char *p = &header.data[0];

  ...
}

Then the results are undefined (since no storage was ever allocated for data). This is something that you will normally be aware of, but there are cases where C programmers are likely used to being able to use value semantics for structs, which breaks down in various other ways.

For instance, if I define:

struct header2 {
  int len;
  char data[MAXLEN]; /* MAXLEN some appropriately large number */
};

Then I can copy two instances simply by assignment, i.e.:

struct header2 inst1 = inst2;

Or if they are defined as pointers:

*inst1 = *inst2;

This however won't work for flexible array members, since their content is not copied over. What you want is to dynamically malloc the size of the struct and copy over the array with memcpy or equivalent.

struct header3 {
  int len;
  char data[]; /* flexible array member */
};
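A minimal sketch of the malloc-plus-memcpy copy described above. The helper name header3_clone is invented here, and it assumes len holds the number of bytes stored in data.

```c
#include <stdlib.h>
#include <string.h>

struct header3 {
    int len;
    char data[]; /* flexible array member */
};

/* Hypothetical helper: returns a heap-allocated deep copy of src,
   or NULL if src is NULL, its len is negative, or allocation fails. */
struct header3 *header3_clone(const struct header3 *src)
{
    struct header3 *dst;
    size_t payload;

    if (src == NULL || src->len < 0)
        return NULL;
    payload = (size_t)src->len;

    dst = malloc(sizeof *dst + payload);
    if (dst == NULL)
        return NULL;

    dst->len = src->len;                    /* copy the fixed part */
    memcpy(dst->data, src->data, payload);  /* copy the array contents */
    return dst;
}
```

The copy is independent of the original and is released with a single free().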

Likewise, writing a function that accepts a struct header3 by value will not work: arguments in function calls are, again, copied by value, and that copy covers only the fixed part of the struct, not the elements of the flexible array member.

 void not_good ( struct header3 ) ;

This does not make it a bad idea to use, but you do have to keep in mind to always dynamically allocate these structures and only pass them around as pointers.

 void good ( struct header3 * ) ;
Misconstruction answered 3/3, 2014 at 9:31 Comment(0)
