What's the need of array with zero elements?

5

131

In the Linux kernel code I found the following declaration, which I cannot understand.

 struct bts_action {
         u16 type;
         u16 size;
         u8 data[0];
 } __attribute__ ((packed));

The code is here: http://lxr.free-electrons.com/source/include/linux/ti_wilink_st.h

What's the need and purpose of an array of data with zero elements?

Huberthuberto answered 1/2, 2013 at 9:42 Comment(8)
I'm not sure if there should be either a zero-length-arrays or struct-hack tag ...Teasley
@hippietrail, because often when someone asks what this struct is, they don't know that it is referred to as "flexible array member". If they did, they could have easily found their answer. Since they don't, they can't tag the question as such. That is why we don't have such a tag.Intitule
Well that could be a reason for not having tags for all concepts that some people don't know the terminology for ...Teasley
Vote to reopen. I agree that this was not a duplicate, because none of the other posts addresses the combination of a non-standard "struct hack" with zero length and the well-defined C99 feature flexible array member. I also think it is always of benefit for the C programming community to shed some light on any obscure code from the Linux kernel. Mainly since many people have the impression that the Linux kernel is some sort of state of the art C code, for reasons unknown. While in reality it is a terrible mess flooded with non-standard exploits that never should be regarded as some C canon.Congratulate
Not a duplicate - isn't the first time I've seen someone close a question unnecessarily. Also I think this question adds to the SO Knowledge base.Stoneblind
Also explained in question 2.6 of the comp.lang.c FAQ.Kelci
Possible duplicate of What happens if I define a 0-size array in C/C++?Breast
@Aniket if a question cleanly redirects to another one, there is no loss of "knowledge"... it only reduces repetition for the cost of an extra click.Nady
I
151

This is a way to have variable sizes of data, without having to call malloc (kmalloc in this case) twice. You would use it like this:

struct bts_action *var = kmalloc(sizeof(*var) + extra, GFP_KERNEL);
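To make the "one allocation instead of two" point concrete, here is a minimal userspace sketch. It assumes malloc in place of kmalloc and uint16_t/uint8_t in place of the kernel's u16/u8; the names action_ptr, action_fam and the helper functions are hypothetical, made up for illustration only.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Alternative layout: the payload lives in a second, separate allocation. */
struct action_ptr {
    uint16_t type;
    uint16_t size;
    uint8_t *data;
};

/* Layout discussed here: the payload follows the header in the same block. */
struct action_fam {
    uint16_t type;
    uint16_t size;
    uint8_t data[];   /* C99 flexible array member */
};

/* Two allocations, two failure paths, two frees. */
struct action_ptr *action_ptr_new(uint16_t type, const void *src, uint16_t len)
{
    assert(src != NULL || len == 0);
    struct action_ptr *a = malloc(sizeof(*a));
    if (a == NULL)
        return NULL;
    a->data = malloc(len > 0 ? len : 1);
    if (a->data == NULL) {
        free(a);
        return NULL;
    }
    a->type = type;
    a->size = len;
    if (len > 0)
        memcpy(a->data, src, len);
    return a;
}

void action_ptr_free(struct action_ptr *a)
{
    if (a != NULL) {
        free(a->data);
        free(a);
    }
}

/* One allocation, one failure path; release it with a single free(). */
struct action_fam *action_fam_new(uint16_t type, const void *src, uint16_t len)
{
    assert(src != NULL || len == 0);
    struct action_fam *a = malloc(sizeof(*a) + len);
    if (a == NULL)
        return NULL;
    a->type = type;
    a->size = len;
    if (len > 0)
        memcpy(a->data, src, len);
    return a;
}
```

Besides saving a call, the single-block version keeps header and payload contiguous in memory, which is convenient when the whole thing is a packet to be copied or sent as one unit.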

This used to be non-standard and was considered a hack (as Aniket said), but the idea was standardized in C99. The standard form for it is now:

struct bts_action {
     u16 type;
     u16 size;
     u8 data[];
} __attribute__ ((packed)); /* Note: the __attribute__ is irrelevant here */

Note that no size is given for the data member. Note also that such a member may only appear as the last member of the struct.


In C99, this matter is explained in 6.7.2.1.16 (emphasis mine):

As a special case, the last element of a structure with more than one named member may have an incomplete array type; this is called a flexible array member. In most situations, the flexible array member is ignored. In particular, the size of the structure is as if the flexible array member were omitted except that it may have more trailing padding than the omission would imply. However, when a . (or ->) operator has a left operand that is (a pointer to) a structure with a flexible array member and the right operand names that member, it behaves as if that member were replaced with the longest array (with the same element type) that would not make the structure larger than the object being accessed; the offset of the array shall remain that of the flexible array member, even if this would differ from that of the replacement array. If this array would have no elements, it behaves as if it had one element but the behavior is undefined if any attempt is made to access that element or to generate a pointer one past it.

Or in other words, if you have:

struct something
{
    /* other variables */
    char data[];
};

struct something *var = malloc(sizeof(*var) + extra);

You can access var->data with indices in [0, extra), i.e. from 0 up to but not including extra. Note that sizeof(struct something) only accounts for the other members (plus any padding), i.e. it treats data as having a size of 0.
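A minimal sketch of that usage, assuming a hypothetical len member that remembers how many bytes were requested (the struct itself carries no such information). The helper names something_new and something_set are made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

struct something {
    size_t len;    /* hypothetical: number of usable elements in data */
    char data[];   /* flexible array member, contributes 0 to sizeof */
};

/* Allocate the header plus `extra` bytes of payload in one block. */
struct something *something_new(size_t extra)
{
    /* Guard against overflow in sizeof(*var) + extra. */
    if (extra > SIZE_MAX - sizeof(struct something))
        return NULL;

    struct something *var = malloc(sizeof(*var) + extra);
    if (var == NULL)
        return NULL;

    var->len = extra;
    if (extra > 0)
        memset(var->data, 0, extra);
    return var;
}

/* Bounds-checked store: valid indices are 0 .. len-1. Returns 0 on success. */
int something_set(struct something *var, size_t i, char c)
{
    assert(var != NULL);
    if (i >= var->len)
        return -1;
    var->data[i] = c;
    return 0;
}
```

With something_new(8), indices 0 through 7 are valid and index 8 is rejected, which is exactly the half-open interval [0, extra).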


It may be interesting also to note how the standard actually gives examples of mallocing such a construct (6.7.2.1.17):

struct s { int n; double d[]; };

int m = /* some value */;
struct s *p = malloc(sizeof (struct s) + sizeof (double [m]));

Another interesting note by the standard in the same location is (emphasis mine):

assuming that the call to malloc succeeds, the object pointed to by p behaves, for most purposes, as if p had been declared as:

struct { int n; double d[m]; } *p;

(there are circumstances in which this equivalence is broken; in particular, the offsets of member d might not be the same).
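The standard's example can be wrapped into a small helper. This is only a sketch: s_new and s_sum are hypothetical names, and sizeof(double) * m is used instead of sizeof (double [m]) to avoid depending on variable-length array types, which are optional since C11.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct s { int n; double d[]; };

/* Allocate a struct s with room for m doubles, all initialized to 0.0. */
struct s *s_new(int m)
{
    assert(m >= 0);
    /* sizeof(struct s) already includes any padding between n and d. */
    struct s *p = malloc(sizeof(struct s) + sizeof(double) * (size_t)m);
    if (p == NULL)
        return NULL;
    p->n = m;
    for (int i = 0; i < m; i++)
        p->d[i] = 0.0;
    return p;
}

/* Sum the n elements stored after the header. */
double s_sum(const struct s *p)
{
    double total = 0.0;
    for (int i = 0; i < p->n; i++)
        total += p->d[i];
    return total;
}
```

Storing the element count in n is what lets later code know how far it may index into d; the allocation itself does not record it.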

Intitule answered 1/2, 2013 at 9:49 Comment(3)
To be clear, the original code in the question is still not standard in C99 (nor C11), and would still be considered a hack. The C99 standardization must omit the array bound.Dixiedixieland
What's [0, extra)?Laconic
@JL2210, en.wikipedia.org/wiki/Interval_(mathematics)#TerminologyIntitule
S
38

This is actually a hack, supported by GCC as an extension (to C90).

It's also called the struct hack.

So, for example, I could write:

struct bts_action *bts = malloc(sizeof(struct bts_action) + sizeof(char)*100);

It will be equivalent to saying:

struct bts_action{
    u16 type;
    u16 size;
    u8 data[100];
};

And I can create any number of such struct objects.

Stoneblind answered 1/2, 2013 at 9:45 Comment(0)
S
8

The idea is to allow for a variable-sized array at the end of the struct. Presumably, bts_action is some data packet with a fixed-size header (the type and size fields), and variable-size data member. By declaring it as a 0-length array, it can be indexed just as any other array. You'd then allocate a bts_action struct, of say 1024-byte data size, like so:

size_t size = 1024;
struct bts_action* action = (struct bts_action*)malloc(sizeof(struct bts_action) + size);

See also: http://c2.com/cgi/wiki?StructHack

Spavin answered 1/2, 2013 at 9:48 Comment(3)
@Aniket: I'm not entirely sure from whence comes that idea.Spavin
in C++ yes, in C, not needed.Steakhouse
@sheu, it comes from the fact that your style of writing malloc makes you repeat yourself multiple times and if ever the type of action changes, you have to fix it multiple times. Compare the following two for yourself and you will know: struct some_thing *variable = (struct some_thing *)malloc(10 * sizeof(struct some_thing)); vs. struct some_thing *variable = malloc(10 * sizeof(*variable)); The second one is shorter, cleaner and clearly easier to change.Intitule
C
7

The code is not valid C (see this). The Linux kernel is, for obvious reasons, not in the slightest concerned with portability, so it uses plenty of non-standard code.

What they are doing is using a non-standard GCC extension: an array of size 0. A standard-compliant program would have written u8 data[]; and it would have meant the very same thing. The authors of the Linux kernel apparently love to make things needlessly complicated and non-standard, if an option to do so reveals itself.

In older C standards, ending a struct with an empty array was known as "the struct hack". Others have already explained its purpose in other answers. The struct hack, in the C90 standard, was undefined behavior and could cause crashes, mainly since a C compiler is free to add any number of padding bytes at the end of the struct. Such padding bytes may collide with the data you tried to "hack" in at the end of the struct.

GCC early on made a non-standard extension to change this from undefined to well-defined behavior. The C99 standard then adopted this concept, and any modern C program can therefore use this feature without risk. It is known as a flexible array member in C99/C11.

Congratulate answered 1/2, 2013 at 13:27 Comment(9)
I doubt that "the linux kernel is not concerned with portability". Perhaps you meant portability to other compilers? It's true that it is quite entwined with features of gcc.Intitule
Nevertheless, I think this particular piece of code is not a mainstream code and is probably left out because its author didn't pay much attention to it. The license says its about some texas instruments drivers, so it's unlikely the core programmers of the kernel paid any attention to it. I'm pretty sure the kernel developers are constantly updating old code according to new standards or new optimizations. It's just too big to make sure everything is updated!Intitule
@Intitule With the "obvious" part, I meant portability to other operating systems, which naturally wouldn't make any sense. But they don't seem to give a damn about portability to other compilers either, they have used so many GCC extensions that Linux will not likely ever get ported to another compiler.Congratulate
@Intitule As for the case of anything labelled Texas Instruments, TI themselves are notorious for producing the most useless, crappy, naive C code ever seen, in their app notes for various TI chips. If the code originates from TI, then all bets regarding the chance of interpreting something useful from it are off.Congratulate
It's true that linux and gcc are inseparable. The Linux kernel is also quite hard to understand (mostly because an OS is complicated anyway). My point though, was that it's not nice to say "The authors of the Linux kernel apparently love to make things needlessly complicated and non-standard, if an option to do so reveals itself" due to a third-party-ish bad coding practice.Intitule
@Intitule Fair enough, I will certainly admit that I have quite limited knowledge about the inner goo of the Linux kernel. Though I have seen the zero size array appear in Linux kernel code before. (Ironically, they also like to write compile-time static assert macros that create typedefs containing a zero size array. How one of them will work while the other at the same time produces a compiler error is beyond me though. )Congratulate
let us continue this discussion in chatIntitule
A flexible-array member is not quite the same thing as a size-zero array. Given struct FLEX { int size; char dat[0];}; one could declare a variable of type FLEX and set size to zero, or one could declare struct { struct FLEX header; char dat[1234];} foo; and create a non-dynamic object of type FLEX with attached data. Flexible array members do not allow either usage.Bragi
Update re the portability discussion: Linux is quite close to being compiled with clang!Intitule
G
2

Another use of a zero-length array is as a named label inside a struct, to help check struct offsets at compile time.

Suppose you have some large struct definitions (each spanning multiple cache lines) and you want to make sure they are aligned to a cache line boundary both at the beginning and in the middle, where they cross the boundary.

struct example_large_s
{
    u32 first; // align to CL
    u32 data;
    ....
    u64 *second;  // align to second CL after the first one
    ....
};

In code you can declare them using GCC extensions like:

__attribute__((aligned(CACHE_LINE_BYTES)))

But you still want to make sure this is enforced at run time.

ASSERT (offsetof (example_large_s, first) == 0);
ASSERT (offsetof (example_large_s, second) == CACHE_LINE_BYTES);

This works for a single struct, but it is hard to cover many structs, each with a different member name to be aligned. You would most likely end up with code like the following, where you have to look up the member names of each struct:

assert (offsetof (one_struct,     <name_of_first_member>) == 0);
assert (offsetof (one_struct,     <name_of_second_member>) == CACHE_LINE_BYTES);
assert (offsetof (another_struct, <name_of_first_member>) == 0);
assert (offsetof (another_struct, <name_of_second_member>) == CACHE_LINE_BYTES);

Instead, you can declare a zero-length array in the struct that acts as a named label with a consistent name but consumes no space.

#define CACHE_LINE_ALIGN_MARK(mark) u8 mark[0] __attribute__((aligned(CACHE_LINE_BYTES)))
struct example_large_s
{
    CACHE_LINE_ALIGN_MARK (cacheline0);
    u32 first; // align to CL
    u32 data;
    ....
    CACHE_LINE_ALIGN_MARK (cacheline1);
    u64 *second;  // align to second CL after the first one
    ....
};

Then the runtime assertion code would be much easier to maintain:

assert (offsetof (one_struct,     cacheline0) == 0);
assert (offsetof (one_struct,     cacheline1) == CACHE_LINE_BYTES);
assert (offsetof (another_struct, cacheline0) == 0);
assert (offsetof (another_struct, cacheline1) == CACHE_LINE_BYTES);
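
A self-contained sketch of the marker technique follows. Several things here are assumptions rather than part of the answer: CACHE_LINE_BYTES is set to 64, uint8_t/uint32_t/uint64_t stand in for u8/u32/u64, the elided members are left out, and C11 _Static_assert is swapped in for the run-time assert so that a layout mistake fails the build. It relies on GCC's zero-length array and aligned attribute extensions (Clang accepts them too), so it is not portable ISO C.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE_BYTES 64   /* assumed cache line size */

/* Zero-size marker member, aligned to a cache line boundary (GCC extension). */
#define CACHE_LINE_ALIGN_MARK(mark) \
    uint8_t mark[0] __attribute__((aligned(CACHE_LINE_BYTES)))

struct example_large_s {
    CACHE_LINE_ALIGN_MARK(cacheline0);
    uint32_t first;     /* starts the first cache line */
    uint32_t data;
    CACHE_LINE_ALIGN_MARK(cacheline1);
    uint64_t *second;   /* starts the second cache line */
};

/* The same two checks work for any struct that uses the marker names. */
_Static_assert(offsetof(struct example_large_s, cacheline0) == 0,
               "cacheline0 must be at the start of the struct");
_Static_assert(offsetof(struct example_large_s, cacheline1) == CACHE_LINE_BYTES,
               "cacheline1 must start the second cache line");
```

The alignment attribute on the marker is what pushes the following member to the next boundary; the marker itself occupies no storage.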
Gelatinize answered 15/9, 2016 at 17:53 Comment(1)
Interesting idea. Just a note that 0-length arrays are not allowed by the standard, so this is a compiler-specific thing. Also, it might be a good idea to quote gcc's definition of the behavior of 0-length arrays in a struct definition, in the very least to show whether it could introduce padding before or after the declaration.Intitule
