Is it good programming practice in C to use first array element as array length?

Asked 28/1, 2021 at 12:44 Answered 18/2, 2021 at 9:54

arrays c definition function-declaration

Because in C the array length has to be stated when the array is defined, would it be acceptable practice to use the first element as the length, e.g.

int arr[9]={9,0,1,2,3,4,5,6,7};

Then use a function such as this to process the array:

void printarr(int *ARR) {
    /* ARR[0] holds the total array length, including the length slot itself */
    for (int i=1; i<ARR[0]; i++) {
        printf("%d ", ARR[i]);
    }
}

I can see no problem with this but would prefer to check with experienced C programmers first. I would be the only one using the code.

Kinney answered 28/1, 2021 at 12:44 Comment(5)

TLDR: No. What if the type of the array isn't numerical? – Diandiana 28/1, 2021 at 12:47

It really depends on the situation. It's hard to make a general statement. If you write code like this, it should at least be properly documented in comments. – Taco 28/1, 2021 at 12:49

If the array is a fixed-size array like in your example, you could simply do this: #define MYSIZE 9 and then int arr[MYSIZE]={0,1,2,3,4,5,6,7};. Now you know that the size is MYSIZE. Otherwise, the number of elements in a fixed-size array is always sizeof(array) / sizeof(array[0]). Some implementations have a _countof macro; if not, you can make your own based on the above. – Taco 28/1, 2021 at 12:55

I like this question. It's a bad idea, but a good question. – Herald 28/1, 2021 at 13:16

Many thanks for all your excellent comments and advice. I will now use structures as recommended with the array, length and an added description. I think I was inclined to use a slang version of C by including metadata in arrays. – Kinney 30/1, 2021 at 13:43

Well, it's bad in the sense that you have an array where the elements do not all mean the same thing. Storing metadata with the data is not a good thing. Just to extrapolate your idea a little bit: we could use the first element to denote the element size and then the second for the length. Try writing a function utilizing both ;)

It's also worth noting that with this method, you will have problems if the array is bigger than the maximum value an element can hold, which for char arrays is a very significant limitation. Sure, you can solve it by using the first two elements. And you can also use casts if you have floating point arrays. But I can guarantee you that you will run into hard-to-trace bugs due to this. Among other things, endianness could cause a lot of issues.

And it would certainly confuse virtually every seasoned C programmer. This is not really a logical argument against the idea as such, but rather a pragmatic one. Even if this was a good idea (which it is not) you would have to have a long conversation with EVERY programmer who will have anything to do with your code.

A reasonable way of achieving the same thing is using a struct.

struct container {
    int *arr;
    size_t size;
};

int arr[10];

struct container c = { .arr = arr, .size = sizeof arr/sizeof *arr };

But in any situation where I would use something like above, I would probably NOT use arrays. I would use dynamic allocation instead:

const size_t size = 10;
int *arr = malloc(sizeof *arr * size);
if(!arr) { /* Error handling */ }

struct container c = { .arr = arr, .size = size };

However, do be aware that if you initialize .size with sizeof arr/sizeof *arr when arr is a pointer instead of an array, you're in for "interesting" results: sizeof then gives the size of the pointer, not of the array.
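To illustrate that pitfall, here is a minimal sketch. The function names count_via_pointer and count_of_ten are made up for this example and are not from the answer. Once an array has been passed to a function it is just a pointer, so the sizeof division no longer yields the element count.

```c
#include <stddef.h>

/* Hypothetical helper: `p` is a pointer here, so this divides the size
   of a pointer by the size of an int. The result has nothing to do with
   the length of the array the caller passed in. Compilers may warn. */
size_t count_via_pointer(int *p)
{
    return sizeof p / sizeof *p;
}

/* Hypothetical helper: `arr` is a true array in this scope, so the same
   expression yields its element count. */
size_t count_of_ten(void)
{
    int arr[10];
    return sizeof arr / sizeof *arr;
}
```

This is why the size has to be captured where the array type is still visible, or supplied explicitly as in the malloc version.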

You can also use flexible arrays, as Andreas wrote in his answer

Herald answered 28/1, 2021 at 12:47 Comment(9)

Shouldn't we always use sizeof with brackets, sizeof(XXX), to be safe? – Gum 28/1, 2021 at 13:57

@Gum It's a matter of taste. I have not really made up my mind about it yet. Using parentheses makes it look like a function, but it's not. It's an operator. – Herald 28/1, 2021 at 13:58

I had lately some strange behavior. It might not work when you have a type defined with multiple tokens like unsigned int. – Gum 28/1, 2021 at 14:2

@Gum So far, I have NEVER seen a reason to use the type name instead of the variable name as an argument to sizeof. – Herald 28/1, 2021 at 14:6

@Gum Doing excessive things "just to be safe" is usually an indication of not knowing the specifics, i.e., missing a full understanding of the syntax. Type names need parentheses as sizeof operands; expressions do not. Expressions follow operator precedence. There is no "might not work" case. – Bertsche 28/1, 2021 at 14:36

I sometimes do a = malloc(sizeof (*a) * n) just because I think it looks iffy without parentheses. – Herald 28/1, 2021 at 14:39

@Oka, I know that expressions follow operator precedence, but this precedence is not always the same. It depends on the compiler. Of course when it's about +- and */ it does always work the same but when it comes to binary operators for example that are not used all over the place, some compilers will behave differently (| and ^). I can imagine that sizeof a / sizeof b could be interpreted as sizeof(a / sizeof(b)) by some compilers. – Gum 28/1, 2021 at 15:7

@Gum A compiler that would interpret it as sizeof(a / sizeof(b)) is not conforming to the standard. – Herald 28/1, 2021 at 15:45

@Gum Do you have an example for when precedence is not the same? – Herald 29/1, 2021 at 4:50

In C you can use flexible array members. That is you can write

struct intarray {
   size_t count;
   int data[];  // flexible array member needs to be last
};

You allocate with

size_t count = 100;
struct intarray *arr = malloc( sizeof(struct intarray) + sizeof(int)*count );
arr->count = count;

That can be done for all types of data.
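As a slightly fuller sketch of the allocation above, the following wraps it in a helper that checks the malloc result. The function names intarray_new and intarray_sum are assumptions made for this example. A production version would probably also guard the size computation against overflow, which is omitted here.

```c
#include <stdlib.h>

struct intarray {
    size_t count;
    int data[];   /* flexible array member, must be the last member */
};

/* Hypothetical helper: allocates the header and `count` ints in one block
   and zero-initializes the elements. Returns NULL on allocation failure.
   The caller releases the whole object with a single free(). */
struct intarray *intarray_new(size_t count)
{
    struct intarray *arr = malloc(sizeof *arr + sizeof arr->data[0] * count);
    if (arr == NULL)
        return NULL;
    arr->count = count;
    for (size_t i = 0; i < count; i++)
        arr->data[i] = 0;
    return arr;
}

/* Hypothetical consumer: the length travels with the data, so no
   separate size argument is needed. */
long intarray_sum(const struct intarray *arr)
{
    long total = 0;
    for (size_t i = 0; i < arr->count; i++)
        total += arr->data[i];
    return total;
}
```

Because the header and the elements live in one allocation, free(arr) is all that is needed to release it.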

It makes the use of C-arrays a bit safer (not as safe as the C++ containers, but safer than plain C arrays).

Unfortunately, C++ does not support this idiom in the standard. Many C++ compilers provide it as an extension, but it is not guaranteed.

On the other hand, this C flexible array member idiom may be more explicit and perhaps more efficient than C++ containers, as it does not use an extra indirection or need two allocations (think of new vector<int>).

If you stick to C, I think this is a very explicit and readable way of handling variable length arrays with an integrated size.

The only drawback is that the C++ guys do not like it and prefer C++ containers.

Dogcatcher answered 28/1, 2021 at 13:2 Comment(5)

It would be good if you mentioned that a struct can have max 1 flexible array member and that it has to be last in the struct. – Herald 28/1, 2021 at 13:33

@Herald Thanks I added a comment – Dogcatcher 28/1, 2021 at 13:44

This is also a common idiom on Windows (with some compatibility caveats; also, as far as I know, 0-length arrays aren't legal in C99 either, so be careful there, but the general point stands). Apart from that, I'm used to using offsetof(). Is sizeof guaranteed to take care of padding? – Bipinnate 28/1, 2021 at 14:8

@Bipinnate sizeof "takes care of padding" - on the contrary - it is often too much. – Simms 28/1, 2021 at 14:28

maybe just use sizeof *arr + sizeof arr->data[0] * count :D – Simms 28/1, 2021 at 14:29

It is not bad (I mean it will not invoke undefined behavior or cause other portability issues) when the elements of the array are integers, but instead of writing the magic number 9 directly you should have the compiler calculate the length of the array to avoid typos.

#include <stdio.h>

int main(void) {
    int arr[9]={sizeof(arr)/sizeof(*arr),0,1,2,3,4,5,6,7};
    
    for (int i=1; i<arr[0]; i++) {
        printf("%d ", arr[i]);
    }
    return 0;
}

Sillimanite answered 28/1, 2021 at 12:46 Comment(1)

Well, it's a better way of doing a very bad thing :) – Herald 28/1, 2021 at 13:19

Only a few datatypes are suitable for that kind of hack. Therefore, I would advise against it, as this will lead to inconsistent implementation styles across different types of arrays.

Maggio answered 28/1, 2021 at 12:51 Comment(0)

A similar approach is used very often with character buffers, where the actual length is stored at the beginning of the buffer.

Many dynamic memory allocators in C also use this approach; that is, the allocated memory is prefixed with an integer that keeps the size of the allocation.

However, in general this approach is not suitable for arrays. For example, a character array can be much larger than the maximum positive value (127) that can be stored in an object of type signed char. Moreover, it is difficult to pass a sub-array of such an array to a function. Most functions that are designed to deal with arrays will not work in such a case.

A general approach to declare a function that deals with an array is to declare two parameters. The first one has a pointer type that specifies the initial element of an array or sub-array and the second one specifies the number of elements in the array or sub-array.
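A minimal sketch of that two-parameter convention, assuming a made-up function name sum_ints:

```c
#include <stddef.h>

/* Hypothetical example: sums `n` ints starting at `a`.
   `a` may point at the first element of an array or into its middle. */
long sum_ints(const int *a, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += a[i];
    return total;
}
```

With int arr[9] = {0,1,2,3,4,5,6,7,8}, the call sum_ints(arr, 9) processes the whole array, while sum_ints(arr + 3, 4) processes only the four elements starting at index 3. The length-in-first-element scheme cannot express that second call without copying.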

Also, C allows you to declare functions that accept variable length arrays, whose sizes can be specified at run time.
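As a hedged sketch of such a declaration (the name sum_matrix is made up here), the size parameters must appear before the array parameter that uses them. Note that variable length arrays were introduced in C99 and are an optional feature in C11, so this assumes a compiler that supports them.

```c
#include <stddef.h>

/* Hypothetical example: `m` is declared with run-time dimensions, so the
   function can index it as a 2D array without manual offset arithmetic. */
long sum_matrix(size_t rows, size_t cols, int m[rows][cols])
{
    long total = 0;
    for (size_t r = 0; r < rows; r++)
        for (size_t c = 0; c < cols; c++)
            total += m[r][c];
    return total;
}
```

The sizes are still passed separately; the variable length array syntax only lets the parameter type refer to them.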

Balenciaga answered 28/1, 2021 at 12:57 Comment(0)

It is suitable in rather limited circumstances. There are better solutions to the problem it solves.

One problem with it is that if it is not universally applied, you end up with a mix of arrays that use the convention and arrays that don't, and you have no way of telling which is which. For arrays used to carry strings, for example, you have to continually pass &arr[1] in calls to the standard string library, or define a new string library that uses "Pascal string" rather than "ASCIIZ string" conventions (such a library would be more efficient, as it happens).

In the case of a true array rather than simply a pointer to memory, sizeof(arr) / sizeof(*arr) will yield the number of elements without having to store it in the array in any case.

It only really works for integer type arrays, and for char arrays it would limit the length to something rather short. It is not practical for arrays of other object types or data structures.

A better solution would be to use a structure:

typedef struct
{
    size_t length ;
    int* data ;
} intarray_t ;

Then:

int data[9] ;
intarray_t array = { sizeof(data) / sizeof(*data), data } ;

Now you have an array object that can be passed to functions and retains the size information, and the data member can be accessed directly for use in third-party or standard library interfaces that do not accept the intarray_t. Moreover, the type of the data member can be anything.
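For illustration, a function taking such an object might look like the sketch below. The name intarray_max and its behavior for an empty array are assumptions made for this example.

```c
#include <stddef.h>

typedef struct
{
    size_t length ;
    int* data ;
} intarray_t ;

/* Hypothetical example: returns the largest element, or 0 if the array
   is empty. The length arrives with the data, so there is no extra
   size argument to get wrong. */
int intarray_max( intarray_t a )
{
    if( a.length == 0 )
    {
        return 0 ;
    }
    int max = a.data[0] ;
    for( size_t i = 1; i < a.length; i++ )
    {
        if( a.data[i] > max )
        {
            max = a.data[i] ;
        }
    }
    return max ;
}
```

Passing the struct by value copies only the length and the pointer, not the elements, so the callee still operates on the caller's data.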

Thyself answered 28/1, 2021 at 13:0 Comment(1)

Just note that the flexible array member version is perhaps a more elegant solution when it comes to hosted systems, while the one in this answer is the preferred one for embedded systems with no heap. – Dinner 28/1, 2021 at 13:36

Obviously NO is the answer. All programming languages have predefined functions stored along with the variable type. Why not use them? In your case it is more suitable to use a count or length facility instead of testing the first value.

An if clause sometimes takes more time than a predefined function.

At first glance it seems fine to store the counter, but imagine you have to update the array. You will need two operations: one to insert and another to update the counter. Two operations mean two variables to be changed. For static arrays it might be acceptable to store the counter before the list, but for dynamic ones, no. Also, please read up on basic programming concepts; you will find that this idea does not comply with programming principles.

Postaxial answered 18/2, 2021 at 9:54 Comment(0)
