Can an N-dimensional array be serialized with a single C function?

I've been accessing N-dimensional arrays with one-dimensional pointers in C for decades now. But now, reading this other SO question, I learnt that's UB.

To be honest, I'm quite disappointed to read that: it works the way you'd expect in every compiler, yet it's still UB.

One area where I find this most worrying is when writing serialization functions.

For example, let's assume you have several float arrays such as:

float a[5];
float b[4][4];
float c[7][2][4][5];

According to the C specification, should I write three different functions if I want to serialize these three arrays?

The following function (which is how I'd do it) is UB according to the question above:

void serializeNfloatarray(FILE *stream, float *ptr, size_t ndims, size_t *dims) {
   
   size_t numitems=1;
   
   if(ndims==0) return;

   fprintf(stream, "%zu ", ndims);
   
   for(size_t f=0; f<ndims; f++) {
      fprintf(stream, "%zu ", dims[f]);
      numitems *= dims[f];
   }

   for(size_t i=0; i<numitems; i++)
      fprintf(stream,"%f ", ptr[i]); /* <- UB !!! */
}

Is it impossible to write one C function valid for all types of float arrays without going into UB?

Eckart answered 31/10, 2021 at 8:25 Comment(3)
The highest-voted answer to the question you linked gives a way to access the array which is well-defined.Epstein
@interjay: No, it's not well defined, because it casts a char pointer back to an int pointer, and that's UB. You can do the access only with the char pointer, not the int pointer.Eckart
This is separate from the question, but since the function takes a float *, the caller has to cast explicitly to (float *) when passing a multidimensional array. If that is what you want, fine. Otherwise, you would probably have to change the parameter to void * to silence the compiler, though that is obviously more susceptible to human error.Phiona

The bytes representing any object may be accessed using the character types; the C standard defines the behavior of this. Therefore, the elements of any compound array may be accessed by copying their bytes. That can be done using your own code that copies bytes with a character type, or you can use memcpy, which is specified to copy bytes:

    for (size_t i = 0; i < numitems; i++)
    {
        //  Create a temporary object.
        float t;

        //  Copy bytes from the desired element into the intermediary object.
        memcpy(&t, (const unsigned char *) ptr + i * sizeof t, sizeof t);

        //  Print the value.
        fprintf(stream, "%f ", t);
    }
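
The other option mentioned above, copying the bytes yourself through a character type, might look like the following. This is only a sketch: the helper name load_float and its const void * parameter are assumptions made for illustration, not part of the original code.

```c
#include <stddef.h>

/* Hypothetical helper: returns flat element `index` of a float array of any
   shape by copying its bytes through unsigned char, which may access any object. */
float load_float(const void *base, size_t index)
{
    /* Byte address of the requested element within the whole array object. */
    const unsigned char *src = (const unsigned char *)base + index * sizeof(float);

    float t;
    unsigned char *dst = (unsigned char *)&t;

    /* Copy the object representation one byte at a time. */
    for (size_t k = 0; k < sizeof t; k++)
        dst[k] = src[k];

    return t;
}
```

For an array declared as float c[2][2][2], calling load_float(&c, 5) should yield the same value as c[1][0][1].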

Note that %f is insufficient to record the values of float numbers, as it does not guarantee enough digits. %a is designed for this.
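
Putting the pieces together, a single function for float arrays of any shape could be sketched as below. The name serialize_floats, the const void * parameter, and the int return value are assumptions made for this example; the header format (number of dimensions, then each dimension) follows the function in the question, and the values are written with %a as suggested above.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical sketch: writes ndims, each dimension, then every element in %a form.
   Returns 0 on success, -1 if ndims is zero or the stream reports an error. */
int serialize_floats(FILE *stream, const void *base, size_t ndims, const size_t *dims)
{
    if (ndims == 0)
        return -1;

    size_t numitems = 1;
    fprintf(stream, "%zu ", ndims);
    for (size_t d = 0; d < ndims; d++) {
        fprintf(stream, "%zu ", dims[d]);
        numitems *= dims[d];
    }

    /* Walk the array through a character pointer instead of a float pointer. */
    const unsigned char *bytes = base;
    for (size_t i = 0; i < numitems; i++) {
        float t;
        memcpy(&t, bytes + i * sizeof t, sizeof t);
        fprintf(stream, "%a ", t); /* hexadecimal floating point, exact for float */
    }

    return ferror(stream) ? -1 : 0;
}
```

A caller with float b[4][4] would pass &b together with dims {4, 4}; passing the address of the whole array keeps the byte pointer associated with the complete object and needs no cast because the parameter is const void *.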

Molina answered 31/10, 2021 at 8:56 Comment(5)
I do believe you can avoid this copying by just accepting a const void *arg argument and then implicitly cast this as const float *ptr and use ptr[i] like OP did. That is: I don't think the actual copying is necessary, just the pointing at the correct offset. So basically I think what OP did is fine, it's just that the warning could be avoided by the cast. I may be totally wrong though, in which case I would happily be corrected.Phiona
@Cheatah: For an object declared as, say, float A[10][20], the C standard does not define the behavior of accessing its elements via a single float *p. There are two reasons for this. One, pointer arithmetic is defined only within array bounds (including a notional end position). So, if p points to A[0][0], p+i is defined only for 0 ≤ i ≤ 20. The fact that A[0] is embedded inside a larger array does not affect this, because the clause defining pointer arithmetic makes no provision for it.Molina
Two, the standard does not clearly say an object of type float [10][20] may be aliased as an object of type float [200]. The rules on this are unclear, and the standard ought to be updated with clearer specifications of the aliasing rules. Nonetheless, the rules are what they are and do not tell us we can do this. In contrast, any object may be accessed as an array of characters, so the byte-by-byte copy is the way to do it. A good compiler will eliminate the memcpy in optimization and simply directly load the relevant data into a register to pass to fprintf.Molina
It's really unfortunate that 1) this copy is necessary, 2) every C programmer would write this without this copy, and 3) the C23 spec is now feature-frozen, with tons of new additions that come with new syntax constructions but that don't fix very obvious problems like this one (unless I read the docs too fast). I'm leaving the question open for a couple of days just in case somebody finds a standard-compliant way to avoid the copy, but I'm afraid it's unlikely...Eckart
@cesss: Re “every C programmer would write this without this copy”: No, not every C programmer would write this without this copy. There are people who pay attention to specifications and seek to engineer well-defined code. When writing code for maximum portability, they would write the copy. In other circumstances, they might omit the copy but document the requirement that the program be compiled with a compiler that supports reshaping arrays.Molina
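
To illustrate the bounds argument from the comments above: when the shape is known at compile time, a flat index can be split into a row and a column so that every access stays inside its own 20-element row. This is a minimal sketch for the float A[10][20] case only; the names ROWS, COLS and get_flat are made up for the example, and it does not replace the generic byte-copy approach.

```c
#include <stddef.h>

enum { ROWS = 10, COLS = 20 };

/* Hypothetical helper: returns flat element i (0 <= i < ROWS * COLS) of A.
   The row index selects one float[COLS] array and the column index stays
   within that row, so no pointer arithmetic crosses a row boundary. */
float get_flat(float (*A)[COLS], size_t i)
{
    return A[i / COLS][i % COLS];
}
```

With float A[10][20], get_flat(A, 25) reads A[1][5], whereas (&A[0][0])[25] is the form whose behavior the standard does not define.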
