How can I work with dynamically-allocated arbitrary-dimensional arrays?
Asked Answered
B

3

3

The typical 1-D array can be statically or automatically allocated in a declaration.

enum { n=100 };
int arr1[n];

Or dynamically-allocated and accessed via a pointer.

int *arr1m=malloc(n*sizeof*arr1m);
int *arr1c=calloc(n, sizeof*arr1c);

Both of these styles access an element with the same syntax.

int i = n/2;
arr1[i] = arr1c[i] = arr1m[i] = 42;

But when you add a second dimension, it takes some effort to achieve the same syntax.

int arr2[n][n];
int *arr2c=calloc(n*n,sizeof*arr2c);
arr2[5][5] = arr2c[5*n+5] = 23;

You can only get the doubled set of brackets if instead you construct it as an Iliffe-vector.

int **arr2l=calloc(n,sizeof*arr2l);
for (int j=0; j<n; j++)
    arr2l[j]=calloc(n,sizeof**arr2l);
 arr2[6][6] = arr2l[6][6] = 72;

But this becomes more and more cumbersome as the dimensions increase.

Another difficulty is checking the bounds of a dynamic array before accessing an element (so that you're not touching memory that was not properly allocated). The real arrays can use the sizeof operator to determine the bounds, but none of these dynamic arrays carry their size with them.

How can I define a structure that has fast, contiguous layout like an array, but with a consistent syntax for accessing elements with a list of indices that works the same for a 2D array as for a 3D array; and all dynamically, with sizes available dynamically so they can be passed to and returned from functions?

Badtempered answered 4/5, 2015 at 6:30 Comment(8)
This material was edited with assistance from the fine fellows in comp.lang.c.Badtempered
Please, please, don't use alloca, this belongs in the history books. This allocates on the "stack", so there is no advantage at all to prefer that over variable length arrays (VLA) that are the standardized approach.Melodiemelodion
@JensGustedt Ok. I've removed it. Don't want to give nobody ideas.Badtempered
@JensGustedt: VLAs are optional as of C11. There could be a C implementation that supports alloca but not VLAs. (That doesn't necessarily mean I disagree with your point.) Note that both VLAs and alloca share the flaw that they don't report allocation failures.Patrizius
@KeithThompson, yes, but I don't know of any implementation of C11 without VLA, and I suspect that there never will be one.Melodiemelodion
@JensGustedt: Then why did the committee make them optional? I presume there was some pressure to do so.Patrizius
@KeithThompson, I think the idea was to get the last major stone-age C compiler implementor on board. But that failed.Melodiemelodion
This material was originally intended as a response to this question which was closed when I attempted to post. (open now)Badtempered
B
2

There is a data structure used in the implementation of the J language (a dialect of APL) which accommodates dynamically-allocated arbitrary-dimensional arrays. It uses a hybrid of a struct and a dynamic array for its data structure, a trick commonly known as the struct hack. (More info on the J implementation here and here.)

To see the idea in a simple context, consider the 1D case: we want a dynamic 1D array which carries its size along with it. So:

struct vec { int n; int p[]; };

Since the p member is last, and C has no built-in bounds checking, it can be used to access additional memory off the end of the struct. Of course, when allocating, we'll need to provide this extra memory and not simply allocate the size of the struct. The struct is just the header of the array. C90 requires a number (say, 1) for the length of the p[] array, but C99 allows the number to be omitted so the size of the header is more straightforward to calculate.

So, an array with more dimensions will need more values to hold the sizes of each dimension. And for our structure to accommodate arrays of different dimensionality, this dimension vector will need to be variable-length as well.

What we can do to achieve all of this is to apply the struct hack twice, recursively upon itself. This gives us a memory layout like this, where R is the number of dimensions which we'll call the rank of the array, the D values are the lengths of each dimension, and the V values are the actual array data:

  1   R                    Product(D) 
 --- -------------------- ----------------------------- 
  R  D[0] D[1] ... D[R-1] V[0] V[1] ... V[Product(D)-1] 

And to describe this in C,

typedef struct arr { int r; int d[]; } *arr;

The elements of an array a immediately follow the R elements of the dims vector D. So the V elements can be accessed at a->d[r+0], a->d[r+1], ... a->d[r+i] (after reducing the index vector down to a single index on the flattened representation). The elements are easiest to treat in row-major order. The number of actual elements is the product of all the dimensions, all the dimensions multiplied together. Edit: The expression here might be better written: (a->d+a->r)[0], (a->d+a->r)[1], ... (a->d+a->r)[i].

In order to allocate one of these things, we'll need a function to compute this product as part of the size calculation.

int productdims(int rank, int *dims){
    int z=1;
    for(int i=0; i<rank; i++)
        z *= dims[i];
    return z;
}

And to initialize, we just need to fill-in the members.

arr makearr(int rank, int *dims){
    arr z = calloc( (sizeof(struct arr)/sizeof(int)) + 
                rank + productdims(rank,dims), sizeof(int));
    z->r = rank;
    memmove(z->d,dims,rank*sizeof(int));
    return z;
}

Remembering the formula for accessing 2D data (say an array of [m][n] elements) with a single index (it's a typical dynamic array like in the question). Element [i][j] is at i×n+j. With a 3D array [m][n][o], element [i][j][k] is at i×(n×o)+j×o+k.

So we can calculate a single index for our linearly-laid-out data from an array of indices and the array of dimensions.

int *elem(arr a, ...){
    va_list ap;
    int idx = 0;

    va_start(ap,a);
    if (a->r){
        idx = va_arg(ap,int);
        for(int i=1; i<a->r; i++){
            idx *= a->d[i];
            idx += va_arg(ap,int);
        }
    }
    va_end(ap);

    return &a->d[a->r + idx];
}

Don't forget the headers:

#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

And voilà:

int main() {    

    {
        int i,n=6;
        arr m = makearr(1, (int[]){n});
        for (i=0;i<n;i++)
            *elem(m,i) = i;
        for (i=0;i<n;i++,printf(" "))
            printf("%d",*elem(m,i));
    }
    puts("\n");

    {
        int i,j,n=4;
        arr m = makearr(2, (int[]){n,n});
        for (i=0;i<n;i++)
            for (j=0;j<n;j++)
                *elem(m,i,j) = i*n+j;
        for (i=0;i<n;i++,printf("\n"))
            for (j=0;j<n;j++,printf(" "))
                printf("%d",*elem(m,i,j));
    }
    puts("\n");

    {
        int i,j,k,n=3;
        arr m = makearr(3, (int[]){n,n,n});
        for (i=0;i<n;i++)
            for (j=0;j<n;j++)
                for (k=0;k<n;k++)
                    *elem(m,i,j,k) = (i*n+j)*n+k;
        for (i=0;i<n;i++,printf("\n"))
            for (j=0;j<n;j++,printf("\n"))
                for (k=0;k<n;k++,printf(" "))
                    printf("%d",*elem(m,i,j,k));
    }

    return 0;
}

Output:

0 1 2 3 4 5 

0 1 2 3 
4 5 6 7 
8 9 10 11 
12 13 14 15 


0 1 2 
3 4 5 
6 7 8 

9 10 11 
12 13 14 
15 16 17 

18 19 20 
21 22 23 
24 25 26 

Further elaboration of this code to support 2D matrix multiplication and column slices has been posted to comp.lang.c in this thread.

Better-motivated question/more powerful data structure.

Badtempered answered 4/5, 2015 at 6:30 Comment(8)
Since C99 and C11 have flexible array members, you don't need the struct hack in the same way as you used to.Aggrandize
True. Would it be clearer to remove the 1 do you think? I use C99 compound literals in main, but I thought the other code might be more useful if were C90 compatible.Badtempered
It depends on the extent to which you want to kowtow to the people who have to suffer with Microsoft compilers on Windows. MSVC doesn't support C99, let alone C11, but it is basically the lone holdout.Aggrandize
It is not only, that this is standardized C99, so preferable for portability. It is also that the "struct hack" may just get things wrong: depending on the number that you put there, padding inside the structure may be different from compiler version to compiler version.Melodiemelodion
I fail to see the point of this structure. You still need a function to allocate and a function to access an element. How is this better than a struct matrix { int *p; int m, int n} with alloc_matrix(...) and elem_matrix(struct matrix *mat, int i, int j)?Marilyn
@Marilyn This structure extends for arbitrary n. 4D arrays, 5D arrays, etc.Badtempered
@JonathanLeffler Oops! I thought about VLAs, but was uncertain how they interact with stdarg. I rejected alloca on similar uncertainty.Badtempered
I don't think VLAs and stdarg have any interaction; they're independent. You'd be able to write int ind[a->r]; to define the VLA without any issues. Incidentally, a standard C11 compiler without support for VLAs defines __STDC_NO_VLA__ as 1 to indicate that absence of that support (§6.10.8.3).Aggrandize
M
9

There is no need to reinvent the wheel, C has that since C99, it is called variable length array, VLA. It has just the syntax as "normal" d-dimensional arrays, only that the bounds may be variable and they are not allowed in file scope.

As such objects can become relatively large, you should not allocate them on the stack, but with something like malloc

double (*A)[n][m] = malloc(sizeof(double[k][n][m]));

The compiler then helps you with all the index calculations without problems. If you want to pass such animals to functions you just have to be careful to declare the bounds first:

void func(size_t k, size_t n, size_t m, double A[k][n][m]);

This makes your intent clear to both, the human reader and the compiler. I prefer this over the equivalent form

void func(size_t k, size_t n, size_t m, double (*A)[n][m]);
Melodiemelodion answered 4/5, 2015 at 7:22 Comment(0)
B
2

There is a data structure used in the implementation of the J language (a dialect of APL) which accommodates dynamically-allocated arbitrary-dimensional arrays. It uses a hybrid of a struct and a dynamic array for its data structure, a trick commonly known as the struct hack. (More info on the J implementation here and here.)

To see the idea in a simple context, consider the 1D case: we want a dynamic 1D array which carries its size along with it. So:

struct vec { int n; int p[]; };

Since the p member is last, and C has no built-in bounds checking, it can be used to access additional memory off the end of the struct. Of course, when allocating, we'll need to provide this extra memory and not simply allocate the size of the struct. The struct is just the header of the array. C90 requires a number (say, 1) for the length of the p[] array, but C99 allows the number to be omitted so the size of the header is more straightforward to calculate.

So, an array with more dimensions will need more values to hold the sizes of each dimension. And for our structure to accommodate arrays of different dimensionality, this dimension vector will need to be variable-length as well.

What we can do to achieve all of this is to apply the struct hack twice, recursively upon itself. This gives us a memory layout like this, where R is the number of dimensions which we'll call the rank of the array, the D values are the lengths of each dimension, and the V values are the actual array data:

  1   R                    Product(D) 
 --- -------------------- ----------------------------- 
  R  D[0] D[1] ... D[R-1] V[0] V[1] ... V[Product(D)-1] 

And to describe this in C,

typedef struct arr { int r; int d[]; } *arr;

The elements of an array a immediately follow the R elements of the dims vector D. So the V elements can be accessed at a->d[r+0], a->d[r+1], ... a->d[r+i] (after reducing the index vector down to a single index on the flattened representation). The elements are easiest to treat in row-major order. The number of actual elements is the product of all the dimensions, all the dimensions multiplied together. Edit: The expression here might be better written: (a->d+a->r)[0], (a->d+a->r)[1], ... (a->d+a->r)[i].

In order to allocate one of these things, we'll need a function to compute this product as part of the size calculation.

int productdims(int rank, int *dims){
    int z=1;
    for(int i=0; i<rank; i++)
        z *= dims[i];
    return z;
}

And to initialize, we just need to fill-in the members.

arr makearr(int rank, int *dims){
    arr z = calloc( (sizeof(struct arr)/sizeof(int)) + 
                rank + productdims(rank,dims), sizeof(int));
    z->r = rank;
    memmove(z->d,dims,rank*sizeof(int));
    return z;
}

Remembering the formula for accessing 2D data (say an array of [m][n] elements) with a single index (it's a typical dynamic array like in the question). Element [i][j] is at i×n+j. With a 3D array [m][n][o], element [i][j][k] is at i×(n×o)+j×o+k.

So we can calculate a single index for our linearly-laid-out data from an array of indices and the array of dimensions.

int *elem(arr a, ...){
    va_list ap;
    int idx = 0;

    va_start(ap,a);
    if (a->r){
        idx = va_arg(ap,int);
        for(int i=1; i<a->r; i++){
            idx *= a->d[i];
            idx += va_arg(ap,int);
        }
    }
    va_end(ap);

    return &a->d[a->r + idx];
}

Don't forget the headers:

#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

And voilà:

int main() {    

    {
        int i,n=6;
        arr m = makearr(1, (int[]){n});
        for (i=0;i<n;i++)
            *elem(m,i) = i;
        for (i=0;i<n;i++,printf(" "))
            printf("%d",*elem(m,i));
    }
    puts("\n");

    {
        int i,j,n=4;
        arr m = makearr(2, (int[]){n,n});
        for (i=0;i<n;i++)
            for (j=0;j<n;j++)
                *elem(m,i,j) = i*n+j;
        for (i=0;i<n;i++,printf("\n"))
            for (j=0;j<n;j++,printf(" "))
                printf("%d",*elem(m,i,j));
    }
    puts("\n");

    {
        int i,j,k,n=3;
        arr m = makearr(3, (int[]){n,n,n});
        for (i=0;i<n;i++)
            for (j=0;j<n;j++)
                for (k=0;k<n;k++)
                    *elem(m,i,j,k) = (i*n+j)*n+k;
        for (i=0;i<n;i++,printf("\n"))
            for (j=0;j<n;j++,printf("\n"))
                for (k=0;k<n;k++,printf(" "))
                    printf("%d",*elem(m,i,j,k));
    }

    return 0;
}

Output:

0 1 2 3 4 5 

0 1 2 3 
4 5 6 7 
8 9 10 11 
12 13 14 15 


0 1 2 
3 4 5 
6 7 8 

9 10 11 
12 13 14 
15 16 17 

18 19 20 
21 22 23 
24 25 26 

Further elaboration of this code to support 2D matrix multiplication and column slices has been posted to comp.lang.c in this thread.

Better-motivated question/more powerful data structure.

Badtempered answered 4/5, 2015 at 6:30 Comment(8)
Since C99 and C11 have flexible array members, you don't need the struct hack in the same way as you used to.Aggrandize
True. Would it be clearer to remove the 1 do you think? I use C99 compound literals in main, but I thought the other code might be more useful if were C90 compatible.Badtempered
It depends on the extent to which you want to kowtow to the people who have to suffer with Microsoft compilers on Windows. MSVC doesn't support C99, let alone C11, but it is basically the lone holdout.Aggrandize
It is not only, that this is standardized C99, so preferable for portability. It is also that the "struct hack" may just get things wrong: depending on the number that you put there, padding inside the structure may be different from compiler version to compiler version.Melodiemelodion
I fail to see the point of this structure. You still need a function to allocate and a function to access an element. How is this better than a struct matrix { int *p; int m, int n} with alloc_matrix(...) and elem_matrix(struct matrix *mat, int i, int j)?Marilyn
@Marilyn This structure extends for arbitrary n. 4D arrays, 5D arrays, etc.Badtempered
@JonathanLeffler Oops! I thought about VLAs, but was uncertain how they interact with stdarg. I rejected alloca on similar uncertainty.Badtempered
I don't think VLAs and stdarg have any interaction; they're independent. You'd be able to write int ind[a->r]; to define the VLA without any issues. Incidentally, a standard C11 compiler without support for VLAs defines __STDC_NO_VLA__ as 1 to indicate that absence of that support (§6.10.8.3).Aggrandize
R
2

If you define a as a pointer to an array of n integers, the compiler will do the index arithmetic.

#define N 7
int (*a)[N];

int main() {
  a = malloc(N*N*sizeof(int));
  a[2][3] = 0;
}

ADDED:

Similarly, a three dimensional example:

#include <stdio.h>
#include <stdlib.h>

#define N 7

int (*a)[N][N];

int main() {
    int i,j,k;
    a = malloc(N*N*N*sizeof(int));
    for(i=0; i<N; i++) {
        for(j=0;j<N;j++) {
            for(k=0;k<N;k++) {
                a[i][j][k] = (i*10+j)*10+k;
            }
        }
    }
    for(i=0; i<N; i++) {
        for(j=0;j<N;j++) {
            for(k=0;k<N;k++) {
                printf("%2d ", a[i][j][k]);
            }
            printf("\n");
        }
        printf("\n");
    }
}
Ruble answered 4/5, 2015 at 6:47 Comment(3)
Hm. But how does it look for 3D?Badtempered
The allocation can even look simpler, I think. E.g malloc(sizeof(int[n][n][n])).Melodiemelodion
Hmm. That's not so bad after all. I think encapsulating the sizes will be useful for writing functions that operate on these arrays, but my examples do not show this. +1 showed me something new.Badtempered

© 2022 - 2024 — McMap. All rights reserved.