C / C++ MultiDimensional Array Internals
Asked Answered
W

4

25

I have a question about how C / C++ internally stores multidimensional arrays declared using the notation foo[m][n]. I am not questioning pure pointers to pointers etc... I am asking because of speed reasons...

Correct me if I am wrong, but syntactically foo is an array of pointers, which themselves point to an array

int foo[5][4]
*(foo + i)           // returns a memory address
*( *(foo + i) + j)    // returns an int

I have heard from many places that the C/C++ compiler converts foo[m][n] to a one dimensional array behind the scenes (calculating the required one dimension index with i * width + j). However if this was true then the following would hold

*(foo + 1)          // should return element foo[0][1]

Thus my question: Is it true that foo[m][n] is (always?) stored in memory as a flat one dimensional array?? If so, why does the above code work as shown.

Weintrob answered 16/10, 2011 at 14:3 Comment(2)
In case others have the same question, here is some further info: *foo == foo[0] == &foo[0][0] *(foo+1) == foo[1] == &foo[1][0] (int *)foo + 1 == &foo[0][1]Weintrob
No, foo is not an array of pointers; it's an array of arrays.Stickpin
G
29

Yes, C/C++ stores a multi-dimensional (rectangular) array as a contiguous memory area. But, your syntax is incorrect. To modify element foo[0][1], the following code will work:

*((int *)foo+1)=5;

The explicit cast is necessary, because foo+1, is the same as &foo[1] which is not at all the same thing as foo[0][1]. *(foo+1) is a pointer to the fifth element in the flat memory area. In other words, *(foo+1) is basically foo[1] and **(foo+1) is foo[1][0]. Here is how the memory is laid out for some of your two dimensional array:

enter image description here

Gazo answered 16/10, 2011 at 14:7 Comment(9)
Yes I know that that will work, but IF foo was just a flat 1D array (of say ints), then you shouldn't be able to dereference twice - there shouldn't even be 2 pointers!!!!Weintrob
@Arrakis, but it isn't a flat 1D array to the compiler, it is a 2D array. It just so happens that the 2D array is laid out in memory in the same way as a 1D array of size product of the two dimensions would be.Gazo
There are two pointers because you asked for two by creating a multi-dimensional array. How it's stored in memory has no bearing on how it's referenced from the code.Lilienthal
Right, so in code I created an array of pointers to an array of ints. In memory, it's just a single array of ints. However this structure is completely hidden from the code perspective???Weintrob
Hidden until you do as @MichaelGoldshteyn said above, and explicitly cast as an int* instead of int **Lilienthal
No, there is not pointer to pointer, either stored in memory or as part of any valid expression here.Stickpin
No cast is necessary. foo[1][0] is *(*(foo + 1) + 0).Stickpin
@Keith Thompson, yes that is correct; however, that requires two dereferences while the cast requires just one. Of course, a good compiler will optimize yours to just one, but at -O0 this answer is more efficient than yours.Cherellecheremis
I take that back-- gcc version 4.9.2, even on -O0, compiles both of those code snippets to the exact same code (which, incidentally, uses more instructions than the code that just uses the normal brackets)Cherellecheremis
S
31

A two-dimensional array:

int foo[5][4];

is nothing more or less than an array of arrays:

typedef int row[4];   /* type "row" is an array of 4 ints */
row foo[5];           /* the object "foo" is an array of 5 rows */

There are no pointer objects here, either explicit or implicit.

Arrays are not pointers. Pointers are not arrays.

What often causes confusion is that an array expression is, in most contexts, implicitly converted to a pointer to its first element. (And a separate rule says that what looks like an array parameter declaration is really a pointer declaration, but that doesn't apply in this example.) An array object is an array object; declaring such an object does not create any pointer objects. Referring to an array object can create a pointer value (the address of the array's first element), but there is no pointer object stored in memory.

The array object foo is stored in memory as 5 contiguous elements, where each element is itself an array of 4 contiguous int elements; the whole thing is therefore stored as 20 contiguous int objects.

The indexing operator is defined in terms of pointer arithmetic; x[y] is equivalent to *(x + y). Typically the left operand is going to be either a pointer expression or an array expression; if it's an array expression, the array is implicitly converted to a pointer.

So foo[x][y] is equivalent to *(foo[x] + y), which in turn is equivalent to *(*(foo + x) + y). (Note that no casts are necessary.) Fortunately, you don't have to write it that way, and foo[x][y] is a lot easier to understand.

Note that you can create a data structure that can be accessed with the same foo[x][y] syntax, but where foo really is a pointer to pointer to int. (In that case, the prefix of each [] operator is already a pointer expression, and doesn't need to be converted.) But to do that, you'd have to declare foo as a pointer-to-pointer-to-int:

int **foo;

and then allocate and initialize all the necessary memory. This is more flexible than int foo[5][4], since you can determine the number of rows and the size (or even existence) of each row dynamically.

Section 6 of the comp.lang.c FAQ explains this very well.

EDIT:

In response to Arrakis's comment, it's important to keep in mind the distinction between type and representation.

For example, these two types:

struct pair { int x; int y;};
typedef int arr2[2];

very likely have the same representation in memory (two consecutive int objects), but the syntax to access the elements is quite different.

Similarly, the types int[5][4] and int[20] have the same memory layout (20 consecutive int objects), but the syntax to access the elements is different.

You can access foo[2][2] as ((int*)foo)[10] (treating the 2-dimensional array as if it were a 1-dimensional array). And sometimes it's useful to do so, but strictly speaking the behavior is undefined. You can likely get away with it because most C implementations don't do array bounds-checking. On the other hand, optimizing compilers can assume that your code's behavior is defined, and generate arbitrary code if it isn't.

Stickpin answered 16/10, 2011 at 15:4 Comment(10)
Thanks for the reply. My initial problem was in the fact that foo[5][4] in memory is a 1D array - not 2D. I was thus confused about why *(*(foo + 1) +2) would give a numeric value, as that is 2 dereferences (obviously invalid in a 1D array). My understanding now is that the code presents this array of array notation even though it's not what happens underneath. Casting to (int *) foo thus exposes the real memory structure of fooWeintrob
I'm confused about the same thing @Weintrob is. I understand the difference now. The distinction is slight, but highly dangerous if misunderstood. wLilienthal
I had the same question as @Arrakis, but I figured it out. Because it's a fixed array, the compiler knows the number of bytes between foo[0][0] and foo[1][0]. Likely in the same way it knows to add different values to a long* vs. an int*. From the link, I think this should work int* bar=foo[1], while it's very sensible that int **bar=foo should not. In almost all cases int[][] and int ** function the same way, but the differences are extremely important if you want to mix and match. This was a serious tangent, but you've seriously earned your +1 @KeithThompsonLilienthal
@Kendrick: I wouldn't say that int[][] and int** function the same way. The implicit decay of array expressions to pointers causes them to appear to behave similarly, but the underlying behavior is quite different.Stickpin
Sir I do not understand the last statement: You can probably get away with it because most C implementations don't do array bounds-checking. What if I forced compiler to check array bound?Hyponasty
@haccks: How would you "force" a compiler to check array bounds?Stickpin
Hmm. First tell me about behavior of this code #include <stdio.h> int main() { int a[5][5]; int *p; for(p = &a[0][0]; p <= &a[4][4]; p++) *p = 0; return 0; }. Is it cause UB?Hyponasty
@haccks: Yes, that has undefined beahvior. See N1570. The rule is stated in 6.5.6p8: "... otherwise, the behavior is undefined", confirmed in Annex J section 2: "An array subscript is out of range, even if an object is apparently accessible with the given subscript (as in the lvalue expression a[1][7] given the declaration int a[4][5]) (6.5.6).Stickpin
In the code I have given , I see that *p is evaluated in the array bound and it should not invoke UB ( what I understand by reading this in Annex J section 2: Addition or subtraction of a pointer into, or just beyond, an array object and an integer type produces a result that points just beyond the array object and is used as the operand of a unary * operator that is evaluated (6.5.6). or I think I misunderstood something.Hyponasty
@haccks: The relevant array object is a single row of the 2D array. There is no 25-element array of int, only a 5-element array of int[5] elements.Stickpin
G
29

Yes, C/C++ stores a multi-dimensional (rectangular) array as a contiguous memory area. But, your syntax is incorrect. To modify element foo[0][1], the following code will work:

*((int *)foo+1)=5;

The explicit cast is necessary, because foo+1, is the same as &foo[1] which is not at all the same thing as foo[0][1]. *(foo+1) is a pointer to the fifth element in the flat memory area. In other words, *(foo+1) is basically foo[1] and **(foo+1) is foo[1][0]. Here is how the memory is laid out for some of your two dimensional array:

enter image description here

Gazo answered 16/10, 2011 at 14:7 Comment(9)
Yes I know that that will work, but IF foo was just a flat 1D array (of say ints), then you shouldn't be able to dereference twice - there shouldn't even be 2 pointers!!!!Weintrob
@Arrakis, but it isn't a flat 1D array to the compiler, it is a 2D array. It just so happens that the 2D array is laid out in memory in the same way as a 1D array of size product of the two dimensions would be.Gazo
There are two pointers because you asked for two by creating a multi-dimensional array. How it's stored in memory has no bearing on how it's referenced from the code.Lilienthal
Right, so in code I created an array of pointers to an array of ints. In memory, it's just a single array of ints. However this structure is completely hidden from the code perspective???Weintrob
Hidden until you do as @MichaelGoldshteyn said above, and explicitly cast as an int* instead of int **Lilienthal
No, there is not pointer to pointer, either stored in memory or as part of any valid expression here.Stickpin
No cast is necessary. foo[1][0] is *(*(foo + 1) + 0).Stickpin
@Keith Thompson, yes that is correct; however, that requires two dereferences while the cast requires just one. Of course, a good compiler will optimize yours to just one, but at -O0 this answer is more efficient than yours.Cherellecheremis
I take that back-- gcc version 4.9.2, even on -O0, compiles both of those code snippets to the exact same code (which, incidentally, uses more instructions than the code that just uses the normal brackets)Cherellecheremis
N
7

C arrays - even multi-dimensional ones - are contiguous, ie an array of type int [4][5] is structurally equivalent to an array of type int [20].

However, these types are still incompatible according to C language semantics. In particular, the following code is in violation of the C standard:

int foo[4][5] = { { 0 } };
int *p = &foo[0][0];
int x = p[12]; // undefined behaviour - can't treat foo as int [20]

The reason for this is that the C standard is (probably intentionally) worded in a way which makes bounds-checking implementations possible: As p is derived from foo[0], which has type int [5], valid indices must be in range 0..5 (resp. 0..4 if you actually access the element).

Many other programming languages (Java, Perl, Python, JavaScript, ...) use jagged arrays to implement multi-dimensional arrays. This is also possible in C by using an array of pointers:

int *bar[4] = { NULL };
bar[0] = (int [3]){ 0 };
bar[1] = (int [5]){ 1, 2, 3, 4 };
int y = bar[1][2]; // y == 3

However, jagged arrays are not contiguous, and the pointed-to arrays need not be of uniform size.

Because of implicit conversion of array expressions into pointer expressions, indexing jagged and non-jagged arrays looks identical, but the actual address calculations will be quite different:

&foo[1]    == (int (*)[5])((char *)&foo + 1 * sizeof (int [5]))

&bar[1]    == (int **)((char *)&bar + 1 * sizeof (int *))

&foo[1][2] == (int *)((char *)&foo[1] + 2 * sizeof (int))
           == (int *)((char *)&foo + 1 * sizeof (int [5]) + 2 * sizeof (int))

&bar[1][2] == (int *)((char *)bar[1] + 2 * sizeof (int)) // no & before bar!
           == (int *)((char *)*(int **)((char *)&bar + 1 * sizeof (int *))
                      + 2 * sizeof (int))
Narghile answered 16/10, 2011 at 21:25 Comment(0)
P
3

int foo[5][4];

foo is not an array of pointers; it's an array of arrays. Below image will help.

enter image description here

Paule answered 15/10, 2016 at 16:45 Comment(1)
this is what OP was asking aboutHube

© 2022 - 2024 — McMap. All rights reserved.