What kind of C11 data type is an array according to the AMD64 ABI

I was researching the x86_64 calling convention used on OS X and was reading the section called "Aggregates and Unions" in the System V x86-64 ABI standard. It mentions arrays, and I figured that meant a fixed-length C array, e.g. int[5].

I went down to "3.2.3 Parameter Passing" to read about how arrays were passed and if I'm understanding correctly, something like uint8_t[3] should be passed in registers as it's smaller than the four eightbyte limit imposed by rule 1 of the classification of aggregate types (page 18 near the bottom).

After compiling I see that instead it's being passed as a pointer. (I'm compiling with clang-703.0.31 from Xcode 7.3.1 on OSX 10.11.6).

The example source I was using to compile is as follows:

#include <stdio.h>

#define type char

extern void doit(const type[3]);
extern void doitt(const type[5]);
extern void doittt(const type[16]);
extern void doitttt(const type[32]);
extern void doittttt(const type[40]);

int main(int argc, const char *argv[]) {
  const char a[3] = { 1, 2, 3 };
  const char b[5] = { 1, 2, 3, 4, 5 };
  const char c[16] = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 1, 1, 1, 1, 1 };
  const char d[32] = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 1, 1, 1, 1, 1, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 1, 1, 1, 1, 1 };
  const char e[40] = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 1, 1, 1, 1, 1, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1 };

  doit(a);
  doitt(b);
  doittt(c);
  doitttt(d);
  doittttt(e);
}

I dump that in a file named a.c and use the following command to compile: clang -c a.c -o a.o. I use otool to analyze the assembly generated (by running otool -tV a.o) and get the following output:

a.o:
(__TEXT,__text) section
_main:
0000000000000000    pushq   %rbp
0000000000000001    movq    %rsp, %rbp
0000000000000004    subq    $0x10, %rsp
0000000000000008    leaq    _main.a(%rip), %rax
000000000000000f    movl    %edi, -0x4(%rbp)
0000000000000012    movq    %rsi, -0x10(%rbp)
0000000000000016    movq    %rax, %rdi
0000000000000019    callq   _doit
000000000000001e    leaq    _main.b(%rip), %rdi
0000000000000025    callq   _doitt
000000000000002a    leaq    _main.c(%rip), %rdi
0000000000000031    callq   _doittt
0000000000000036    leaq    _main.d(%rip), %rdi
000000000000003d    callq   _doitttt
0000000000000042    leaq    _main.e(%rip), %rdi
0000000000000049    callq   _doittttt
000000000000004e    xorl    %eax, %eax
0000000000000050    addq    $0x10, %rsp
0000000000000054    popq    %rbp
0000000000000055    retq

Or equivalently, here it is on the Godbolt compiler explorer with clang3.7, which targets Linux which uses the same ABI.


So, I was wondering if anyone could point me to which C11 data types correspond to the arrays the ABI describes. (It looks like clang defaults to using C11 - see the blurb here right under C99 inline function).

I also did a similar investigation with ARM and found similar results, even though the ARM standard also specifies there exists an array aggregate type.

Also, is there somewhere in some standard that it's specified that a fixed length array is to be treated as a pointer?

Recrimination answered 6/8, 2016 at 2:33 Comment(13)
@PeterCordes: Arrays decay to pointers in most but not all contexts. More to the point, C does not permit parameters of array type. In a declaration like void func(int param[]);, the type of param is adjusted from int[] to int*. (This is a distinct rule from the one that specifies implicit conversion of array expressions to pointers.)Cruciate
@PeterCordes so is there no data type in c11 or c99 that is considered an array as defined in the x86_64/arm standards?Recrimination
@KeithThompson do you have a reference for the fact that C does not permit parameters of array type? I'd love to read more!Recrimination
@KeithThompson: ok, after a couple mis-steps with terminology, I think my answer mostly avoids any incorrect statements. Is "decay" the wrong word to describe the fact that array args to functions are always passed as pointers? I guess multi-dimensional arrays are still arrays.Dysart
@DanZimm: C certainly has arrays, and I presume they correspond to "arrays" as defined in the x86_64/arm standards. Reference: N1570 section 6.7.6.3 paragraph 7.Cruciate
@PeterCordes: "Decay" is a commonly used term, though the standard doesn't call it that; it merely says it's "converted". Note that decay is not specific to function arguments. An array expression is implicitly converted to a pointer in all but 3 contexts (sizeof, &, and a string literal in an initializer used to initialize an array object); argument passing just happens to be one of the contexts where the conversion does happen.Cruciate
@KeithThompson Do you have the reference about the array not being converted in those 3 contexts? I want to make sure I understand exactly what you mean (I understand the sizeof, but I'm interested in reading more about the &, string literal in an initializer). EDIT: Didn't see your response to me above - apologies. It looks like that fact you stated is just known because you're very familiar with the standard based on how 6.7.6.3 paragraph 7 is worded.Recrimination
@DanZimm: N1570 6.3.2.1 paragraph 3. (The reference to _Alignof is incorrect in that draft, corrected in the published ISO C11 standard.)Cruciate
@KeithThompson thanks for the references!! Fwiw it looks like array isn't used in the same sense in the ABIs I linked to as they're used in the C standard (in the ABIs an array isn't ever "converted" or treated as a pointer - it's always its own sort of data, either an aggregate or a composite depending which ABI you're looking at)Recrimination
@DanZimm: Arrays are not treated as pointers in C; arrays and pointers are two distinct kinds of types. Array expressions are sometimes converted to pointers -- but that's probably not something that an ABI needs to concern itself with.Cruciate
@KeithThompson I'm sorry I'm not being as precise with my terminology - arrays in the C standard are not directly equivalent to arrays in the ABI. In the C standard it's dealing with a kind of type, while the ABI is talking about kinds of data, or so it seems. Again, my terminology may not be as precise as it could be but I think the point I'm trying to make is clear: arrays as defined in the C standard don't map directly to arrays in the ABIsRecrimination
@KeithThompson eesh that last comment came off much more aggressively than I meant it to, sorry about that :/. Anyways, thanks for the references and information you provided!Recrimination
@DanZimm: No problem. Actually I just took a quick look at the ABI document you cited, and I think that its use of the word "array" is consistent with the way C (and C++) uses the term. C defines certain operations on arrays oddly, but in particular I think array objects are the same in C and in the ABI.Cruciate

Bare arrays as function args in C and C++ always decay to pointers, just like in several other contexts.

Arrays inside structs or unions don't, and are passed by value. This is why ABIs need to care about how they're passed, even though it doesn't happen in C for bare arrays.
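
To make the difference concrete, here is a minimal sketch that checks it at the language level rather than in the asm. The names struct wrapped, set_first_bare, set_first_wrapped and the two *_after_call helpers are made up for this illustration:

```c
#include <assert.h>

/* Hypothetical wrapper: the array is a struct member, so it is copied
   along with the struct when the struct is passed by value. */
struct wrapped { char bytes[3]; };

/* The parameter is adjusted to `char *`, so the write lands in the
   caller's array. */
static void set_first_bare(char bytes[3]) { bytes[0] = 42; }

/* The callee gets its own copy of the struct (and the array inside it);
   the write never reaches the caller's object. */
static void set_first_wrapped(struct wrapped w) { w.bytes[0] = 42; }

/* Returns the caller's first element after passing the bare array. */
int bare_first_after_call(void) {
    char a[3] = {1, 2, 3};
    set_first_bare(a);
    return a[0];
}

/* Returns the caller's first element after passing the wrapped array. */
int wrapped_first_after_call(void) {
    struct wrapped w = {{1, 2, 3}};
    set_first_wrapped(w);
    return w.bytes[0];
}

/* Checks what each caller observes. */
void demo(void) {
    assert(bare_first_after_call() == 42);   /* callee wrote through the pointer */
    assert(wrapped_first_after_call() == 1); /* callee only changed its copy */
}
```

Calling demo() from main should pass both assertions. A 3-byte struct like this one is small enough that, as I read the classification rules, it would travel in a register on x86-64 SysV, but the code above only demonstrates the C semantics, not the register assignment.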


As Keith Thompson points out, the relevant part of the C standard is N1570 section 6.7.6.3 paragraph 7:

A declaration of a parameter as "array of type" shall be adjusted to "qualified pointer to type", where the type qualifiers (if any) are those specified within the [ and ] of the array type derivation ... (stuff about foo[static 10], see below)

Note that multidimensional arrays work as arrays of array type, so only the outer-most level of "array-ness" is converted to a pointer to array type.
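
A small sketch of that adjustment, assuming a C11 compiler (for _Generic and static_assert). The function names row_bytes and last_element are invented for the example:

```c
#include <assert.h>
#include <stddef.h>

/* Declared as int[3][4], but the parameter is adjusted to int (*)[4]:
   only the outermost dimension turns into a pointer. */
size_t row_bytes(int m[3][4]) {
    static_assert(_Generic((m), int (*)[4]: 1, default: 0),
                  "m is a pointer to an array of 4 ints");
    /* *m is still a real array type, so sizeof sees all 4 elements. */
    return sizeof *m;
}

/* Indexing works as usual: m[2] steps over two whole rows of 4 ints. */
int last_element(int m[3][4]) {
    return m[2][3];
}
```

For an int g[3][4], row_bytes(g) should equal 4 * sizeof(int), which shows that the inner int[4] did not decay. The leading 3 in the parameter declaration is not part of the adjusted type.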


Terminology: The x86-64 ABI doc calls structs and arrays "aggregates" (multiple elements at sequential addresses). So the phrase "aggregates and unions" comes up a lot, because unions are handled similarly by the language and the ABI.

It's the recursive rule for handling composite types (struct/union/class) that brings the array-passing rules in the ABI into play. This is the only way you'll see asm that copies an array to the stack as part of a function arg, for C or C++:

struct s { int a[8]; };
void ext(struct s byval);

void foo() { struct s tmp = {{0}}; ext(tmp); }

gcc6.1 compiles it (for the AMD64 SysV ABI, with -O3) to the following:

    sub     rsp, 40    # align the stack and leave room for `tmp` even though it's never stored?
    push    0
    push    0
    push    0
    push    0
    call    ext
    add     rsp, 72
    ret

In the x86-64 ABI, pass-by-value happens by actual copying (into registers or the stack), not by hidden pointers.

Note that return-by-value does pass a pointer as a "hidden" first arg (in rdi), when the return value is too large to fit in the 128bit concatenation of rdx:rax (and isn't a vector being returned in vector regs, etc. etc.)

It would be possible for the ABI to use a hidden pointer to pass-by-value objects above a certain size, and trust the called function not to modify the original, but that's not what the x86-64 ABI chooses to do. That would be better in some cases (especially for inefficient C++ that makes lots of copies it never modifies, so the copying is wasted), but worse in other cases.

SysV ABI bonus reading: As the tag wiki points out, the current version of the ABI standard doesn't fully document the behaviour that compilers rely on: clang/gcc sign/zero extend narrow args to 32bit.


Note that to really promise that a function arg points to an array of at least a given size, C99 and later let you use the static keyword in a new way: on array sizes. (It's still passed as a pointer, of course. This doesn't change the ABI).

void bar(int arr[static 10]);

This allows compiler warnings about going out of bounds. It also potentially enables better optimization if the compiler knows it's allowed to access elements that the C source doesn't. (See this blog post). However, the arg still has type int*, not an actual array, so sizeof(arr) == sizeof(int*).
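
Here is a minimal sketch of that, assuming a C11 compiler so the parameter type can be checked with _Generic and static_assert. sum10 is a made-up function name:

```c
#include <assert.h>
#include <stddef.h>

/* `static 10` is a promise made by callers: arr points to the first
   element of an array with at least 10 ints. */
int sum10(const int arr[static 10]) {
    /* The parameter has still been adjusted to a pointer type, so
       sizeof(arr) would be sizeof(const int *), not 10 * sizeof(int). */
    static_assert(_Generic((arr), const int *: 1, default: 0),
                  "arr has type const int *, not const int[10]");
    int total = 0;
    for (size_t i = 0; i < 10; i++)
        total += arr[i];
    return total;
}
```

Passing an array of 10 or more ints is fine. Passing fewer is undefined behaviour, and a compiler may warn when it can see the size at the call site. I used _Generic instead of printing sizeof(arr) because compilers may warn about sizeof applied to an array parameter.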

The same keyword page for C++ indicates that ISO C++ does not support this usage of static; it's another one of those C-only features, along with C99 variable-length-arrays and a few other goodies that C++ doesn't have.

In C++, you can use std::array<int,10> to get compile-time size information passed to the callee. However, you have to manually pass it by reference if that's what you want, since it's of course just a class containing an int arr[10]. Unlike a C-style array, it doesn't decay to T* automatically.


The ARM doc that you linked doesn't seem to actually call arrays an aggregate type: Section 4.3 Composite Types (which discusses alignment) distinguishes arrays from aggregate types, even though they appear to be a special case of its definition for aggregates.

A Composite Type is a collection of one or more Fundamental Data Types that are handled as a single entity at the procedure call level. A Composite Type can be any of:

  • An aggregate, where the members are laid out sequentially in memory
  • A union, where each of the members has the same address
  • An array, which is a repeated sequence of some other type (its base type).

The definitions are recursive; that is, each of the types may contain a Composite Type as a member

"Composite" is an umbrella term that includes arrays, structs, and unions.

Dysart answered 6/8, 2016 at 3:12 Comment(7)
The x86-64 ABI doc also calls arrays aggregates: The classification of aggregate (structures and arrays) and union types works as follows: so shouldn't arrays (as defined by the ABI) be able to be passed by value as well? P.S. Thanks for the detailed answer, especially the piece on the static keyword - I never knew that!Recrimination
@DanZimm: crap, you're right. I didn't double check the terminology, since I was (and still am) 100% sure I'm correct about what actually happens to C array args: they're passed as pointers. Perhaps other languages allow pass-by-value of arrays? There's a Fortran section in the ABI doc.Dysart
Maybe Rust/Swift can take advantage of this? I don't have the clang source on this computer so I don't remember where, but passing arrays by value seems to be implemented in the underlying compiler (I think my terminology is wrong, but I saw code that properly maps an array onto register/into the stack by value instead of by pointer).Recrimination
@DanZimm: That would make sense for passing an array inside a struct. In that case, it's part of an aggregate that's passed by value according to C's rules, so it does get copied to regs / stack .Dysart
@DanZimm: Ok, thanks for catching my errors in terminology. I think I have this answer cleaned up into something that's useful and error-free. I'm open to suggestions for what else would be useful to say in this answer. (I guess a language spec citation for the array to pointer decay / conversion in function args would be good, but I don't really want to go dig that up. It's basically a well-known fact as far as I'm concerned.)Dysart
This: > This lets sizeof(arr) work as you might expect inside the called function could be interpreted to mean that sizeof(arr) == sizeof(int)*10 but it's still sizeof(int*) nothing else.Homeroom
@björnen: Thanks, fixed that bit of wishful thinking :PDysart
