Why are the addresses of argc and argv 12 bytes apart?

I ran the following program on my computer (64-bit Intel running Linux).

#include <stdio.h>

void test(int argc, char **argv) {
    printf("[test] Argc Pointer: %p\n", (void *)&argc);
    printf("[test] Argv Pointer: %p\n", (void *)&argv);
}

int main(int argc, char **argv) {
    printf("Argc Pointer: %p\n", (void *)&argc);
    printf("Argv Pointer: %p\n", (void *)&argv);
    printf("Size of &argc: %zu\n", sizeof (&argc));
    printf("Size of &argv: %zu\n", sizeof (&argv));
    test(argc, argv);
    return 0;
}

The output of the program was:

$ gcc size.c -o size
$ ./size
Argc Pointer: 0x7fffd7000e4c
Argv Pointer: 0x7fffd7000e40
Size of &argc: 8
Size of &argv: 8
[test] Argc Pointer: 0x7fffd7000e2c
[test] Argv Pointer: 0x7fffd7000e20

The size of the pointer &argv is 8 bytes. I expected the address of argc to be the address of argv plus sizeof (argv), i.e. 0x7fffd7000e40 + 0x8 = 0x7fffd7000e48, but there are 4 bytes of padding between them. Why is this the case?

My guess is that it could be due to memory alignment, but I am not sure.

I notice the same behaviour with the functions I call as well.

Edik asked 8/2, 2020 at 15:34. Comments (6):
Why not? They could be 174 bytes apart. An answer will depend on your operating system and/or a wrapper library that does setup for main. – Cession
@aschepler: It should not depend on any wrapper that does setup for main. In C, main can be called as a regular function, so it needs to receive arguments like a regular function and must obey the ABI. – Betoken
@aschepler: I notice the same behaviour for other functions as well. – Edik
It's an interesting thought experiment, but really there is nothing here beyond "I wonder why". These addresses can change depending on the OS, compiler, compiler version, and processor architecture, and they should in no way be depended upon in real life. – Foetation
The result of sizeof must be printed using %zu. – Pearcy
@phuclv: For small objects, one could also cast the result of sizeof to unsigned and then format with %u. – Filature

On your system, the first few integer or pointer arguments are passed in registers and have no addresses. When you take their addresses with &argc or &argv, the compiler has to fabricate addresses by writing the register contents to stack locations and giving you the addresses of those stack locations. In doing so, the compiler chooses, in a sense, whatever stack locations happen to be convenient for it.

Betoken answered 8/2, 2020 at 15:53. Comments (1):
Note that this could happen even if they are passed on the stack; the compiler has no obligation to use the incoming-value slot on the stack as the storage for the local objects the values go into. It might make sense to do this if the function is eventually going to tail-call and needs the current values of these objects to produce the outgoing arguments for the tail-call. – Apothecium

Why are the addresses of argc and argv 12 bytes apart?

From the perspective of the language standard, the answer is "no particular reason". C does not specify or imply any relationship between the addresses of function parameters. @EricPostpischil describes what is probably happening in your particular implementation, but those details would be different for an implementation in which all arguments are passed on the stack, and that is not the only alternative.

Moreover, I'm having trouble coming up with a way in which such information could be useful within a program. For example, even if you "know" that the address of argv is 12 bytes before the address of argc, there's still no defined way to compute one of those pointers from the other.

Laurentian answered 8/2, 2020 at 16:08. Comments (18):
Computing one from the other via conversion through uintptr_t is well-defined provided uintptr_t is defined. The upcoming "provenance" changes make a mess of this and make it difficult to do in a well-defined way, I think, but in all past versions of C it's been formally well-defined. – Apothecium
@R..GitHubSTOPHELPINGICE: Computing one from the other is partially defined, not well defined. The C standard is not strict on how the conversion to uintptr_t is performed, and it certainly does not define relationships between the addresses of parameters or where arguments are passed. – Betoken
@EricPostpischil: If you already computed that the difference between (uintptr_t)&a and (uintptr_t)&b is d, then (void*)((uintptr_t)&a+d)==&b. That's what I mean by well-definedness here. – Apothecium
@R..GitHubSTOPHELPINGICE: No such property is defined in the C standard. C 2018 7.20.1.4, which specifies the uintptr_t type, says that “any valid pointer to void can be converted to this type, then converted back to pointer to void, and the result will compare equal to the original pointer.” The passage on converting pointers to integer types generally, 6.3.2.3 6, says “Any pointer type may be converted to an integer type. Except as previously specified, the result is implementation-defined.”… – Betoken
… A non-normative footnote says “The mapping functions for converting a pointer to an integer or an integer to a pointer are intended to be consistent with the addressing structure of the execution environment.” Nothing specifies requirements on what the results will be if you convert, do arithmetic, and convert back. In a flat address space with plain pointers, then conforming to the intent, as expressed in the footnote, would give that result. But as we see from the normative text, this is not fully specified by the standard. It is not well defined. – Betoken
@EricPostpischil: I don't follow what you're claiming. As you cited, 7.20.1.4 specifies that you can round-trip pointers through uintptr_t. As such, (void *)((uintptr_t)&a + ((uintptr_t)&b-(uintptr_t)&a)) == (void *)(uintptr_t)&b == (void *)&b. The first equality is the definedness of the cast to (void *) as a function of the operand value; the second is the round-trip property. – Apothecium
@R..GitHubSTOPHELPINGICE: The fact that you can round-trip means that g(f(x)) = x, where x is a pointer, f is convert-pointer-to-uintptr_t, and g is convert-uintptr_t-to-pointer. Mathematically and logically, it does not imply that g(f(x)+4) = x+4. For example, if f(x) were x² and g(y) were sqrt(y), then g(f(x)) = x (for real non-negative x), but g(f(x)+4) ≠ x+4, in general. In the case of pointers, the conversion to uintptr_t might give an address in the high 24 bits and some authentication bits in the low 8 bits. Then adding 4 just screws up the authentication; it does not update… – Betoken
… the address bits. Or the conversion to uintptr_t might give a base address in the high 16 bits and an offset in the low 16 bits, and adding 4 to the low bits might carry into the high bits, but the scaling is wrong (because the address represented is not base·65536+offset but rather is base·64+offset, as it was in some systems). Quite simply, the uintptr_t you get from a conversion is not necessarily a simple address. – Betoken
@R..GitHubSTOPHELPINGICE: From my reading of the standard, there is only a weak guarantee that (void *)(uintptr_t)(void *)p will compare equal to (void *)p. And it is worthwhile to note that the committee has commented on nearly this exact issue, concluding that "implementations ... may also treat pointers based on different origins as distinct even though they are bitwise identical." – Physician
@R..GitHubSTOPHELPINGICE: Sorry, I missed that you were adding a value calculated as the difference of two uintptr_t conversions of addresses rather than a difference of pointers or a “known” distance in bytes. Sure, that is true, but how is it useful? It remains true that “there's still no defined way to compute one of those pointers from the other” as the answer states, but that calculation does not calculate b from a but rather calculates b from both a and b, since b must be used in the subtraction to calculate the amount to add. Computing one from the other is not defined. – Betoken
@RyanAvella: That's what I referred to as the "provenance mess". But it's really not about treating pointers that are bitwise identical differently. It's about treating integers which are equal as values (not to mention equal bitwise) differently, which has always been explicitly forbidden by the C language (e.g. C requires a negative-zero integer, if it exists, to behave identically to normal zero in all expressions, and requires padding bits not to affect the resulting value of expressions). So provenance absolutely is a change to the language semantics. – Apothecium
@EricPostpischil: Note that there's no way to even talk about "12 bytes apart" in the abstract machine with pointers that aren't into the same array, except under a conversion to an integer type or other numeric representation (e.g. %p) under an implementation-defined definition that preserves byte distances within arrays and extends that difference operation to the whole domain of the integer type. – Apothecium
@RyanAvella: The authors of the Standard have never reached a consensus as to what kinds of non-portable constructs are within the Standard's jurisdiction. If the Standard would explicitly state that support for certain constructs that are useful but non-portable is a quality-of-implementation issue outside the Standard's jurisdiction, that would resolve a lot of issues. I doubt that that will ever happen, though, because it would paint clang and gcc in a rather bad light. – Filature
@RyanAvella: In clang, the act of comparing for equality two integer values that are derived from pointers can knock execution off the rails even if the integers are never converted to pointers. Why people view clang as a quality compiler when it makes fundamentally unsound optimizations is beyond me. – Filature
@RyanAvella: See godbolt.org/z/WSV8uH for an example of clang getting knocked off the rails by an integer comparison. Sure, clang is being "clever" by inferring that since (uintptr_t)(y+i) happens to be coincidentally equal to (uintptr_t)(x+5), it can't possibly equal y, but that's unsound. If x happens to have five elements, y happens to immediately follow it, and i happens to equal zero, then (x+5), (y+i), and y could all legitimately represent the same address, even though the former pointer couldn't access the same storage as the other two. – Filature
@Filature: The implementation is allowed to embed metadata in the least significant bits of a pointer (or even add padding bits with this metadata, if it wants). Then, even though the two pointers may point to the same byte of memory, they can still compare unequal. Any time you rely on the underlying implementation of a pointer (e.g. assuming a flat memory model), you are basically asking for undefined behavior. – Physician
@RyanAvella: An integer comparison should always do one of two things: yield 0 with no side effects, or yield 1 with no side effects. In the function as written, if i is zero, having the comparison yield 0 should result in y[0] and the return value both being equal to 1; having the comparison yield 1 should result in y[0] and the return value both being equal to 2. In the code generated by clang, however, if the comparison happens to yield 1, then y[0] would equal 2, but the function would return 1. While there would be no requirement that the comparison ever yield 1... – Filature
...an implementation that is going to behave erroneously if the comparison yields 1 would be required to ensure that it always yields 0 in all UB-free scenarios. The fact that a comparison might yield 0 or 1, chosen in Unspecified fashion, would allow the compiler to either execute or skip the branch at its leisure, but each individual execution of the function must behave consistently with one or the other choice. – Filature
