Why is %eax zeroed before a call to printf?

I am trying to pick up a little x86. I am compiling on a 64-bit Mac with gcc -S -O0.

Code in C:

printf("%d", 1);

Output:

movl    $1, %esi
leaq    LC0(%rip), %rdi
movl    $0, %eax        ; WHY?
call    _printf

I do not understand why %eax is cleared to 0 before printf is called. Since printf returns the number of characters printed in %eax, my best guess is that it is zeroed to prepare it for printf, but I would have assumed that printf itself would be responsible for getting it ready. In contrast, if I call my own function int testproc(int p1), gcc sees no need to prepare %eax. So I wonder why gcc treats printf and testproc differently.

Responsum answered 2/6, 2011 at 9:26 Comment(0)

From the x86_64 System V ABI register usage table:

  • %rax       temporary register; with variable arguments passes information about the number of vector registers used; 1st return register ...

printf is a function with variable arguments, and the number of vector registers used is zero.

Note that printf must check only %al, because the caller is allowed to leave garbage in the higher bytes of %rax. (Still, xor %eax,%eax is the most efficient way to zero %al.)

See this Q&A and the tag wiki for more details, or for up-to-date ABI links if the above link is stale.

Sandman answered 2/6, 2011 at 9:36 Comment(5)
puts is also zeroing out %eax right before the call though it only takes a single pointer. Why is this? - Responsum
This also happens before any call to my own void proc() function (even with -O2 set), but it is not zeroed when calling a void proc2(int param) function. - Responsum
For the record, it happens before calls to void proc() because that signature in C actually says nothing about proc's arity, and it could as well be a variadic function, so zeroing rax is necessary. void proc() is different from void proc(void). See https://mcmap.net/q/14687/-is-it-better-to-use-c-void-arguments-quot-void-foo-void-quot-or-not-quot-void-foo-quot-duplicate - Huffman
@Responsum FWIW I just tried puts with x86-64 gcc 9.2 and eax was not zeroed prior to the call. - Amador
I have tested it, and it appears that, provided the two FP arguments are in xmm0 and xmm1, it does not actually matter whether rax is 1 or 2. I wonder why. - Mezzanine

In the x86_64 ABI, if a function has variable arguments then AL (which is part of EAX) is expected to hold the number of vector registers used to hold arguments to that function.

In your example:

printf("%d", 1);

has an integer argument so there’s no need for a vector register, hence AL is set to 0.

On the other hand, if you change your example to:

printf("%f", 1.0f);

then the floating-point literal is stored in a vector register and, correspondingly, AL is set to 1:

movsd   LC1(%rip), %xmm0
leaq    LC0(%rip), %rdi
movl    $1, %eax
call    _printf

As expected:

printf("%f %f", 1.0f, 2.0f);

will cause the compiler to set AL to 2 since there are two floating-point arguments:

movsd   LC0(%rip), %xmm0
movapd  %xmm0, %xmm1
movsd   LC2(%rip), %xmm0
leaq    LC1(%rip), %rdi
movl    $2, %eax
call    _printf
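
The same rule applies when integer and floating-point arguments are mixed. The sketch below is only an illustration: format_mixed is a hypothetical helper, not something from the question, and the register comments describe what the System V x86-64 convention implies for this call rather than verified compiler output. Note also that a float passed through "..." is promoted to double, which is why the listings above load the values with movsd.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: formats one int and one double into buf.
 * Under the System V x86-64 convention, buf, size, the format string
 * and the int travel in general-purpose registers, while the double
 * is the only argument placed in a vector register (%xmm0). The
 * caller is therefore expected to set AL to 1 (or to some larger
 * upper bound) before the variadic call to snprintf. */
int format_mixed(char *buf, size_t size, int i, double d)
{
    assert(buf != NULL && size > 0);
    return snprintf(buf, size, "%d %.1f", i, d);
}
```

Compiling a call like this with gcc -S should show a nonzero value being moved into %eax right before the call to snprintf, in the same way as the printf examples above.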

As for your other questions:

puts is also zeroing out %eax right before the call though it only takes a single pointer. Why is this?

It shouldn’t. For instance:

#include <stdio.h>

void test(void) {
    puts("foo");
}

when compiled with gcc -c -O0 -S, outputs:

pushq   %rbp
movq    %rsp, %rbp
leaq    LC0(%rip), %rdi
call    _puts
leave
ret

and %eax is not zeroed out. However, if you remove #include <stdio.h> then the resulting assembly does zero out %eax right before calling puts():

pushq   %rbp
movq    %rsp, %rbp
leaq    LC0(%rip), %rdi
movl    $0, %eax
call    _puts
leave
ret

The reason is related to your second question:

This also happens before any call to my own void proc() function (even with -O2 set), but it is not zeroed when calling a void proc2(int param) function.

If the compiler doesn't see the declaration of a function then it makes no assumptions about its parameters, and the function could well accept variable arguments. The same applies if you specify an empty parameter list (which you shouldn’t, and it’s marked as an obsolescent C feature by ISO/IEC). Since the compiler doesn’t have enough information about the function parameters, it zeroes out %eax before calling the function because it might be the case that the function is defined as having variable arguments.

For example:

#include <stdio.h>

void function() {
    puts("foo");
}

void test(void) {
    function();
}

where function() has an empty parameter list, results in:

pushq   %rbp
movq    %rsp, %rbp
movl    $0, %eax
call    _function
leave
ret

However, if you follow the recommended practice of specifying void when the function accepts no parameters, such as:

#include <stdio.h>

void function(void) {
    puts("foo");
}

void test(void) {
    function();
}

then the compiler knows that function() doesn't accept arguments — in particular, it doesn’t accept variable arguments — and hence doesn’t clear %eax before calling that function:

pushq   %rbp
movq    %rsp, %rbp
call    _function
leave
ret
Epitomize answered 2/6, 2011 at 9:45 Comment(3)
Note from ABI: "We use vector register to refer to either SSE or AVX register." - Pearson
What is the advantage of passing vector count in %rax? Is it performance-only, to avoid saving useless registers on "The Register Save Area"? - Pearson
@CiroSantilli新疆改造中心法轮功六四事件: Yes. The actual requirement is that AL >= number of FP/vector args. You can't use AL=0 to get the callee to look only on the stack for FP args; that's an ABI violation and the callee will just load memory it never spilled xmm0..7 to, if it's compiled the way GCC normally does. - Tonitonia

The reason is the efficient implementation of variadic functions. When a variadic function calls va_start, it is often not clear to the compiler if va_arg will ever be invoked for a floating point argument. Therefore, the compiler always has to save all vector registers which can hold parameters, so that a potential future va_arg call can access it even if the register has been clobbered in the meantime. This is fairly costly because there are eight such registers on x86-64.

Therefore, the caller passes the number of vector registers as an optimization hint to the variadic function. If there are no vector registers involved in the call, none of them need to be saved. For example, the start of the sprintf function in glibc looks like this:

00000000000586e0 <_IO_sprintf@@GLIBC_2.2.5>:
   586e0:       sub    $0xd8,%rsp
   586e7:       mov    %rdx,0x30(%rsp)
   586ec:       mov    %rcx,0x38(%rsp)
   586f1:       mov    %r8,0x40(%rsp)
   586f6:       mov    %r9,0x48(%rsp)
   586fb:       test   %al,%al
   586fd:       je     58736 <_IO_sprintf@@GLIBC_2.2.5+0x56>
   586ff:       movaps %xmm0,0x50(%rsp)
   58704:       movaps %xmm1,0x60(%rsp)
   58709:       movaps %xmm2,0x70(%rsp)
   5870e:       movaps %xmm3,0x80(%rsp)
   58716:       movaps %xmm4,0x90(%rsp)
   5871e:       movaps %xmm5,0xa0(%rsp)
   58726:       movaps %xmm6,0xb0(%rsp)
   5872e:       movaps %xmm7,0xc0(%rsp)
   58736:       mov    %fs:0x28,%rax

In practice, all implementations use %al only as a flag, jumping over the vector save instructions if it is zero. A computed goto to avoid saving unnecessary registers does not seem to improve performance.

Furthermore, if compilers can detect that va_arg is never called for a floating point argument, they will optimize away the vector register save operation completely, so setting %al is superfluous in that case. But the caller cannot know that implementation detail, so it will still have to set %al.
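
To make the callee's side concrete, here is a minimal sketch of a variadic function in which the type fetched by va_arg is only known at run time. The name sum_mixed and its one-character-per-argument type string are assumptions made up for this illustration; this is not glibc code.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>

/* Hypothetical variadic callee: `types` holds one character per extra
 * argument, 'i' for int and 'd' for double. Which va_arg type is read
 * is decided at run time, so the compiler cannot prove that no
 * floating-point argument will ever be fetched, and the prologue has
 * to be ready to save the vector argument registers. */
double sum_mixed(const char *types, ...)
{
    assert(types != NULL);

    va_list ap;
    double total = 0.0;

    va_start(ap, types);
    for (const char *t = types; *t != '\0'; ++t) {
        if (*t == 'i')
            total += va_arg(ap, int);     /* integer argument */
        else if (*t == 'd')
            total += va_arg(ap, double);  /* floating-point argument */
    }
    va_end(ap);
    return total;
}
```

A call such as sum_mixed("id", 2, 0.5) passes one value in a vector register, so the caller would set AL to 1; a call such as sum_mixed("ii", 2, 3) would set it to 0 and let the callee skip the vector saves.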

Poulter answered 3/11, 2019 at 21:4 Comment(2)
printf gets stuck in an infinite loop with AL = 10 on x86-64 Linux with older gcc shows old GCC code-gen for variadic functions, where it actually did a computed jump to only run the exact number of movaps instructions required. With modern CPUs that handle movaps stores as a single uop (with micro-fusion and having a 16-byte load/store data path, so Core 2 / K10 and later), letting the store buffer absorb more stores was cheaper than computing a branch, and sped up the very common case where AL=0. - Tonitonia
Fun fact: scanf and so on can never take FP args, but it works internally by passing a va_list to another function so that fact isn't visible to the compiler. (So vfscanf and scanf can share an implementation). That's another reason why this save code sometimes doesn't get optimized away. - Tonitonia
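
The va_list forwarding pattern mentioned in the last comment can be sketched as follows. The wrapper format_into is a hypothetical name chosen for this illustration; vsnprintf is the standard C library function. It is a sketch of the general pattern, not of how any particular libc implements scanf or printf.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical wrapper: it never calls va_arg itself. It only hands
 * its va_list to vsnprintf, so the compiler cannot tell from this
 * function alone whether a floating-point argument will be read
 * later, and it cannot drop the vector register save code. */
int format_into(char *buf, size_t size, const char *fmt, ...)
{
    assert(buf != NULL && fmt != NULL);

    va_list ap;
    va_start(ap, fmt);
    int written = vsnprintf(buf, size, fmt, ap);
    va_end(ap);
    return written;
}
```

Even a call that passes only integers and pointers, such as format_into(buf, sizeof buf, "%d-%s", 7, "x"), goes through the same prologue, which is where the AL check pays off.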
