How do I get rid of call __x86.get_pc_thunk.ax?

Asked 30/4, 2018 at 17:41 Answered 30/4, 2018 at 18:54

I tried to compile and convert a very simple C program to assembly language.

I am using Ubuntu, and the OS is 64-bit.

This is the C program:

void add();

int main() {
    add();
    return 0;
}

If I use gcc -S -m32 -fno-asynchronous-unwind-tables -o simple.S simple.c, this is what my assembly source file should look like:

.file   "main1.c"
.text
.globl main
.type   main, @function
main:
pushl   %ebp
movl    %esp, %ebp
andl    $-16, %esp
call    add
movl    $0, %eax
movl    %ebp, %esp
popl    %ebp
ret
.size   main, .-main
.ident  "GCC: (Debian 4.4.5-8) 4.4.5" # this part should say Ubuntu instead of Debian
.section    .note.GNU-stack,"",@progbits

But instead it looks like this:

.file   "main0.c"
.text
.globl  main
.type   main, @function
main:
leal    4(%esp), %ecx
andl    $-16, %esp
pushl   -4(%ecx)
pushl   %ebp
movl    %esp, %ebp
pushl   %ebx
pushl   %ecx
call    __x86.get_pc_thunk.ax
addl    $_GLOBAL_OFFSET_TABLE_, %eax
movl    %eax, %ebx
call    add@PLT
movl    $0, %eax
popl    %ecx
popl    %ebx
popl    %ebp
leal    -4(%ecx), %esp
ret
.size   main, .-main
.section        .text.__x86.get_pc_thunk.ax,"axG",@progbits,__x86.get_pc_thunk.ax,comdat
.globl  __x86.get_pc_thunk.ax
.hidden __x86.get_pc_thunk.ax
.type   __x86.get_pc_thunk.ax, @function
__x86.get_pc_thunk.ax:
movl    (%esp), %eax
ret
.ident  "GCC: (Ubuntu 6.3.0-12ubuntu2) 6.3.0 20170406"
.section    .note.GNU-stack,"",@progbits

At my university they told me to use the flag -m32 if I am using a 64-bit Linux version. Can somebody tell me what I am doing wrong? Am I even using the correct flag?

Edit: after adding -fno-pie, I get:

.file   "main0.c"
.text
.globl  main
.type   main, @function
main:
leal    4(%esp), %ecx
andl    $-16, %esp
pushl   -4(%ecx)
pushl   %ebp
movl    %esp, %ebp
pushl   %ecx
subl    $4, %esp
call    add
movl    $0, %eax
addl    $4, %esp
popl    %ecx
popl    %ebp
leal    -4(%ecx), %esp
ret
.size   main, .-main
.ident  "GCC: (Ubuntu 6.3.0-12ubuntu2) 6.3.0 20170406"
.section    .note.GNU-stack,"",@progbits

It looks better, but it's not exactly the same. For example, what does leal mean?

Betrothal asked 30/4, 2018 at 17:41

Try adding -fno-pie to the command line. If that works, I will explain. – Rickettsia 30/4, 2018 at 17:42

@Rickettsia Thanks, I tried it, but it's not exactly the same result. I posted the result above. – Betrothal 30/4, 2018 at 17:50

The lea is part of gcc's very clunky code to align ESP to 16 bytes. It does this only in main in 32-bit code, or in other functions if they need more than 16-byte alignment for locals. gcc copies the return address to just above the frame pointer, and does something over-complicated with ecx for no reason, saving/restoring a pointer to 4 bytes above the %esp on entry. LEA is not complicated: leal 4(%esp), %ecx just computes the address %esp+4 into %ecx, without accessing memory. – Improvisator 30/4, 2018 at 18:15

The compiler may generate any code it wants, as long as it is correct. Your output is correct, and there was never a guarantee that you would get exactly the same code. For that you would also need the same compiler version and the same compile options. (Even that, the reproducibility of compiler output, is a feature of gcc and many other compilers, not a given. A compiler could produce different output every time and still be usable as a compiler, though it would be a nightmare for its own developers to debug. So don't take it for granted; it is the result of hard work.) – Folder 30/4, 2018 at 18:24

I've opened a new question specifically about the alignment prologue: https://mcmap.net/q/1161484/-motivation-for-useless-prologue-in-gcc-compiled-main-disabling-it/379897 – Mohan 30/4, 2018 at 18:51

As a general rule, you cannot expect two different compilers to generate the same assembly code for the same input, even if they have the same version number; they could have any number of extra "patches" to their code generation. As long as the observable behavior is the same, anything goes.

You should also know that GCC, in its default -O0 mode, generates intentionally bad code. It's tuned for ease of debugging and speed of compilation, not for either clarity or efficiency of the generated code. It is often easier to understand the code generated by gcc -O1 than the code generated by gcc -O0.

You should also know that the main function often needs to do extra setup and teardown that other functions do not need to do. The instruction leal 4(%esp),%ecx is part of that extra setup. If you only want to understand the machine code corresponding to the code you wrote, and not the nitty-gritty details of the ABI, name your test function something other than main.

(As pointed out in the comments, that setup code is not as tightly tuned as it could be, but it doesn't normally matter, because it's only executed once in the lifetime of the program.)
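To see what that setup buys, here is a minimal C sketch that replays the -fno-pie prologue and epilogue of main from the question's edit on a toy 32-bit stack. The names (toy_stack, slot, run_main_frame, STACK_BASE) and the addresses are hypothetical; it models only the register and stack arithmetic, not real memory. It suggests that ESP ends up 16-byte aligned at call add whatever alignment main was entered with, and that leal -4(%ecx), %esp puts ESP back exactly on the original return address before ret.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical toy stack: 64 four-byte slots covering byte addresses
   STACK_BASE .. STACK_BASE + 255. */
#define STACK_BASE 0x1000u
#define STACK_WORDS 64u

static uint32_t toy_stack[STACK_WORDS];

/* Map a toy byte address to its 4-byte slot (must be in range and aligned). */
static uint32_t *slot(uint32_t addr)
{
    assert(addr % 4 == 0 && addr >= STACK_BASE &&
           addr < STACK_BASE + 4 * STACK_WORDS);
    return &toy_stack[(addr - STACK_BASE) / 4];
}

static void push(uint32_t *esp, uint32_t value) { *esp -= 4; *slot(*esp) = value; }
static uint32_t pop(uint32_t *esp) { uint32_t v = *slot(*esp); *esp += 4; return v; }

struct frame_trace {
    uint32_t esp_at_call; /* ESP when "call add" would execute */
    uint32_t esp_at_ret;  /* ESP when "ret" executes */
};

/* Replays main's -fno-pie prologue/epilogue. The caller's return address
   is assumed to be stored at esp_on_entry already. */
struct frame_trace run_main_frame(uint32_t esp_on_entry)
{
    struct frame_trace t;
    uint32_t esp = esp_on_entry;
    uint32_t ebp = 0; /* caller's EBP; its value does not matter here */
    uint32_t ecx;

    ecx = esp + 4;               /* leal 4(%esp), %ecx */
    esp &= ~(uint32_t)15;        /* andl $-16, %esp  (-16 == ~15) */
    push(&esp, *slot(ecx - 4));  /* pushl -4(%ecx): copy the return address */
    push(&esp, ebp);             /* pushl %ebp */
    ebp = esp;                   /* movl %esp, %ebp */
    push(&esp, ecx);             /* pushl %ecx */
    esp -= 4;                    /* subl $4, %esp */
    t.esp_at_call = esp;         /* call add happens here */
    esp += 4;                    /* addl $4, %esp */
    ecx = pop(&esp);             /* popl %ecx */
    ebp = pop(&esp);             /* popl %ebp */
    esp = ecx - 4;               /* leal -4(%ecx), %esp */
    t.esp_at_ret = esp;          /* ret pops the original return address */
    (void)ebp;
    return t;
}
```

For example, entering with ESP = 0x10E4 (not 16-byte aligned), the sketch gives 0x10D0 at the call and 0x10E4 again at ret. The pushl %ebx in the PIE version plays the same padding role as subl $4, %esp here.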

Now, to answer the question that was literally asked, the reason for the appearance of

call __x86.get_pc_thunk.ax

is because your compiler defaults to generating "position-independent" executables. Position-independent means the operating system can load the program's machine code at any address in (virtual) memory and it'll still work. This allows things like address space layout randomization, but to make it work, you have to take special steps to set up a "global pointer" at the beginning of every function that accesses global variables or calls another function (with some exceptions). It's actually easier to explain the code that's generated if you turn optimization on:

main:
        leal    4(%esp), %ecx
        andl    $-16, %esp
        pushl   -4(%ecx)
        pushl   %ebp
        movl    %esp, %ebp
        pushl   %ebx
        pushl   %ecx

This is all just setting up main's stack frame and saving registers that need to be saved. You can ignore it.

        call    __x86.get_pc_thunk.bx
        addl    $_GLOBAL_OFFSET_TABLE_, %ebx

The special function __x86.get_pc_thunk.bx loads its return address -- which is the address of the addl instruction that immediately follows -- into the EBX register. Then we add to that address the value of the magic constant _GLOBAL_OFFSET_TABLE_, which, in position-independent code, is the difference between the address of the instruction that uses _GLOBAL_OFFSET_TABLE_ and the address of the global offset table. Thus, EBX now points to the global offset table.
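The arithmetic in that paragraph can be checked with a small C model. The offsets, load addresses, and the helpers get_pc_thunk and got_pointer are hypothetical stand-ins for the thunk and the addl. The point is that the constant baked into the addl depends only on distances inside the image, so the same instruction bytes find the GOT wherever the image is loaded.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout, as offsets from the start of the loaded image. */
#define ADDL_OFFSET 0x0400u /* the addl right after call __x86.get_pc_thunk.bx */
#define GOT_OFFSET  0x2000u /* the global offset table */

/* Link-time value of $_GLOBAL_OFFSET_TABLE_ in this addl:
   the distance from the addl instruction to the GOT. */
static const uint32_t gotpc_constant = GOT_OFFSET - ADDL_OFFSET;

/* The thunk returns its own return address, i.e. the address of the addl. */
static uint32_t get_pc_thunk(uint32_t load_base)
{
    return load_base + ADDL_OFFSET;
}

/* EBX after "call __x86.get_pc_thunk.bx; addl $_GLOBAL_OFFSET_TABLE_, %ebx". */
uint32_t got_pointer(uint32_t load_base)
{
    assert(load_base % 4096 == 0);          /* images are loaded page-aligned */
    uint32_t ebx = get_pc_thunk(load_base); /* call __x86.get_pc_thunk.bx */
    ebx += gotpc_constant;                  /* addl $_GLOBAL_OFFSET_TABLE_, %ebx */
    return ebx;
}
```

With these made-up numbers, loading at 0x56555000 gives a GOT pointer of 0x56557000, and loading at 0x08048000 gives 0x0804A000: always load address + GOT_OFFSET.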

        call    add@PLT

Now we call add@PLT, which means call add, but jump through the "procedure linkage table" to do it. The PLT takes care of the possibility that add is defined in a shared library rather than the main executable. The code in the PLT uses the global offset table and assumes that you have already set EBX to point to it, before calling an @PLT symbol. That's why main has to set up EBX even though nothing appears to use it. If you had instead written something like

 extern int number;
 int main(void) { return number; }

then you would see a direct use of the GOT, something like

    call    __x86.get_pc_thunk.bx
    addl    $_GLOBAL_OFFSET_TABLE_, %ebx
    movl    number@GOT(%ebx), %eax
    movl    (%eax), %eax

We load up EBX with the address of the GOT, then we can load the address of the global variable number from the GOT, and then we actually dereference the address to get the value of number.
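Those two loads can be modelled in C with an array of pointers standing in for the GOT. The names got, NUMBER_SLOT, fill_got and load_number are hypothetical. In a real program the dynamic linker writes the address of number into its GOT entry at load time; fill_got only imitates that step.

```c
#include <assert.h>
#include <stddef.h>

int number = 42; /* the global variable from the example */

/* Hypothetical GOT layout for this toy: slot 0 is simply unused. */
enum { NUMBER_SLOT = 1, GOT_SLOTS = 2 };
static int *got[GOT_SLOTS];

/* Stand-in for the dynamic linker filling in the GOT entry. */
void fill_got(void)
{
    got[NUMBER_SLOT] = &number;
}

/* Mirrors: movl number@GOT(%ebx), %eax ; movl (%eax), %eax */
int load_number(int **ebx_got)
{
    int *eax = ebx_got[NUMBER_SLOT]; /* first load: the address of number */
    assert(eax != NULL);             /* the GOT must have been filled */
    return *eax;                     /* second load: the value of number */
}
```

After fill_got(), load_number(got) returns the current value of number, so changing number is visible through the GOT without touching the GOT again.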

If you compile 64-bit code instead, you'll see something different and much simpler:

    movl    number(%rip), %eax

Instead of all this mucking around with the GOT, we can just load number from a fixed offset from the program counter. PC-relative addressing was added along with the 64-bit extensions to the x86 architecture. Similarly, your original program, in 64-bit position-independent mode, will just say

    call    add@PLT

without setting up EBX first. The call still has to go through the PLT, but the PLT uses PC-relative addressing itself and doesn't need any help from its caller.
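On x86-64, the effective address of number(%rip) is the address of the next instruction plus a signed 32-bit displacement encoded in the instruction. A small sketch of that calculation (rip_relative and the addresses are hypothetical) shows why no register setup is needed: the displacement is fixed at link time, and the CPU supplies the current RIP.

```c
#include <assert.h>
#include <stdint.h>

/* Effective address of disp(%rip). RIP-relative addressing is relative to
   the address of the instruction that follows the current one. */
uint64_t rip_relative(uint64_t insn_addr, uint32_t insn_len, int32_t disp)
{
    assert(insn_len >= 1 && insn_len <= 15); /* x86 instructions are 1..15 bytes */
    uint64_t next_rip = insn_addr + insn_len;
    return next_rip + (uint64_t)(int64_t)disp; /* sign-extend, then add */
}
```

movl number(%rip), %eax is 6 bytes long, so with a made-up displacement of 0x2ee5 an instruction at 0x555555555129 reads 0x555555558014, and the same bytes loaded at 0x7f0000001129 read 0x7f0000004014.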

The only difference between __x86.get_pc_thunk.bx and __x86.get_pc_thunk.ax is which register they store their return address in: EBX for .bx, EAX for .ax. I have also seen GCC generate .cx and .dx variants. It's just a matter of which register it wants to use for the global pointer -- it must be EBX if there are going to be calls through the PLT, but if there aren't any then it can use any register, so it tries to pick one that isn't needed for anything else.
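Why EBX must hold the GOT pointer before a call through the PLT can be sketched in C by modelling the 32-bit position-independent PLT stub as an indirect jump through a GOT slot, addressed relative to whatever GOT pointer the caller passes in (standing in for EBX). All names and the slot number are hypothetical, and the model leaves out lazy binding. It only shows the indirection and the stub's dependence on the caller-supplied GOT base.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*func_ptr)(void);

enum { ADD_SLOT = 3, GOT_SLOTS = 4 }; /* hypothetical slot numbering */

static int add_calls;                       /* counts calls that reached add */
static void add_impl(void) { add_calls++; } /* the real add, maybe in a shared library */

static func_ptr got[GOT_SLOTS];

/* Stand-in for the dynamic linker resolving add and storing its address. */
void bind_add(void) { got[ADD_SLOT] = add_impl; }

/* Model of a 32-bit PIC PLT stub, "jmp *add@GOT(%ebx)": it cannot find the
   GOT by itself, so it relies on the caller having set ebx_got. */
void plt_add(func_ptr *ebx_got)
{
    assert(ebx_got[ADD_SLOT] != NULL);
    ebx_got[ADD_SLOT]();
}

/* Caller side, like main: set up "EBX" first, then call add@PLT. */
void call_add_via_plt(void)
{
    func_ptr *ebx = got; /* thunk + addl $_GLOBAL_OFFSET_TABLE_, %ebx */
    plt_add(ebx);        /* call add@PLT */
}
```

In the 64-bit case the stub would compute the GOT address from its own RIP instead of taking it as an argument, which is why the caller does no setup there.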

Why does it call a function to get the return address? Older compilers would do this instead:

    call 1f
1:  pop  %ebx

but that screws up return-address prediction, so nowadays the compiler goes to a little extra trouble to make sure every call is paired with a ret.
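The prediction problem can be illustrated with a toy return stack buffer: every call pushes a predicted return address, and every ret pops one and compares it with the real target. This is a simplified model with hypothetical names (rsb_call, rsb_ret) and made-up addresses, not a description of any particular CPU. It shows that the call 1f / pop %ebx idiom leaves an unmatched prediction behind, so main's own ret is mispredicted, while the thunk's call/ret pair keeps the buffer in sync.

```c
#include <assert.h>
#include <stdint.h>

#define RSB_DEPTH 16

struct rsb {
    uint32_t entries[RSB_DEPTH];
    int top;         /* number of valid entries */
    int mispredicts; /* rets whose prediction was wrong */
};

/* call: the predictor remembers the address after the call. */
static void rsb_call(struct rsb *r, uint32_t return_addr)
{
    assert(r->top < RSB_DEPTH);
    r->entries[r->top++] = return_addr;
}

/* ret: pop the prediction and compare it with where ret really goes. */
static void rsb_ret(struct rsb *r, uint32_t actual_target)
{
    assert(r->top > 0);
    if (r->entries[--r->top] != actual_target)
        r->mispredicts++;
}

/* Hypothetical addresses: 0x100 is where main returns to in its caller,
   0x204 is the instruction after the call that fetches the PC. */
int mispredicts_with_thunk(void)
{
    struct rsb r = {0};
    rsb_call(&r, 0x100); /* caller: call main */
    rsb_call(&r, 0x204); /* main: call __x86.get_pc_thunk.bx */
    rsb_ret(&r, 0x204);  /* thunk: ret back into main */
    rsb_ret(&r, 0x100);  /* main: ret to its caller */
    return r.mispredicts;
}

int mispredicts_with_call_pop(void)
{
    struct rsb r = {0};
    rsb_call(&r, 0x100); /* caller: call main */
    rsb_call(&r, 0x204); /* main: call 1f, predictor records 0x204 */
                         /* 1: pop %ebx removes it from the real stack only */
    rsb_ret(&r, 0x100);  /* main: ret, but the predictor still says 0x204 */
    return r.mispredicts;
}
```

In this model the thunk version gives 0 mispredictions and the call/pop version gives 1.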

Rickettsia answered 30/4, 2018 at 18:54

I really enjoyed reading your answer, and it was very useful for what I'm currently learning. Thanks very much. – Tautology 15/4, 2019 at 21:0

Excellent answer. Bravo @zwol. – Bedrabble 4/3, 2020 at 12:30

The extra junk you're seeing is due to your version of GCC special-casing main to compensate for possibly broken entry-point code that starts it with a misaligned stack. I'm not sure how to disable this, or whether it's even possible, but renaming the function to something other than main will suppress it for the sake of your reading.

After renaming to xmain I get:

xmain:
        pushl   %ebp
        movl    %esp, %ebp
        subl    $8, %esp
        call    add
        movl    $0, %eax
        leave
        ret

Mohan answered 30/4, 2018 at 18:37

In my case, renaming main didn't work, but passing the -fno-pic flag to GCC removed the PIC code. – Vespers 12/11, 2019 at 8:51
