Why does the assembly encoding of objdump vary?

I was reading this article about Position Independent Code and I encountered this assembly listing of a function.

0000043c <ml_func>:
 43c:   55                      push   ebp
 43d:   89 e5                   mov    ebp,esp
 43f:   e8 16 00 00 00          call   45a <__i686.get_pc_thunk.cx>
 444:   81 c1 b0 1b 00 00       add    ecx,0x1bb0
 44a:   8b 81 f0 ff ff ff       mov    eax,DWORD PTR [ecx-0x10]
 450:   8b 00                   mov    eax,DWORD PTR [eax]
 452:   03 45 08                add    eax,DWORD PTR [ebp+0x8]
 455:   03 45 0c                add    eax,DWORD PTR [ebp+0xc]
 458:   5d                      pop    ebp
 459:   c3                      ret

0000045a <__i686.get_pc_thunk.cx>:
 45a:   8b 0c 24                mov    ecx,DWORD PTR [esp]
 45d:   c3                      ret

However, on my machine (gcc-7.3.0, Ubuntu 18.04 x86_64), I got the slightly different result below:

0000044d <ml_func>:
 44d:   55                      push   %ebp
 44e:   89 e5                   mov    %esp,%ebp
 450:   e8 29 00 00 00          call   47e <__x86.get_pc_thunk.ax>
 455:   05 ab 1b 00 00          add    $0x1bab,%eax
 45a:   8b 90 f0 ff ff ff       mov    -0x10(%eax),%edx
 460:   8b 0a                   mov    (%edx),%ecx
 462:   8b 55 08                mov    0x8(%ebp),%edx
 465:   01 d1                   add    %edx,%ecx
 467:   8b 90 f0 ff ff ff       mov    -0x10(%eax),%edx
 46d:   89 0a                   mov    %ecx,(%edx)
 46f:   8b 80 f0 ff ff ff       mov    -0x10(%eax),%eax
 475:   8b 10                   mov    (%eax),%edx
 477:   8b 45 0c                mov    0xc(%ebp),%eax
 47a:   01 d0                   add    %edx,%eax
 47c:   5d                      pop    %ebp
 47d:   c3                      ret 

The main difference I found is the operand order of the mov instruction. In the upper listing, mov ebp,esp moves esp into ebp, while in the lower listing, mov %esp,%ebp does the same thing, but with the operands in the opposite order.

This is quite confusing, especially when I have to hand-write assembly. To summarize, my questions are: (1) why do I get different assembly representations for the same instructions, and (2) which one should I use when writing assembly code (e.g. with __asm(:::);)?

Anticholinergic answered 16/3, 2019 at 4:16 Comment(4)
The top is in Intel syntax and the bottom one is in AT&T syntax. The AT&T syntax is different: the source and destination are reversed, so it is source, destination. If you want Intel syntax with OBJDUMP, use the option -Mintel. - Rosalia
As for your second question: if you compile with GCC and want Intel syntax in inline assembly, you can pass the -masm=intel option to GCC. The default is AT&T syntax. - Rosalia
A quick way to tell Intel from AT&T syntax is to look for lines with immediate values, like add ecx,0x1bb0 or add $0x1bab,%eax. That establishes the syntax, and you can then flip the operands in your mind (or not) as you read, to whichever order you think is sane. Which ordering is sane is on the order of religion and politics: very personal. - Eoin
Other clues to the specific syntax used in the code (assembly language is defined by the assembler, the tool, not by some standard) are the percent sign on register names and the MIPS-style -0x10(%eax) offset(base) addressing, versus the Intel-style DWORD PTR [eax] with brackets. Here "Intel style" means Intel's notation in general, not just the Intel side of the Intel vs. AT&T distinction. Your first example is classic Intel-syntax assembly language; the latter is GNU assembler style, and the GNU assembler is well known for mangling the syntax for all targets, not just x86. - Eoin

objdump defaults to -Matt, i.e. AT&T syntax (like your 2nd code block). The att and intel-syntax tag wikis have some info about the syntax differences: https://stackoverflow.com/tags/att/info vs. https://stackoverflow.com/tags/intel-syntax/info

Either syntax has the same limitations, imposed by what the machine itself can do, and what's encodeable in machine code. They're just different ways of expressing that in text.
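To make that concrete, here is a minimal sketch (not from the original question) of one register-to-register mov written in both dialects with GNU C dialect alternatives. The helper name copy_u32 is hypothetical, and the non-x86 branch is only there so the sketch still compiles elsewhere. Whichever alternative the compiler picks, the instruction it assembles is the same kind of mov; only the text differs.

```c
#include <stdint.h>

/* Hypothetical helper: copies src into dst with a single mov.
   The {att | intel} braces select the text for the active dialect. */
static inline uint32_t copy_u32(uint32_t src) {
    uint32_t dst;
#if defined(__i386__) || defined(__x86_64__)
    /* AT&T:  movl source, destination
       Intel: mov  destination, source */
    __asm__("{movl %1, %0 | mov %0, %1}" : "=r"(dst) : "r"(src));
#else
    dst = src;  /* fallback for non-x86 targets (assumption for portability) */
#endif
    return dst;
}
```

Calling copy_u32(x) should return x regardless of which dialect the compiler is set to.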


Use objdump -d -Mintel for Intel syntax. I use alias disas='objdump -drwC -Mintel' in my .bashrc, so I can disas foo.o and get the format I want, with relocations printed (important for making sense of a non-linked .o), without line-wrapping for long instructions, and with C++ symbol names demangled.


In inline asm, you can use either syntax, as long as it matches what the compiler is expecting. The default is AT&T, and that's what I'd recommend using for compatibility with clang. There may be a way, but clang doesn't handle -masm=intel the same way GCC does.

Also, AT&T is basically standard for GNU C inline asm on x86, and it means you don't need special build options for your code to work.
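As a rough illustration of plain AT&T inline asm that builds with default options, here is a sketch assuming an x86 target. The helper name add_const is hypothetical, and the constant is just the immediate from the question's listing. Note that this version would not assemble under gcc -masm=intel, since the template is AT&T-only.

```c
/* Hypothetical helper: adds the immediate from the question's listing to x. */
static inline unsigned add_const(unsigned x) {
#if defined(__i386__) || defined(__x86_64__)
    /* AT&T order: source first, destination last; $ marks an immediate.
       "+r" makes x a read-write register operand; "cc" says flags change. */
    __asm__("addl $0x1bab, %0" : "+r"(x) : : "cc");
#else
    x += 0x1bab;  /* fallback for non-x86 targets (assumption for portability) */
#endif
    return x;
}
```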

But you can use gcc -masm=intel to compile source files that use Intel syntax in their asm statements. This is fine for your own use if you don't care about clang.


If you're writing code for a header, you can make it portable between AT&T and Intel syntax using dialect alternatives, at least for GCC:

static inline
void atomic_inc(volatile int *p) {
    // use __asm__ instead of asm in headers, so it works even with -std=c11 instead of gnu11
    __asm__("lock {addl $1, %0 | add %0, 1}": "+m"(*p));
    // TODO: flag output for return value?
    // maybe doesn't need to be asm volatile; compilers know that modifying pointed-to memory is a visible side-effect unless it's a local that fully optimizes away.
    // If you want this to work as a memory barrier, use a `"memory"` clobber to stop compile-time memory reordering.  The lock prefix provides a runtime full barrier
}

source+asm outputs for gcc/clang on the Godbolt compiler explorer.

With g++ -O3 (default or -masm=att), we get

atomic_inc(int volatile*):
    lock addl $1, (%rdi)              # operand-size is from my explicit addl suffix
    ret

With g++ -O3 -masm=intel, we get

atomic_inc(int volatile*):
    lock  add DWORD PTR [rdi], 1      # operand-size came from the %0 expansion
    ret

clang works with the AT&T version, but fails with -masm=intel (or the -mllvm --x86-asm-syntax=intel which that implies), because that option apparently only applies to code emitted by LLVM, not to how the front-end fills in the asm template.

The clang error message is:

<source>:4:13: error: unknown use of instruction mnemonic without a size suffix
    __asm__("lock {addl $1, %0 | add %0, 1}": "+m"(*p));
            ^
<inline asm>:1:2: note: instantiated into assembly here
        lock  add (%rdi), 1
        ^
1 error generated.

It picked the "Intel" syntax alternative, but still filled in the template with an AT&T memory operand.
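For the TODO in atomic_inc about a flag output, one possible sketch uses GCC's flag output operands ("=@ccz") together with a "memory" clobber. The function atomic_dec_and_test is a hypothetical variant, not part of the code above, and the __atomic builtin fallback is my assumption for compilers or targets without flag outputs.

```c
#include <stdbool.h>

/* Hypothetical helper: atomically decrement *p and report whether the
   result became zero, taking ZF directly as an output operand. */
static inline bool atomic_dec_and_test(volatile int *p) {
#if (defined(__i386__) || defined(__x86_64__)) && defined(__GCC_ASM_FLAG_OUTPUTS__)
    bool zero;
    /* "memory" stops compile-time reordering across the asm statement;
       the lock prefix provides the runtime full barrier. */
    __asm__ volatile("lock {decl %[mem] | dec %[mem]}"
                     : [mem] "+m"(*p), "=@ccz"(zero)
                     :
                     : "memory");
    return zero;
#else
    /* Fallback (assumption): compiler builtin with the same semantics. */
    return __atomic_sub_fetch(p, 1, __ATOMIC_SEQ_CST) == 0;
#endif
}
```

With a counter starting at 2, the first call should return false and the second true. The same clang caveat applies: before clang 14, the Intel alternative would be filled in with an AT&T memory operand under -masm=intel.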

Joaquinajoash answered 16/3, 2019 at 5:52 Comment(2)
Wow. Thanks for your great explanation with examples! - Anticholinergic
Update: clang 14 supports -masm=intel in a way compatible with GCC, treating asm statements as Intel syntax. See: How to set gcc or clang to use Intel syntax permanently for inline asm() statements? - Joaquinajoash
