Memory addressing with GNU Assember Intel Syntax
Asked Answered
T

2

8

I read this page containing a good list of differences between Intel and AT&T syntax for GAS but it did not cover the case of specifying an address with a displacement only.

Here I've assembled four lines with AT&T syntax:

                         .text
0000 48C7C008000000      mov    $8, %rax
0007 488B042508000000    mov    (8), %rax
000f 4889F0              mov    %rsi, %rax
0012 488B06              mov    (%rsi), %rax

The first two lines are different, as expected. The first moves the immediate value 8 into rax and the second moves the contents of address 8 into rax. But with Intel syntax I get the following odd behavior:

                         .text
                         .intel_syntax
0000 48C7C008000000      mov    %rax, 8
0007 48C7C008000000      mov    %rax, [8]
000e 4889F0              mov    %rax, %rsi
0011 488B06              mov    %rax, [%rsi]

Here the first and second lines assembled to the same machine code! First I thought the square brackets were wrong, so I added the third and fourth lines to the test, and the square brackets do work for memory addressing at least when registers are involved.

All of the documentation I have read show memory addressing examples with at least a base or index register, and sometimes a scale and displacement, but never a displacement only.

I do have experience with the Intel syntax using the NASM assembler, which does distinguish mov rax, 8 and mov rax, [8].

Is this a bug in GAS? Or if not, how do I specify the equivalent of NASM's mov rax, [8]?

I realize it is probably uncommon to specify a displacement-only address but I would like to get a complete understanding of all of the memory addressing forms with this syntax.

Tegument answered 18/4, 2012 at 17:48 Comment(2)
When I assemble your example with Intel syntax, I do get the expected behavior. For me, the generated machine code is exactly the same as what you show except the second line, which is 48 8b 04 25 08 00 00 for me.Kassel
If in the above Intel Syntax example to GAS you got the shown output then yes, your version GAS created incorrect code for mov %rax, 8 (which initializes rax with the constant 8, same as AT&T mov $8, %rax). All the addressing operations, as present, use the correct opcodes.Voyeur
S
7

There was indeed such a bug in gas -- see http://sourceware.org/bugzilla/show_bug.cgi?id=10637 .

It appears to be fixed in (or perhaps before) binutils 2.21.51.

Surbased answered 19/4, 2012 at 23:11 Comment(2)
Wow, I did not find that on a search. Indeed gcc -v gives "GNU assembler version 2.20.1 (x86_64-linux-gnu) using BFD version (GNU Binutils for Ubuntu) 2.20.1-system.20100303" so I will update to 2.21.x and give it a shot.Tegument
Verified on the new binutils. Thanks @Matthew.Tegument
V
2

You're seeing a very special corner case of AT&T syntax here. Ordinarily, for address operands, you have:

<op> [ src, ] displacement(base,index,scale) [, tgt ]

Any of the constituents of an address operand in AT&T syntax are optional, so you can write mov (%rax, %rbx), ... or mov 0(%rax, %rbx, 1), ... or any other such combination.

Inside the () brackets the only number you ordinarily can have is the scale factor (if present).

But the assembler also accepts (and creates identical code for):

mov <absolute>, ...
mov (<absolute>), ...

This works only if the operand inside the () is a simple number / absolute address, otherwise the assembler complains that what you gave isn't a valid scale factor. This equivalence is a special case in AT&T syntax - I'm not sure why it was/is allowed.

The use of $ in AT&T syntax, though, always specifies a constant, not an address operand, which is the same as a naked number in Intel Syntax.

The following illustrates the equivalences:

$ cat t.s
        mov     (8), %rax
        mov     $8, %rax
        mov     8, %rax
.intel_syntax
        mov %rax, [ 8 ]
        mov %rax, 8
        mov %rax, %ds:8

$ objdump -w -d t.o

t.o:     file format elf64-x86-64

Disassembly of section .text:

0000000000000000 <.text>:
   0:   48 8b 04 25 08 00 00 00         mov    0x8,%rax
   8:   48 c7 c0 08 00 00 00            mov    $0x8,%rax
   f:   48 8b 04 25 08 00 00 00         mov    0x8,%rax
  17:   48 8b 04 25 08 00 00 00         mov    0x8,%rax
  1f:   48 c7 c0 08 00 00 00            mov    $0x8,%rax
  26:   48 8b 04 25 08 00 00 00         mov    0x8,%rax

$ objdump -w -M intel -d t.o

t.o:     file format elf64-x86-64

Disassembly of section .text:

0000000000000000 :
   0:   48 8b 04 25 08 00 00 00         mov    rax,ds:0x8
   8:   48 c7 c0 08 00 00 00            mov    rax,0x8
   f:   48 8b 04 25 08 00 00 00         mov    rax,ds:0x8
  17:   48 8b 04 25 08 00 00 00         mov    0x8,%rax
  1f:   48 c7 c0 08 00 00 00            mov    rax,0x8
  26:   48 8b 04 25 08 00 00 00         mov    rax,ds:0x8
Voyeur answered 19/4, 2012 at 10:33 Comment(1)
The question is about the odd behaviour of the Intel syntax; but... I don't think the AT&T syntax here is a special case -- I suspect that it's just evaluating (8) as an expression, because it doesn't match anything that is special syntax. e.g. mov 8, %rax, mov (8), %rax, mov 4+4, %rax and mov (4+4), %rax all do the same thing.Surbased

© 2022 - 2024 — McMap. All rights reserved.