Referencing the contents of a memory location. (x86 addressing modes)

I have a memory location that contains a character that I want to compare with another character (and it's not at the top of the stack so I can't just pop it). How do I reference the contents of a memory location so I can compare it?

Basically how do I do it syntactically.

Shy answered 3/12, 2015 at 4:50 Comment(0)

For a more extended discussion of addressing modes (16/32/64bit), see Agner Fog's "Optimizing Assembly" guide, section 3.3. That guide has much more detail than this answer on relocations for symbols and on 32bit position-independent code, among other things.

And of course Intel and AMD's manuals have whole sections on the details of the encodings of ModRM (and optional SIB and disp8/disp32 bytes), which makes it clear what's encodeable and why limits exist.

See also: table of AT&T(GNU) syntax vs. NASM syntax for different addressing modes, including indirect jumps / calls. Also see the collection of links at the bottom of this answer.


x86 (32 and 64bit) has several addressing modes to choose from. They're all of the form:

[base_reg + index_reg*scale + displacement]      ; or a subset of this
[RIP + displacement]     ; or RIP-relative: 64bit only.  No index reg is allowed

(where scale is 1, 2, 4, or 8, and displacement is a signed 32-bit constant). All the other forms (except RIP-relative) are subsets of this that leave out one or more components. This means you don't need a zeroed index_reg to access [rsi], for example.
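
The general form can be modelled in a few lines of C. This is only a sketch of the arithmetic for 64-bit address size, not anything an assembler provides: the function name effective_address and its parameters are made up for illustration, and the scale is passed as the 2-bit shift count (0..3) rather than as 1/2/4/8.

```c
#include <stdint.h>

/* Hypothetical model of [base + index*scale + disp32] with 64-bit
 * address size. scale_log2 is the 2-bit shift count: 0, 1, 2, 3 mean
 * scale factors 1, 2, 4, 8. Leaving out a component is the same as
 * passing 0 for it. */
uint64_t effective_address(uint64_t base, uint64_t index,
                           unsigned scale_log2, int32_t disp)
{
    /* Only two bits exist for the scale, so mask to 0..3. */
    uint64_t scaled = index << (scale_log2 & 3u);

    /* The displacement is sign-extended to 64 bits; the sum wraps
     * modulo 2^64 like the hardware address calculation. */
    return base + scaled + (uint64_t)(int64_t)disp;
}
```

For example, effective_address(0x1000, 3, 2, -8) models [base + index*4 - 8] with base = 0x1000 and index = 3.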

In asm source code, it doesn't matter what order you write things: [5 + rax + rsp + 15*4 + MY_ASSEMBLER_MACRO*2] works fine. (All the math on constants happens at assemble time, resulting in a single constant displacement.)

The registers all have to be the same size as each other, and the same size as the mode you're in unless you use an alternate address-size, which requires an extra prefix byte. Narrow pointers are rarely useful outside of the x32 ABI (ILP32 in long mode) where you might want to ignore the top 32 bits of a register, e.g. instead of using movsxd to sign-extend a 32-bit possibly-negative offset in a register to 64-bit pointer width.

If you want to use al as an array index, for example, you need to zero- or sign-extend it to pointer width. (Having the upper bits of rax already zeroed before messing around with byte registers is sometimes possible, and is a good way to accomplish this.)
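
The difference between zero- and sign-extending a byte index can be sketched in C. The function names here are hypothetical; the casts stand in for what movzx and movsx do to the byte before it is used in an address.

```c
#include <stddef.h>
#include <stdint.h>

/* movzx-style: the byte is treated as unsigned, so the index is 0..255. */
size_t zero_extended_index(uint8_t al)
{
    return (size_t)al;
}

/* movsx-style: the same bits are treated as signed, so the index is
 * -128..127. The uint8_t -> int8_t conversion is implementation-defined
 * before C23, but gives the two's-complement value on mainstream
 * compilers. */
ptrdiff_t sign_extended_index(uint8_t al)
{
    return (ptrdiff_t)(int8_t)al;
}

/* A 256-entry table lookup needs the zero-extended form. */
uint8_t table_lookup(const uint8_t table[256], uint8_t al)
{
    return table[zero_extended_index(al)];
}
```

The byte 0xF0 therefore indexes element 240 when zero-extended, but would point 16 elements before the start of the array when sign-extended.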


The limitations reflect what's encodeable in machine-code, as usual for assembly language. The scale factor is a 2-bit shift count. The ModRM (and optional SIB) bytes can encode up to 2 registers but not more, and don't have any modes that subtract registers, only add. Any register can be a base. Any register except ESP/RSP can be an index. See rbp not allowed as SIB base? for the encoding details, like why [rsp] always needs a SIB byte.

Every possible subset of the general case is encodable, except ones using e/rsp*scale (obviously useless in "normal" code that always keeps a pointer to stack memory in esp).

Normally, the code-size of the encodings is:

  • 1B for one-register modes (mod/rm (Mode / Register-or-memory))
  • 2B for two-register modes (mod/rm + SIB (Scale Index Base) byte)
  • displacement can be 0, 1, or 4 bytes (sign-extended to 32 or 64, depending on address-size). So displacements from [-128 to +127] can use the more compact disp8 encoding, saving 3 bytes vs. disp32.

ModRM is always present, and its bits signal whether a SIB is also present. Similar for disp8/disp32. Code-size exceptions:

  • [reg*scale] by itself can only be encoded with a 32-bit displacement (which can of course be zero). Smart assemblers work around that by encoding lea eax, [rdx*2] as lea eax, [rdx + rdx] but that trick only works for scaling by 2. Either way a SIB byte is required, in addition to ModRM.

  • It's impossible to encode e/rbp or r13 as the base register without a displacement byte, so [ebp] is encoded as [ebp + byte 0]. The no-displacement encodings with ebp as a base register instead mean there's no base register (e.g. for [disp + reg*scale]).

  • [e/rsp] requires a SIB byte even if there's no index register, whether or not there's a displacement. The mod/rm encoding that would specify [rsp] instead means that there's a SIB byte.

See Table 2-5 in Intel's ref manual, and the surrounding section, for the details on the special cases. (They're the same in 32 and 64bit mode. Adding RIP-relative encoding didn't conflict with any other encoding, even without a REX prefix.)
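
The size rules and exceptions above can be collected into a small C model. This is a sketch under stated assumptions, not an encoder: addressing_bytes and NO_REG are made-up names, registers are given as hardware register numbers (rax = 0 ... r15 = 15), only 32/64-bit address size is covered, RIP-relative is left out, and the assembler trick of turning [reg*2] into [reg + reg] is not modelled.

```c
#include <stdint.h>

enum { NO_REG = -1, RAX = 0, RCX = 1, RDX = 2, RBX = 3,
       RSP = 4, RBP = 5, RSI = 6, RDI = 7, R12 = 12, R13 = 13 };

/* Bytes taken by ModRM + optional SIB + displacement for
 * [base + index*scale + disp]. Returns -1 if the combination is not
 * encodable. long_mode is nonzero for 64-bit mode. */
int addressing_bytes(int base, int index, int32_t disp, int long_mode)
{
    if (index == RSP)
        return -1;                    /* e/rsp can never be an index */

    if (base == NO_REG && index == NO_REG)
        return long_mode ? 1 + 1 + 4  /* absolute: ModRM + SIB + disp32 */
                         : 1 + 4;     /* absolute: ModRM + disp32 */

    if (base == NO_REG)
        return 1 + 1 + 4;             /* [index*scale]: SIB + disp32 */

    /* A SIB byte is needed for any index, and for rsp/r12 as base. */
    int sib = (index != NO_REG) || ((base & 7) == RSP);

    int disp_bytes;
    if (disp == 0 && (base & 7) != RBP)
        disp_bytes = 0;               /* rbp/r13 always need a disp8 */
    else if (disp >= -128 && disp <= 127)
        disp_bytes = 1;               /* disp8 */
    else
        disp_bytes = 4;               /* disp32 */

    return 1 + sib + disp_bytes;
}
```

With this model, [rsi] costs 1 byte, [rbp] and [rsp] cost 2 each (for different reasons), and [rdx*4] costs 6.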

For performance, it's typically not worth it to spend an extra instruction just to get smaller x86 machine code. On Intel CPUs with a uop cache, the uop cache is smaller than L1 I$, and a more precious resource. Minimizing fused-domain uops is typically more important.


How they're used

(This question was tagged MASM, but some of this answer talks about NASM's version of Intel syntax, especially where they differ for x86-64 RIP-relative addressing. AT&T syntax is not covered, but keep in mind that's just another syntax for the same machine code so the limitations are the same.)

This list doesn't exactly match the hardware encodings of possible addressing modes, since I'm distinguishing between using a label (for e.g. global or static data) vs. using a small constant displacement. So I'm covering hardware addressing modes + linker support for symbols.

(Note: usually you'd want movzx eax, byte [esi] or movsx when the source is a byte, but mov al, byte_src does assemble and is common in old code, merging into the low byte of EAX/RAX. See Why doesn't GCC use partial registers? and How to isolate byte and word array elements in a 64-bit register)

If you have an int*, often you'd use the scale factor to scale an index by the array element size if you have an element index instead of a byte offset. (Prefer byte offsets or pointers to avoid indexed addressing modes for code-size reasons, and performance in some cases especially on Intel CPUs where it can hurt micro-fusion). But you can also do other things.
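
In C terms, the two ways of reaching the same int element look like this. It is a sketch with hypothetical function names; the comments describe the addressing mode a compiler would typically pick, which is an expectation rather than a guarantee.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Element index: the index is scaled by sizeof(int32_t), which is what
 * a scale factor of 4 does, e.g. [base + idx*4]. */
int32_t load_by_index(const int32_t *arr, size_t idx)
{
    return arr[idx];
}

/* Byte offset: the offset is already scaled, so no scale factor is
 * needed, e.g. [base + off]. memcpy keeps the access well-defined. */
int32_t load_by_byte_offset(const int32_t *arr, size_t byte_off)
{
    int32_t value;
    memcpy(&value, (const unsigned char *)arr + byte_off, sizeof value);
    return value;
}
```

Element index 2 and byte offset 8 name the same int32_t element.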

If you have a pointer char *array in esi:

  • mov al, esi: invalid, won't assemble. Without square brackets, it's not a load at all. It's an error because the registers aren't the same size.

  • mov al, [esi] loads the byte pointed to, i.e. array[0] or *array.

  • mov al, [esi + ecx] loads array[ecx].

  • mov al, [esi + 10] loads array[10].

  • mov al, [esi + ecx*8 + 200] loads array[ecx*8 + 200].

  • mov al, [global_array + 10] loads from global_array[10]. In 64-bit mode, this can and should be a RIP-relative address. Using NASM DEFAULT REL is recommended, to generate RIP-relative addresses by default instead of having to always use [rel global_array + 10]. MASM does this by default, I think. There is no way to use an index register with a RIP-relative address directly. The normal method is lea rax, [global_array] / mov al, [rax + rcx*8 + 10] or similar.

    See How do RIP-relative variable references like "[RIP + _a]" in x86-64 GAS Intel-syntax work? for more details, and syntax for GAS .intel_syntax, NASM, and GAS AT&T syntax.

  • mov al, [global_array + ecx + edx*2 + 10] loads from global_array[ecx + edx*2 + 10]. Obviously you can index a static/global array with a single register. Even a 2D array using two separate registers is possible (pre-scaling one with an extra instruction, for scale factors other than 2, 4, or 8). Note that the global_array + 10 math is done at link time. The object file (assembler output, linker input) informs the linker of the +10 to add to the final absolute address, to put the right 4-byte displacement into the executable (linker output). This is why you can't use arbitrary expressions on link-time constants that aren't assemble-time constants (e.g. symbol addresses).

    In 64-bit mode, this still needs the global_array as a 32-bit absolute address for the disp32 part, which only works in a position-dependent Linux executable, or largeaddressaware=no Windows.

  • mov al, 0ABh Not a load at all, but instead an immediate-constant that was stored inside the instruction. (Note that you need to prefix a 0 so the assembler knows it's a constant, not a symbol. Some assemblers will also accept 0xAB, and some of those won't accept 0ABh: see more).

    You can use a symbol as the immediate constant, to get an address into a register:

    • NASM: mov esi, global_array assembles into a mov esi, imm32 that puts the address into esi.
    • MASM: mov esi, OFFSET global_array is needed to do the same thing.
    • MASM: mov esi, global_array assembles into a load: mov esi, dword [global_array].

    In 64-bit mode, the standard way to put a symbol address into a register is a RIP-relative LEA. Syntax varies by assembler. MASM does it by default. NASM needs a default rel directive, or [rel global_array]. GAS needs it explicitly in every addressing mode; see How to load address of function or label into register in GNU Assembler. mov r64, imm64 is usually supported too, for 64-bit absolute addressing, but is normally the slowest option (code size creates front-end bottlenecks). mov rdi, format_string / call printf typically works in NASM, but is not efficient.

    As an optimization when addresses can be represented as a 32-bit absolute (instead of as a rel32 offset from the current position), mov reg, imm32 is still optimal just like in 32-bit code. (Linux non-PIE executable or Windows with LargeAddressAware=no). But note that in 32-bit mode, lea eax, [array] is not efficient: it wastes a byte of code-size (ModRM + absolute disp32) and can't run on as many execution ports as mov eax, imm32. 32-bit mode doesn't have RIP-relative addressing.

    Note that OS X loads all code at an address outside the low 32 bits, so 32-bit absolute addressing is unusable. Position-independent code isn't required for executables, but you might as well because 64-bit absolute addressing is less efficient than RIP-relative. The macho64 object file format doesn't support relocations for 32-bit absolute addresses the way Linux ELF does. Make sure not to use a label name as a compile-time 32-bit constant anywhere. An effective-address like [global_array + constant] is fine because that can be assembled to a RIP-relative addressing mode. But [global_array + rcx] is not allowed because RIP can't be used with any other registers, so it would have to be assembled with the absolute address of global_array hard-coded as the 32bit displacement (which will be sign-extended to 64b).


Any and all of these addressing modes can be used with LEA to do integer math with a bonus of not affecting flags, regardless of whether it's a valid address. See Using LEA on values that aren't addresses / pointers?

[esi*4 + 10] is usually only useful with LEA (unless the displacement is a symbol, instead of a small constant). In machine code, there is no encoding for scaled-register alone, so [esi*4] has to assemble to [esi*4 + 0], with 4 bytes of zeros for a 32-bit displacement. It's still often worth it to copy+shift in one instruction instead of a shorter mov + shl, because usually uop throughput is more of a bottleneck than code size, especially on CPUs with a decoded-uop cache.
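
As a rough C picture of what LEA computes when it is used for arithmetic rather than addresses (hypothetical function names; the comments name the addressing mode being modelled, with the result truncated to a 32-bit destination):

```c
#include <stdint.h>

/* Models lea eax, [rdi + rdi*4 + 10]: x*5 + 10 in one instruction. */
uint32_t lea_times5_plus10(uint32_t x)
{
    return x + x * 4u + 10u;
}

/* Models lea eax, [rdi*4 + 10]: scaled index with no base register,
 * i.e. a copy and shift plus an add of a constant. */
uint32_t lea_times4_plus10(uint32_t x)
{
    return (x << 2) + 10u;
}
```

Both wrap modulo 2^32, just like the 32-bit result of the instruction.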


You can specify segment-overrides like mov al, [fs: esi] (NASM syntax, with the segment inside the brackets) or mov al, fs:[esi] (MASM syntax). A segment-override just adds a prefix-byte in front of the usual encoding. Everything else stays the same, with the same syntax.

You can even use segment overrides with RIP-relative addressing. In 64-bit mode, 32-bit absolute addressing takes one more byte to encode than RIP-relative, so mov eax, fs:[0] can most efficiently be encoded using a relative displacement that produces a known absolute address, i.e. choose rel32 so RIP+rel32 = 0. YASM will do this with mov ecx, [fs: rel 0], but NASM always uses disp32 absolute addressing, ignoring the rel specifier. I haven't tested MASM or gas.
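
Choosing rel32 so that RIP+rel32 hits a given absolute address is simple arithmetic, sketched here in C. The function name and parameters are hypothetical; the sketch assumes RIP is the address of the next instruction (the end of the one being encoded) and that this address is a known constant, as in position-dependent code.

```c
#include <stdint.h>

/* Computes rel32 such that next_insn + rel32 == target.
 * Returns 1 and stores the displacement on success, or 0 if the target
 * is outside the signed 32-bit range (roughly +/-2GiB) of the code. */
int rip_relative_disp(uint64_t next_insn, uint64_t target, int32_t *rel32)
{
    /* Unsigned subtraction wraps modulo 2^64; reinterpreting the result
     * as signed gives the distance on two's-complement targets. */
    int64_t delta = (int64_t)(target - next_insn);

    if (delta < INT32_MIN || delta > INT32_MAX)
        return 0;

    *rel32 = (int32_t)delta;
    return 1;
}
```

For code whose next instruction is at 0x401000, reaching absolute address 0 needs rel32 = -0x401000.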


If the operand-size is ambiguous (e.g. in an instruction with an immediate and a memory operand), use byte / word / dword / qword to specify:

mov       dword [rsi + 10], 123   ; NASM
mov   dword ptr [rsi + 10], 123   ; MASM and GNU .intel_syntax noprefix

movl      $123, 10(%rsi)         # GNU(AT&T): operand size from mnemonic suffix

See the yasm docs for NASM-syntax effective addresses, and/or the wikipedia x86 entry's section on addressing modes.

The wiki page says what's allowed in 16bit mode. Here's another "cheat sheet" for 32bit addressing modes.


16-bit addressing modes

16bit address size can't use a SIB byte, so all the one and two register addressing modes are encoded into the single mod/rm byte. reg1 can be BX or BP, and reg2 can be SI or DI (or you can use any of those 4 registers by itself). Scaling is not available. 16bit code is obsolete for a lot of reasons, including this one, and not worth learning if you don't have to.
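
The register rule for 16-bit addressing is small enough to write down as a C predicate. This is a sketch with made-up names (enum reg16, valid_addr16); it only checks which register combinations are allowed, with an optional displacement assumed to be permitted in every case.

```c
#include <stdbool.h>

enum reg16 { R16_NONE, R16_AX, R16_BX, R16_CX, R16_DX,
             R16_SI, R16_DI, R16_BP, R16_SP };

static bool is_base16(enum reg16 r)  { return r == R16_BX || r == R16_BP; }
static bool is_index16(enum reg16 r) { return r == R16_SI || r == R16_DI; }

/* True if [a + b (+ disp)] is a legal 16-bit addressing mode. */
bool valid_addr16(enum reg16 a, enum reg16 b)
{
    if (a == R16_NONE && b == R16_NONE)
        return true;                          /* [disp16] alone */
    if (a == R16_NONE)
        return is_base16(b) || is_index16(b); /* one register by itself */
    if (b == R16_NONE)
        return is_base16(a) || is_index16(a);
    /* Two registers: exactly one of BX/BP plus one of SI/DI. */
    return (is_base16(a) && is_index16(b)) ||
           (is_index16(a) && is_base16(b));
}
```

So [bx + si] and [di] pass, while [bx + bp], [si + di], and [cx] do not.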

Note that the 16bit restrictions apply in 32bit code when the address-size prefix is used, so 16bit LEA-math is highly restrictive. However, you can work around that: lea eax, [edx + ecx*2] sets ax = dx + cx*2, because garbage in the upper bits of the source registers has no effect.
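
Why the 32-bit LEA gives the right 16-bit answer can be checked with a short C model (hypothetical function names). Addition and left shift only carry from low bits to high bits, so garbage in the upper halves of the inputs never reaches the low 16 bits of the result.

```c
#include <stdint.h>

/* Models lea eax, [edx + ecx*2] and then reading ax (the low 16 bits). */
uint16_t lea32_low16(uint32_t edx, uint32_t ecx)
{
    return (uint16_t)(edx + ecx * 2u);
}

/* The intended 16-bit computation: ax = dx + cx*2, modulo 2^16. */
uint16_t add16(uint16_t dx, uint16_t cx)
{
    return (uint16_t)(dx + cx * 2u);
}
```

For any inputs, lea32_low16(edx, ecx) equals add16 applied to the low halves of edx and ecx.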

There's also a more detailed guide to addressing modes, for 16bit. 16-bit has a limited set of addressing modes (only a few registers are valid, and no scale factors), but you might want to read it to understand some fundamentals about how x86 CPUs use addresses because some of that hasn't changed for 32-bit mode.


Related topics:

Many of these are also linked above, but not all.

Synovitis answered 3/12, 2015 at 5:18 Comment(31)
16-bit code is still around though. And the user did tag this as DOS, so an explanation of the 16-bit restrictions would probably be reasonable for anyone stumbling on this question and answer. The best rule of thumb I have seen that is reasonably easy to understand and remember, can be found in Section 1.2.7 An Easy Way to Remember the 8086 Memory Addressing Modes of this document. I find it a better description than the Wiki article you linked to. - Venue
@MichaelPetch: yup, that's why I linked the wikipedia article, since as I said, it shows which regs can be used as which component of an address. - Synovitis
No, that isn't the Wiki article. The wiki article doesn't offer up much of an explanation of how you mix and match those rows and columns. I helped someone last year on this site, they didn't understand the Wiki version, but clued in with the other one. - Venue
@MichaelPetch: I meant I saw the DOS tag and made more of an effort than I would otherwise. I didn't say wikip went into as much detail as your link, which is quite nice. :) If it also included that kind of tutorial/guide for 32bit addressing modes, I'd add it to the x86 tag wiki. It might still be worth doing so, with a mention that 32 and 64bit don't have the same restrictions. It does seem there's a lack of nice tutorials/guides that lay out exactly how to write the operand part of the instructions that insn ref manuals document. - Synovitis
@MichaelPetch: I tidied up and expanded this, and linked to it from the x86 tag wiki. Would you mind proof-reading it, since it's now sort of part of SO's official x86 documentation. (or it will be, when that tag-wiki edit is approved). - Synovitis
It may be worth mentioning that [OSX does not allow global_array + [10]](stackoverflow.com/questions/26927278/…). - Monet
I'm curious to what the point of scale is. Normally when I am iterating through an array I increase reg2. So I can either do add 4, reg2 and use scale=4 or do add 16, reg2 and use scale=1. So far I have always used scale=1. The only reason I can see to use scale>1 is if I want both the value of the iterator and scale*iterator. - Monet
@Zboson: If you have two arrays of different-size elements, you can use a single index to traverse both of them. And yes, sometimes you do want the value of the iterator. Or someone passed you an array index instead of a pointer. Or there's a data structure with indices. i=a[i] -> mov eax, [esi + eax*4]. In old-school 8086, I guess loop was pretty common, but you can't use cx as an index register, so IDK. Maybe people wrote loops with dec for that case. Maybe they couldn't think of anything else to do with two spare bits? Encoding more registers would have made sense, though! - Synovitis
@Zboson: Actually, [global_array + 10] is allowed, because it can be encoded as RIP-relative (since there are no register offsets). Thanks for the suggestion to mention the OS X caveat, though. Good point. - Synovitis
Using RIP-relative is a different addressing mode. Of course you can still use global arrays (I prefer to use the term statically allocated array) with OSX. It would be pretty stupid if you could not. But you can't do [absolute-32-bit-address + register] with OSX. That's what the reader of your answer should assume that [global_array + 10] means. - Monet
@Zboson: I get to abs vs. rel in a later bullet point about 64bit, and I didn't want to clutter the early part of the answer. Once you know about RIP-relative addressing and DEFAULT REL, you should be thinking of it every time you see a label inside an effective address. - Synovitis
In that case I don't agree that it's usually RIP relative. Neither NASM, YASM, or GAS default to RIP relative. GCC won't use RIP relative by default for statically allocated arrays. It uses absolute addresses. See the section on addressing modes in Agner's assembly manual - Monet
he writes "Note that NASM, YASM and Gnu assemblers can make 32-bit absolute addresses when you do not explicitly specify rip-relative addresses. You have to specify default rel in NASM/YASM or [mem+rip] in Gas to avoid 32-bit absolute addresses". RIP relative is not the default on Linux. It's unnecessary with Linux. It's necessary with OSX. With windows it's only necessary with DLLs but MSVC still uses RIP relative by default. - Monet
I don't mean to be rude but I think Agner does a much better job explaining addressing modes. Of course he uses a few pages to do this and gives several examples. I would suggest people see section 3.3 of Agner's manual at the start of your question. He covers 16-bit, 32-bit, and 64-bit in detail. - Monet
@Zboson: thanks! I wanted to keep this brief and compact, so a good external resource to cite is exactly what I want. What do you think of my recent edit? You did convince me that it would be appropriate to not gloss over the RIP-relative issue as much as I was before. - Synovitis
I think your new answer is better. I do agree that RIP relative address is better in general. But it would be impossible to get the ideal case in my question here without absolute 32-bit addressing. - Monet
@PeterCordes - your description is good, as is Agner's guide - but neither really covers what sizes of registers can be used in the various modes (32-bit, 64-bit) - in particular, how the address size prefix is used. Currently your answer doesn't really cover it at all (since it focuses in a mostly combined way on 32-bit and 64-bit, with some mentions of the more interesting differences like the RIP-relative stuff). Agner's guide doesn't even seem to mention the possibility - for 64-bit, he mentions only "32-bit absolute addresses", but that's only the offset - the register is 64-bit. - Grandparent
@BeeOnRope: Right, I was assuming the default address size. Adding that line about registers being the same size as the address-size opened the door to discussing that, too, I guess. I guess if you write a function that takes a 32-bit integer that you want to use as an index into a static table, you can save an instruction to zero or sign extend it if you just use a 32-bit address-size. It's pretty obscure, since you can't use it with arbitrary pointers except in the x32 ABI (ILP32 in long mode). - Synovitis
Yeah, most of the uses seem fairly obscure (it could also come in handy with lea, outside of actual address calculations). My original query was exactly to use the "implicit zero extend" behavior, which is even more powerful with the byte registers since you have both the l and the h varieties. So if you have rax which conceptually contains 8 byte-sized offsets, you could use both ah and al to pick off two offsets "for free" in the addressing calculation, then shift right by 16 bits. Better than a bunch of mov ebx, eax then and ebx, 0xFF and so on. - Grandparent
Other potential uses could be if you want the 32-bit overflow/wrap behavior for the address calculation: this could come in handy in various JIT/interpreter etc type scenarios where you have allocated a 4GB region and want to keep all accesses in-bounds without explicit bounds checking (similar thing applies to 16-bit regs and 64K regions). Anyway, I updated this reply with a little note on it, but shortish since it's kind of obscure as you point out. - Grandparent
@BeeOnRope: It's never useful with LEA. Just use the default operand-size to truncate the result to 32-bit instead of using operand-size and address-size prefixes. The low 32 bits of 64-bit adds and left shifts don't depend on high bits, because the carry and shift propagate right to left. Agner Fog's objconv disassembler even makes note of redundant address-size prefixes on 32-bit operand-size LEAs. Interesting point with JIT, though. - Synovitis
@BeeOnRope: I ended up rewriting the address-size paragraph. I didn't think cataloguing all the variations was useful, since changing the address-size is so rarely useful that I'd rather just link to your question for the details on that. Does my new text look like it would have told you what you wanted to know if you'd found it originally? - Synovitis
@BeeOnRope: re: unpacking byte indices: the optimal way to do it is with movzx ecx, al / movzx edx, ah / shr rax, 16. I actually looked into this while tuning a GaloisField16 function for the reed-solomon error correction codes used by par2. (gcc.gnu.org/bugzilla/show_bug.cgi?id=67072 and stackoverflow.com/questions/31734263/…). Fun fact: IvyBridge can do mov-elimination on movzx reg,reg, but Haswell can't. They decided it was overkill, esp. with Haswell's extra ALU port I guess. - Synovitis
@PeterCordes - I wouldn't say 32-bit lea is never useful. If I understand you correctly, you mean "Just use the default operand-size to truncate the result to 32-bit ... [in a subsequent operation that uses the result of lea]", but that isn't always possible. For example, the subsequent operation might be a 64-bit operation. That's not hypothetical: I've seen it happen because (a) you wanted to use the implicit truncation to 32-bit to implement an & 0xFFFFFFFF type operation, and also (b) when the result was in one leg of a branch and the other leg produces a 64-bit valid result. - Grandparent
@BeeOnRope: No, I mean use LEA with its default operand size (32-bit), to write a 32-bit result zero-extended to 64. e.g. (bad) lea rax, [ecx + ebx*4] always gives the same result as (good) lea eax, [rcx + rbx*4], but takes two extra prefix bytes. 32-bit address-size for LEA is never useful because you can always get the same result without it. High bits of input registers can't affect the low bits of the result for addition or left shift. - Synovitis
Oh right, duh - somehow I thought the default operand size for lea was also 64-bit, like the address size and that you were saying something else. So I guess one can say that of the 4 possible size combinations of lea, the lea eax, [rbx + ...] and lea rax, [rbx + ...] forms are both useful and the other two forms are quite useless (except for the always-esoteric option of purposely using longer instructions for alignment purposes). - Grandparent
@BeeOnRope: right. Having LEA's default operand-size be 64-bit would probably be a win for code density, but would probably cost transistors (and power?) in the decoders. Maybe not much if there was no way to toggle it back to 32-bit, though. And you could still truncate the result by using an address-size prefix. Unless there's something expensive about the address-size prefix (in AMD's k8 microarchitecture which they were designing AMD64 for)... I forget. - Synovitis
Yeah, I guess it is consistent and inconsistent at the same time :) - Grandparent
Thanks for such a wonderful answer! one question please: is segment-override also possible for pc-relative address? I know it would be a bit pointless but I want to know if it's allowed - Hawley
@ZhaniBaramidze: yes. [fs: 0] can be encoded with a 32-bit absolute mode with a disp32=0, or (1 byte shorter) with RIP+rel32 where the rel32 is chosen so that RIP+rel32=0. Using relative addressing to generate absolute addresses only works in position-dependent code, because the code address has to be a known constant (known at link time). And has to be within 2GiB of the desired absolute address. - Synovitis
AT&T syntax shall die - Millsap

Here is a quick cheatsheet, retrieved from this site. It shows the various methods available for addressing main memory in x86 assembly:

+------------------------+----------------------------+-----------------------------+
| Mode                   | Intel                      | AT&T                        |
+------------------------+----------------------------+-----------------------------+
| Absolute               | MOV EAX, [0x100]           | movl           0x0100, %eax |
| Register               | MOV EAX, [ESI]             | movl           (%esi), %eax |
| Reg + Off              | MOV EAX, [EBP-8]           | movl         -8(%ebp), %eax |
| Reg*Scale + Off        | MOV EAX, [EBX*4 + 0x100]   | movl   0x100(,%ebx,4), %eax |
| Base + Reg*Scale + Off | MOV EAX, [EDX + EBX*4 + 8] | movl 0x8(%edx,%ebx,4), %eax |
+------------------------+----------------------------+-----------------------------+

In your specific case, if the item is located 4 bytes below the stack base EBP, you would use the Reg + Off notation:

MOV EAX, [ EBP - 4 ]

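This would copy the 4-byte item at that address into register EAX. For a single character, load just the byte instead, e.g. MOVZX EAX, BYTE PTR [EBP - 4].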

Tuppence answered 12/8, 2019 at 2:59 Comment(0)
