Is LEA the only instruction in x86 with a memory operand that doesn't access memory?

I'm using libdis, the x86 disassembler library from the bastard, and I'm trying to find out which instructions access memory.

With reference to these two instructions:

mov eax, [ebx + 10]
lea eax, [ebx + 10]

In libdis, both are listed with instruction type insn_mov, and the address operands have the same flags in both cases. So the only way I can tell if memory is accessed is to look at the instruction mnemonic.

Hence my question: is LEA the only instruction using a memory operand that doesn't actually access memory? Any links to references would be nice.
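To make the distinction concrete, here is a rough C analogy for the two instructions above. It is only a sketch: the function names `lea_like` and `mov_like` are hypothetical and have nothing to do with libdis. The first does the address arithmetic and stops; the second performs the same arithmetic and then reads four bytes from the result.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Analogue of `lea eax, [ebx + 10]`: pure arithmetic on the register value.
   Nothing is dereferenced, so any value of ebx is acceptable. */
uintptr_t lea_like(uintptr_t ebx)
{
    return ebx + 10;
}

/* Analogue of `mov eax, [ebx + 10]`: the same arithmetic followed by a
   4-byte load, so ebx + 10 must point at readable memory. */
uint32_t mov_like(const unsigned char *ebx)
{
    uint32_t eax;
    assert(ebx != NULL);
    memcpy(&eax, ebx + 10, sizeof eax);  /* the actual memory access */
    return eax;
}
```

Calling `lea_like(0)` is harmless and returns 10, whereas `mov_like` needs a buffer of at least 14 readable bytes.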

Comfortable answered 16/12, 2013 at 23:26 Comment(2)
I think so... not sure though. – Hummock
Honorable mention to bt/btc/btr/bts which do access memory, but not necessarily at the effective address specified in the instruction. – Scanties
11

The prefetch family of instructions (prefetcht0, prefetcht1, prefetcht2, prefetchnta) asks the processor to pull the addressed cache lines into cache because they will be needed soon. Intel's documentation makes it clear, however, that no faults can result from a bad address passed to prefetch. This lets software pass potentially out-of-bounds addresses to prefetch without checking them first, so the data can be in flight while those checks are performed.

Prefetches also don't have an 'output', unlike LEA.
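The "prefetch first, check afterwards" pattern described above might look like the following sketch. It assumes GCC or Clang, whose `__builtin_prefetch` builtin is documented not to fault on an invalid address; the helper name `lookup_or` and its parameters are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns table[idx], or fallback when idx is out of range. */
int lookup_or(const int *table, size_t len, size_t idx, int fallback)
{
    assert(table != NULL);

    /* Form the address with integer arithmetic so that no out-of-bounds
       pointer arithmetic happens in C. The address may well be invalid. */
    uintptr_t addr = (uintptr_t)table + idx * sizeof *table;

    /* Start the prefetch before the bounds check: rw = 0 (read),
       locality = 3 (keep in all cache levels). A bad address is ignored. */
    __builtin_prefetch((const void *)addr, 0, 3);

    if (idx >= len)
        return fallback;   /* the prefetch was wasted, but harmless */
    return table[idx];     /* the real load, now hopefully a cache hit */
}
```

In a lookup this small the prefetch buys nothing; the pattern only pays off when the validity check is expensive enough to overlap with the memory latency.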

Zipper answered 16/12, 2013 at 23:51 Comment(0)
9

Intel has a multi-byte "NOP" instruction with opcode 0F 1F /0 that takes memory addressing operands. From Intel's manual:

The multi-byte NOP instruction does not alter the content of a register and will not issue a memory operation

The discussion in comments is about putting the opcode byte of a nop at the end of an unmapped page, and code fetch faulting if it can't read a complete instruction including the ModR/M and displacement bytes. That's orthogonal to this question.


You can think of long-NOP as working as follows:

  • Instruction decoding hardware knows how to find the end of an instruction that takes a ModRM (which implies optional SIB and/or displacement).
  • Based on that opcode specifically being a NOP, the rest of the CPU doesn't do anything with the addressing mode encoded by the ModRM.

This makes it possible for software to encode multi-byte NOPs by using a variety of addressing modes and prefixes. The CPU can be designed to handle it without needing any special hardware beyond recognizing one more opcode as a nop. The overall instruction format is the same as most.
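As an illustration of how software uses these encodings, here is a sketch of a padding routine built on the recommended multi-byte NOP sequences from Intel's manual (1 to 9 bytes). The function name `fill_nops` is hypothetical, and the greedy "longest NOP first" strategy is just one reasonable choice, not something the manual prescribes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Intel's recommended NOP sequences, indexed by length - 1.
   From 3 bytes up they are 0F 1F /0 with different addressing modes;
   the 0x66 operand-size prefix supplies the extra byte where needed. */
static const uint8_t nop_seq[9][9] = {
    {0x90},
    {0x66, 0x90},
    {0x0F, 0x1F, 0x00},                                     /* nop [eax]             */
    {0x0F, 0x1F, 0x40, 0x00},                               /* nop [eax + disp8]     */
    {0x0F, 0x1F, 0x44, 0x00, 0x00},                         /* nop [eax+eax+disp8]   */
    {0x66, 0x0F, 0x1F, 0x44, 0x00, 0x00},
    {0x0F, 0x1F, 0x80, 0x00, 0x00, 0x00, 0x00},             /* nop [eax + disp32]    */
    {0x0F, 0x1F, 0x84, 0x00, 0x00, 0x00, 0x00, 0x00},       /* nop [eax+eax+disp32]  */
    {0x66, 0x0F, 0x1F, 0x84, 0x00, 0x00, 0x00, 0x00, 0x00},
};

/* Fill buf[0..len) with NOPs, longest first; returns the instruction count. */
size_t fill_nops(uint8_t *buf, size_t len)
{
    size_t count = 0;
    assert(buf != NULL || len == 0);
    while (len > 0) {
        size_t n = len > 9 ? 9 : len;
        memcpy(buf, nop_seq[n - 1], n);
        buf += n;
        len -= n;
        count++;
    }
    return count;
}
```

For example, padding 12 bytes emits one 9-byte NOP followed by one 3-byte NOP, so the decoder sees two instructions instead of twelve single-byte ones.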

Erection answered 17/12, 2013 at 0:24 Comment(7)
It seems they do access memory, according to Peter Ferrie: "Interestingly, despite its name, it does access memory if the Mod/RM byte tells it to, so this "No OPeration" can cause page faults. Not quite a NOP after all." – Clairvoyance
Really? Hmm. I'll have to go back and revisit that code; memory touches can be slow if not in the cache. – Erection
I seem to get a different story: mail.openjdk.java.net/pipermail/hotspot-compiler-dev/… My code generator produces the suggested code, involving EAX, in a context in which EAX can be arbitrary values. If these really read memory, I'd get traps, but I've had zero trouble with this. One of Ferrie or Dabbs is wrong. – Erection
This is related to how Mod/RM is decoded. Place it at the end of a page, such that it spans into the next page, such as by using an offset. Make the next page inaccessible. Now attempt to execute it. Instruction fetch will cause a page fault, even though the CPU ought to "know" that it's a nop and the offset is irrelevant to its execution. – Morganne
@peter ferrie: That just means that the CPU has rules about fetching entire instructions before executing them, which is an orthogonal topic. The question was whether the NOP attempted to fetch the address formed by the mod/rm byte. Do you have specific evidence other than the example you just gave? – Erection
@IgorSkochinsky: Intel's manuals fortunately clear up any doubt introduced by that misleading quote. I edited Ira's answer to clarify. – Craver
@PeterCordes: Tip of the hat. – Erection
4

Valid address not required

prefetch / prefetchw and nop as mentioned in other answers.

Any AVX512 masked load or store with an all-zero mask, like vmovaps [rdi]{k1}, zmm1. Or AVX vmaskmovps / AVX2 vpmaskmovd. AVX2 gather / AVX512 gather or scatter with an all-zero mask. These all do fault suppression for invalid addresses. (Slow, but no actual memory access.)

invlpg m8 takes a ModRM which specifies a virtual address. (It is a privileged instruction.) Instead of loading from that address, it invalidates the TLB entry for that address, and higher-level page directory entries cached in the page walker(s).

verr / verw (Verify a Segment for Reading or Writing): they take a ModRM addressing mode and check the address against segment limits, setting FLAGS. (And with recent microcode updates, verw also clears internal CPU buffers so OSes can use it to mitigate L1TF / MDS vulnerabilities.)

rep cmpsb or any other rep string instruction with RCX = 0 does zero iterations, not accessing the [RDI] or [RSI] implicit memory operands. I think this means it won't fault even with a bad address. The microcode is certainly slow enough.

cldemote (new in Intel Tremont) - the opposite of a prefetch; performance hint to push data out towards shared L3 to speed up the first access from another core. It decodes as a NOP on HW without that feature. Prefetches don't fault on invalid addresses (although they can be slow when they take a microcode assist to suppress the fault); the manual isn't 100% clear one way or the other for cldemote, but does call it a speculative hint.

In certain processor implementations the CLDEMOTE instruction may set the A bit but not the D bit in the page tables.

If the line is not found in the cache, the instruction will be treated as a NOP.

MPX bndcl bnd, r/m64 / bndcu / bndcn / bndmk - The memory-source form has a built-in LEA: the Operation section's pseudocode even says TEMP ← LEA(mem);. The register-source form just uses the register value as an address directly. As the manual says, "This instruction does not cause any memory access, and does not read or write any flags." (It raises a #BR exception on out-of-bounds.) Note that MPX is deprecated.


Valid address required but not a load or store per se.

clflush / clflushopt / clwb all take a memory operand to specify which cache line to flush or write back all the way to DRAM, usable with non-volatile DIMMs to ensure commit to NV storage (so unlike cldemote, these are not just hints the CPU can drop if busy or address not found). They do require a valid virtual address, and do affect the MESI state of the corresponding cache line. But if the cache line isn't present in L1d cache, it's not brought in and then flushed again. I think it evicts from the caches of all cores, so one core spamming clflush on a line will affect another core reading/writing it.

The CLFLUSHOPT instruction can be used at all privilege levels and is subject to all permission checking and faults associated with a byte load (and in addition, a CLFLUSHOPT instruction is allowed to flush a linear address in an execute-only segment). Like a load, the CLFLUSHOPT instruction sets the A bit but not the D bit in the page tables.

MONITOR takes a memory address as implicit DS:RAX/EAX/AX, not encoded in a ModRM. It doesn't actually load from it, instead just sets up the core to notice when another core changes that memory. However, it does work like a load. (Presumably getting the line into MESI Shared state so it can notice when another core invalidates it before a write.)

The MONITOR instruction is ordered as a load operation with respect to other memory transactions. The instruction is subject to the permission checking and faults associated with a byte load. Like a load, MONITOR sets the A-bit but not the D-bit in page tables.

umonitor (the user-space version) is the same.
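For a disassembler-based tool like the one in the question, the instructions collected in this thread could be folded into a mnemonic lookup. The sketch below is only that: the enum, the function name `classify_mem_operand`, and the assumption that mnemonics arrive as lowercase strings are all hypothetical and not part of libdis. Cases that depend on runtime state (an all-zero mask, RCX = 0) cannot be decided from the mnemonic, so they fall through to the conservative default.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum mem_use {
    MEM_ACCESS,        /* default: assume a real load or store */
    MEM_ADDRESS_ONLY,  /* address is computed or used as a key, no data access */
    MEM_HINT,          /* hint or NOP: no architectural load or store */
    MEM_CACHE_CONTROL  /* valid address required, but no data loaded or stored */
};

struct mnemonic_rule {
    const char  *mnemonic;
    enum mem_use use;
};

/* Table built from the instructions named in this thread (not exhaustive). */
static const struct mnemonic_rule rules[] = {
    {"lea",         MEM_ADDRESS_ONLY},
    {"invlpg",      MEM_ADDRESS_ONLY},
    {"bndcl",       MEM_ADDRESS_ONLY},
    {"bndcu",       MEM_ADDRESS_ONLY},
    {"bndcn",       MEM_ADDRESS_ONLY},
    {"bndmk",       MEM_ADDRESS_ONLY},
    {"nop",         MEM_HINT},
    {"prefetcht0",  MEM_HINT},
    {"prefetcht1",  MEM_HINT},
    {"prefetcht2",  MEM_HINT},
    {"prefetchnta", MEM_HINT},
    {"prefetchw",   MEM_HINT},
    {"cldemote",    MEM_HINT},
    {"clflush",     MEM_CACHE_CONTROL},
    {"clflushopt",  MEM_CACHE_CONTROL},
    {"clwb",        MEM_CACHE_CONTROL},
};

/* Classify how an instruction treats its explicit memory operand. */
enum mem_use classify_mem_operand(const char *mnemonic)
{
    assert(mnemonic != NULL);
    for (size_t i = 0; i < sizeof rules / sizeof rules[0]; i++) {
        if (strcmp(mnemonic, rules[i].mnemonic) == 0)
            return rules[i].use;
    }
    return MEM_ACCESS;  /* unknown mnemonic: be conservative */
}
```

Anything not in the table, including mov, is treated as a real access, which is the safe direction for a tool that must not miss memory references.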

Craver answered 9/5, 2020 at 14:1 Comment(0)
