Valid address not required
prefetch / prefetchw and nop, as mentioned in other answers.
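As a rough illustration of the prefetch case, the sketch below issues prefetch hints for an address that is almost certainly unmapped. It assumes GCC or Clang (`__builtin_prefetch` is a compiler builtin, not part of ISO C), and the function name and the address 16 are made up for the example.

```c
#include <stddef.h>

/* Issue prefetch hints for an address that is almost certainly unmapped.
   Prefetches are only hints, so this is not expected to fault. */
int prefetch_unmapped(void)
{
    const void *bad = (const void *)(size_t)16;  /* hypothetical bad address */

    __builtin_prefetch(bad, 0, 3);  /* read prefetch, keep in all cache levels */
    __builtin_prefetch(bad, 1, 3);  /* write prefetch; may compile to prefetchw
                                       only if the target enables it */
    return 1;                       /* reached only if nothing faulted */
}
```

Calling `prefetch_unmapped()` should simply return 1. An ordinary load from the same pointer would normally segfault.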
Any AVX512 masked load or store with an all-zero mask, like vmovaps [rdi]{k1}, zmm1. Or AVX vmaskmovps / AVX2 vpmaskmovd. AVX2 gather / AVX512 gather or scatter with an all-zero mask. These all do fault-suppression for invalid addresses. (Slow, but no actual memory access.)
invlpg m8 takes a ModRM which specifies a virtual address. (Privileged instruction.) Instead of loading from that address, it invalidates the TLB entry for that address, and higher-level page directory entries cached in the page walker(s).
verr / verw (Verify a Segment for Reading or Writing): they take a ModRM addressing mode and check the address against segment limits, setting FLAGS. (And with recent microcode updates, verw also clears internal CPU buffers so OSes can use it to mitigate L1TF / MDS vulnerabilities.)
rep cmpsb or any other string instruction with RCX = 0 does zero iterations, not accessing the [RDI] or [RSI] implicit memory operands. I think this means it won't fault even with a bad address. The microcode is certainly slow enough.
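One way to try the zero-count case is a small inline-asm experiment. This is only a sketch: it assumes x86-64 with GCC or Clang, that the direction flag is clear (as the usual calling conventions require), and the function name and both pointer values are invented for the test. On other targets the asm is compiled out and the function just returns the untouched count.

```c
#include <stddef.h>

/* Run `repe cmpsb` with RCX = 0 and deliberately bogus pointers.
   Returns the count left in RCX afterwards. */
size_t rep_cmpsb_zero_count(void)
{
    const void *src = (const void *)(size_t)16;  /* hypothetical bad address */
    const void *dst = (const void *)(size_t)32;  /* hypothetical bad address */
    size_t count = 0;

#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
    /* RSI, RDI and RCX are the implicit operands of the string instruction. */
    __asm__ volatile("repe cmpsb"
                     : "+S"(src), "+D"(dst), "+c"(count)
                     :
                     : "cc", "memory");
#endif
    return count;
}
```

If the zero-iteration reasoning holds, the call returns 0 without a segfault. Setting `count` to 1 with the same pointers would be expected to fault.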
cldemote (new in Intel Tremont): the opposite of a prefetch; a performance hint to push data out towards shared L3 to speed up the first access from another core. It decodes as a NOP on HW without that feature. Prefetches don't fault on invalid addresses (although they can be slow when they take a microcode assist to suppress the fault); the manual isn't 100% clear one way or the other for cldemote, but does call it a speculative hint.
In certain processor implementations the CLDEMOTE instruction may set the A bit but not the D bit in the page tables.
If the line is not found in the cache, the instruction will be treated as a NOP.
MPX bndcl bnd, r/m64 / bndcu / bndcn / bndmk: the memory-source form has a built-in LEA; the Operation section's pseudocode even says TEMP ← LEA(mem);. The register-source form just uses the register value as an address directly. As the manual says, "This instruction does not cause any memory access, and does not read or write any flags." (It raises a #BR exception on out-of-bounds.) Note that MPX is deprecated.
Valid address required but not a load or store per se.
clflush / clflushopt / clwb all take a memory operand to specify which cache line to flush or write back all the way to DRAM, usable with non-volatile DIMMs to ensure commit to NV storage (so unlike cldemote, these are not just hints the CPU can drop if busy or address not found). They do require a valid virtual address, and do affect the MESI state of the corresponding cache line. But if the cache line isn't present in L1d cache, it's not brought in and then flushed again. I think it evicts from the caches of all cores, so one core spamming clflush on a line will affect another core reading/writing it.
The CLFLUSHOPT instruction can be used at all privilege levels and is subject to all permission checking and faults associated with a byte load (and in addition, a CLFLUSHOPT instruction is allowed to flush a linear address in an execute-only segment). Like a load, the CLFLUSHOPT instruction sets the A bit but not the D bit in the page tables.
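For the flush instructions, a minimal sketch using the SSE2 intrinsics `_mm_clflush` and `_mm_mfence` from `<emmintrin.h>` shows the usual pattern: the operand is an ordinary valid pointer, and the flush changes where the line lives, not what it contains. The function name is made up for illustration, and on non-x86-64 targets the flush is compiled out.

```c
#if defined(__x86_64__)
#include <emmintrin.h>  /* _mm_clflush, _mm_mfence (SSE2, baseline on x86-64) */
#endif

/* Store a value, flush its cache line, then read the value back. */
int flush_and_reload(void)
{
    static volatile int value;

    value = 42;
#if defined(__x86_64__)
    _mm_clflush((const void *)&value);  /* must be a valid, mapped address */
    _mm_mfence();                       /* order the flush before the reload */
#endif
    return value;                       /* reloaded from memory after the flush */
}
```

The reload returns the stored value, just more slowly than it would from a hot cache line. Passing an unmapped pointer to `_mm_clflush` would be expected to fault like a byte load, as the manual text above describes.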
MONITOR takes a memory address as implicit DS:RAX/EAX/AX, not encoded in a ModRM. It doesn't actually load from it, instead just sets up the core to notice when another core changes that memory. However, it does work like a load. (Presumably getting the line into MESI Shared state so it can notice when another core invalidates it before a write.)
The MONITOR instruction is ordered as a load operation with respect to other memory transactions. The instruction is subject to the permission checking and faults associated with a byte load. Like a load, MONITOR sets the A-bit but not the D-bit in page tables.
umonitor (the user-space version) is the same.
bt/btc/btr/bts, which do access memory, but not necessarily at the effective address specified in the instruction. – Scanties
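To make the bt point concrete: with a register bit index, the memory form treats the operand as the start of a bit string, so for a 64-bit operand an index of 100 selects bit 36 of the qword 8 bytes past the encoded address. The sketch below assumes x86-64 with GCC or Clang inline asm. The helper name `bit_test` is hypothetical, only non-negative indices are considered, and other targets get a plain C fallback that mirrors the same addressing rule.

```c
#include <stdint.h>

/* Test bit `idx` of the bit string that starts at `base`. */
int bit_test(const uint64_t *base, int64_t idx)
{
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
    unsigned char carry;

    /* bt with a register index may read base + 8 * (idx / 64),
       not just the qword at base itself. CF receives the bit. */
    __asm__ volatile("btq %[idx], (%[base])\n\t"
                     "setc %[carry]"
                     : [carry] "=q"(carry)
                     : [idx] "r"(idx), [base] "r"(base)
                     : "cc", "memory");
    return carry;
#else
    /* Fallback: same word/bit selection written out in C. */
    return (int)((base[idx >> 6] >> (idx & 63)) & 1u);
#endif
}
```

With `uint64_t words[2] = {0, (uint64_t)1 << 36};`, calling `bit_test(words, 100)` reads `words[1]` even though the encoded address is `&words[0]`.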