During an interview I was asked if I knew of any x64 instructions that behave differently depending on the CPU used. I couldn't find any documentation on that anywhere. Does anyone know what these instructions are and why this is the case?
There are some that leave a register or some flags with undefined values. Intel and AMD may differ there.
In some cases, the actual behaviour of real hardware for these undefined cases preserves backwards compatibility for some old software that relies on it. For example, BSF with input=0 sets ZF and leaves the destination register unmodified. (On both current Intel and AMD hardware. IDK if any old Intel hardware was ever different; if not, bsf/bsr isn't really an example of an instruction that executes differently, just a lack of documented guarantees of being future-proof.)
But the difference is that Intel documents it as leaving the destination register with "undefined" contents. AMD's manuals explicitly document and guarantee that AMD CPUs will leave the destination unmodified in that case.
AMD's AMD64 manual (March 2017) for bsr/bsf:
If the second operand contains 0, the instruction sets ZF to 1 and does not change the contents of the destination register
So it's not guaranteed on paper that it's safe to emulate tzcnt / implement std::countr_zero as mov eax, 32 / bsf eax, edx, even though that works in practice and will likely continue working on future CPUs. (This is why bsf / bsr have an output dependency.) Intel might eventually document this behaviour, in which case compilers will be able to use it for a more efficient countr_zero / countl_zero without BMI1. Intel did recently document that AVX support implies 16-byte aligned loads / stores are atomic on Intel CPUs, so it's not unprecedented for a vendor to document something that their CPUs have been doing for years.
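To make the mov eax, 32 / bsf eax, edx idiom concrete, here is a minimal C sketch that models the unmodified-destination behaviour in portable code. The names model_bsf32 and countr_zero_via_bsf are hypothetical helpers invented for illustration. This is a model of the behaviour described above, not a real bsf, so it says nothing about what any particular CPU guarantees.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of 32-bit BSF as described above: when src is 0,
   report ZF = 1 and leave *dest unmodified; otherwise report ZF = 0 and
   write the index of the lowest set bit to *dest. */
int model_bsf32(uint32_t *dest, uint32_t src)
{
    assert(dest != NULL);
    if (src == 0)
        return 1;                   /* ZF = 1, destination untouched */
    uint32_t index = 0;
    while (((src >> index) & 1u) == 0)
        index++;
    *dest = index;
    return 0;                       /* ZF = 0 */
}

/* Models "mov eax, 32 / bsf eax, edx": the preloaded 32 survives only
   because the modelled BSF leaves the destination alone for input 0. */
uint32_t countr_zero_via_bsf(uint32_t x)
{
    uint32_t result = 32;           /* mov eax, 32 */
    (void)model_bsf32(&result, x);  /* bsf eax, edx */
    return result;
}
```

If the destination were instead overwritten with an arbitrary value for a zero input, which Intel's "undefined" wording permits on paper, countr_zero_via_bsf(0) would no longer be guaranteed to return 32.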
If performance differences count, there are many (see links in the x86 tag wiki)!
You're not just talking about unsupported instructions, are you? Like LAHF/SAHF being unsupported in long mode on some very early x86-64 CPUs? Or CMPXCHG16B also unsupported on early K8.
The most interesting case of unsupported instructions is that LZCNT decodes as BSR on CPUs that don't support it, the REP prefix being ignored. Even for non-zero inputs, they return opposite results (_lzcnt_u32(x) == 31-bsr(x)). TZCNT similarly decodes as (REP) BSF on CPUs that don't support it, but they do the same thing except when input = 0. I didn't mention this earlier, because running the same machine-code differently is not the same thing as running the same instruction differently, but it sounds like this is the kind of thing you're asking for.
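The "opposite results" relationship can be checked with a small portable model. This is only a sketch: model_lzcnt32 and model_bsr32 are hypothetical names, and they compute the documented results in plain C rather than executing the actual instructions.

```c
#include <assert.h>
#include <stdint.h>

/* Models 32-bit LZCNT: number of leading zero bits, 32 for input 0. */
uint32_t model_lzcnt32(uint32_t x)
{
    uint32_t count = 0;
    for (uint32_t mask = 0x80000000u; mask != 0 && (x & mask) == 0; mask >>= 1)
        count++;
    return count;
}

/* Models 32-bit BSR for non-zero input only: index of the highest set bit.
   Zero input is excluded because that is the undefined/unmodified case. */
uint32_t model_bsr32(uint32_t x)
{
    assert(x != 0);
    uint32_t index = 31;
    while (((x >> index) & 1u) == 0)
        index--;
    return index;
}
```

For x = 1 the LZCNT model gives 31 while the BSR model gives 0, so the same machine code yields a different value depending on which way the CPU decodes it. For every non-zero x, model_lzcnt32(x) == 31 - model_bsr32(x).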
Are we talking only about un-privileged instructions? There are probably many more differences in the behaviour of privileged instructions. For example, Intel and AMD both have different bugs in SYSRET that Linux has to work around to avoid malicious user-space being able to cause a kernel hang.
Another case that I'm not sure counts: PREFETCHW runs on Intel CPUs from at least Core2 to Haswell as a NOP, but on AMD CPUs (and Intel since Broadwell) as an actual prefetch.
So some CPUs run it as a NOP, some run it as a prefetch (thus no architecturally visible effect either way), except on ancient CPUs where it faults as an illegal instruction. 64-bit Windows 8.1 apparently requires that PREFETCHW can run without faulting (which stops it from running on (some?) 64-bit Pentium 4 CPUs).
cpuid, you could chip in that tzcnt executes as rep bsf and lzcnt as rep bsr on CPUs < Haswell because f3 in f3 0f bc /r is interpreted as rep on those CPUs, thus behaving architecturally differently. Those are the three instances I know of architecturally-different behaviour on x64; the co-opting of f2/f3 for the xacquire/xrelease HLE prefixes arguably also counts. – Sebrinasebum
bsf/bsr, AMD seems to actually document that the register is unchanged if the input is zero, i.e., aligned with the actual Intel behavior. At least according to this source. – Putdown
LZCNT == 31 - BSR instead of 32-BSR? – Jural
There are a lot of differences between Intel and AMD
- Intel 64's BSF and BSR instructions act differently than AMD64's when the source is zero and the operand size is 32 bits. The processor sets the zero flag and leaves the upper 32 bits of the destination undefined.
- Intel 64 lacks some MSRs that are considered architectural in AMD64. These include SYSCFG, TOP_MEM, and TOP_MEM2.
- Intel 64 allows SYSCALL/SYSRET only in 64-bit mode (not in compatibility mode),[33] and allows SYSENTER/SYSEXIT in both modes.[34] AMD64 lacks SYSENTER/SYSEXIT in both sub-modes of long mode.[35]
- In 64-bit mode, near branches with the 66H (operand size override) prefix behave differently. Intel 64 ignores this prefix: the instruction has 32-bit sign extended offset, and instruction pointer is not truncated. AMD64 uses 16-bit offset field in the instruction, and clears the top 48 bits of instruction pointer.
- AMD processors raise a floating point Invalid Exception when performing an FLD or FSTP of an 80-bit signalling NaN, while Intel processors do not.
- When returning to a non-canonical address using SYSRET, AMD64 processors execute the general protection fault handler in privilege level 3, while on Intel 64 processors it is executed in privilege level 0.[38][39]
https://en.wikipedia.org/wiki/X86-64#Differences_between_AMD64_and_Intel_64
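The 66H near-branch item in the list above can be sketched as two target computations. This is a simplified model under stated assumptions: the function names are hypothetical, and next_rip stands for the address of the instruction following the branch, which itself differs between the two interpretations because the encoded offset has a different length.

```c
#include <stdint.h>

/* Intel 64 as described above: the 66H prefix is ignored, so the offset is
   a sign-extended 32-bit value and the instruction pointer is not truncated. */
uint64_t near_branch_target_intel(uint64_t next_rip, int32_t rel32)
{
    return next_rip + (uint64_t)(int64_t)rel32;
}

/* AMD64 as described above: the instruction carries a 16-bit offset and the
   top 48 bits of the instruction pointer are cleared. */
uint64_t near_branch_target_amd(uint64_t next_rip, int16_t rel16)
{
    return (next_rip + (uint64_t)(int64_t)rel16) & 0xFFFFu;
}
```

With next_rip = 0x401000 and an offset of -16, the Intel model yields 0x400FF0 while the AMD model yields 0x0FF0, so the same prefixed branch lands at very different addresses.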
cpuid is a dead giveaway, but on the Pentium III it controversially exposed the serial number. – Sebrinasebum