How does the xchg instruction work in the following code? It is given that arrayD is a DWORD array of 1,2,3.
mov eax, arrayD ; eax=1
xchg eax, [arrayD+4] ; eax=2 arrayD=2,1,3
Why isn't the array 1,1,3 after the xchg?
xchg works like Intel's documentation says.
I think the comment on the 2nd line is wrong. It should be eax=2, arrayD = 1,1,3. So you're correct, and you should email your instructor to say you think you've found a mistake, unless you missed something in your notes.
xchg only stores one element, and it can't magically look back in time to know where the value in eax came from and swap two memory locations with one xchg instruction.
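To make that concrete, here is a minimal C sketch that models the two instructions from the question. The names xchg_model and run_question_sequence are hypothetical helpers invented for illustration, and the model ignores atomicity; it only shows which locations get read and written.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of `xchg reg, [mem]`: the register is swapped with
   exactly ONE memory location. No other memory is touched. */
static void xchg_model(uint32_t *reg, uint32_t *mem)
{
    assert(reg != NULL && mem != NULL);
    uint32_t tmp = *mem;  /* read the old memory value */
    *mem = *reg;          /* store the register into that single location */
    *reg = tmp;           /* the register receives the old memory value */
}

/* Replays the question's two instructions on a 3-element array
   and returns the final value of eax. */
static uint32_t run_question_sequence(uint32_t arrayD[3])
{
    uint32_t eax = arrayD[0];      /* mov  eax, arrayD      (a load in MASM) */
    xchg_model(&eax, &arrayD[1]);  /* xchg eax, [arrayD+4]                   */
    return eax;
}
```

Starting from 1,2,3, the only store goes to the second element, so the first element is never modified and the array cannot end up as 2,1,3.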
The only way to swap 1,2 to 2,1 in one instruction would be a 64-bit rotate, like rol qword ptr [arrayD], 32 (x86-64 only).
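A rough C sketch of why the rotate works: the first two dwords are treated as one 64-bit value, and rotating by 32 bits makes the two halves trade places. The function name rotate_first_two_dwords is a hypothetical helper, and memcpy stands in for the 64-bit load and store.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of `rol qword ptr [arr], 32`: swaps arr[0] and arr[1]
   by rotating the 64-bit value that covers both of them. */
static void rotate_first_two_dwords(uint32_t *arr)
{
    uint64_t q;
    assert(arr != NULL);
    memcpy(&q, arr, sizeof q);   /* 64-bit load covering arr[0] and arr[1] */
    q = (q << 32) | (q >> 32);   /* rotate left by 32: the halves swap */
    memcpy(arr, &q, sizeof q);   /* 64-bit store back */
}
```

Because a rotate by exactly half the width just exchanges the two halves, this gives the same result regardless of byte order; the third element is left alone.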
BTW, don't use xchg with a memory operand if you care about performance. It has an implicit lock prefix on 386 and later, so it's a full memory barrier, and even apart from waiting for the store buffer to drain, it takes about 20 CPU cycles on Haswell/Skylake (http://agner.org/optimize/ and https://uops.info/). Of course, multiple instructions can be in flight at once, but xchg mem,reg is 8 uops, vs. 2 total for separate load + store. xchg doesn't stall the pipeline, but the memory barrier hurts a lot (stopping later loads from being started early as well as waiting for earlier loads and stores to fully complete). It's also a lot of work for the CPU to do to make it atomic.
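For comparison, here is a small C11 sketch of the two options: an atomic exchange (which compilers targeting x86 typically implement with xchg and a memory operand) and a plain load followed by a store. The names swap_atomic and swap_plain are hypothetical; this is only meant to show the difference in guarantees, not to measure anything.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Atomic version: the read of the old value and the write of the new one
   happen as one indivisible step, with sequentially consistent ordering. */
static uint32_t swap_atomic(_Atomic uint32_t *slot, uint32_t new_value)
{
    assert(slot != NULL);
    return atomic_exchange(slot, new_value);
}

/* Plain version: a separate load and store. Another thread could modify
   *slot between the two, but single-threaded code gets the same result. */
static uint32_t swap_plain(uint32_t *slot, uint32_t new_value)
{
    assert(slot != NULL);
    uint32_t old = *slot;  /* load */
    *slot = new_value;     /* store */
    return old;
}
```

In single-threaded code both return the old value and leave the new one in memory, so the plain version is the one to prefer unless you actually need the atomicity.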
Related:
xchg is only useful for this case if you need atomicity, or if you care about code-size but not speed. Or on CPUs before 386, where xchg doesn't imply lock.
xchg reg,reg (no memory barrier)
mfence vs. a locked operation
mov eax, arrayD does NOT set eax to 1. It loads the address of arrayD. What you want is mov eax, [arrayD]. Edit: misread the initial state. – Whaleboat

[...] arrayD and [arrayD] the same. – Laktasic

[...] .intel_syntax? If it's GAS, then mov eax, arrayD is in fact a load, but ; is not the comment character! Is it maybe MASM syntax? I think [arrayD+4] might be legal MASM syntax, even though many people write arrayD[4] or arrayD+4 with symbols outside square brackets in MASM. – Froh

[...] 1,2,3 to 3,1,2 is more than one swap. – Whaleboat

[...] xchg at all; it's slow with a memory operand because it does an atomic exchange (implicit lock prefix. See also agner.org/optimize). For 1,2,3 -> 3,1,2, I'd load all 3 values into eax, ecx, and edx, then store them back. Or do a 64-bit load of the first 2. e.g. mov rax, qword ptr [arrayD] / mov edx, [arrayD+8] / mov [arrayD], edx / mov qword ptr [arrayD+4], rax. Assuming you're using x86-64. If you're on 32-bit, you can use movq into XMM0. – Froh

[...] movdqu xmm0, [arrayD] / pshufd xmm0,xmm0, _MM_SHUFFLE(4,2,1,3) / movdqu [arrayD], xmm0. Use vpmaskmovd or AVX512 masked load/store if you need to avoid load/store of the dword past the end of the array. – Froh

[...] mov eax,[ebx]. The MASM will ignore the [] around symbol names, so you can write mov eax,[arrayD] in such case. ... (and about +4 .. are you aware the memory is addressable by single bytes, so 32 bit value occupies 4 bytes in memory = the first element of that array occupies addresses arrayD+0, arrayD+1, arrayD+2 and arrayD+3. The second element starts at address arrayD+4 (and occupies mem up to +7) – Heavyweight

[...] _MM_SHUFFLE(3,1,0,2) – Froh