I want to write some simple code (or an algorithm) to set/clear the overflow flag. For setting OF, I know that I can use signed values. But how can I clear it?
There are many possible solutions.
For instance, test al, al will clear the OF flag without affecting register contents.
Or, if you don't want to affect the other flags, you can just directly modify the *FLAGS register. For example, in 32-bit, this would look like:
pushfd ; Push EFLAGS onto the stack
and dword [esp], ~0x800 ; Clear bit 11 (OF)
popfd ; Pop the modified result back into EFLAGS
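The masking step can be checked outside of assembly. The following is a minimal Python sketch that models only the bit arithmetic applied to a 32-bit EFLAGS image; the names clear_of and set_of and the sample value are illustrative assumptions, not part of the original answer.

```python
OF_MASK = 0x800  # bit 11 of EFLAGS is the overflow flag


def clear_of(eflags: int) -> int:
    """Model `and dword [esp], ~0x800` on a 32-bit EFLAGS image."""
    return eflags & ~OF_MASK & 0xFFFFFFFF


def set_of(eflags: int) -> int:
    """Model the complementary `or dword [esp], 0x800`."""
    return (eflags | OF_MASK) & 0xFFFFFFFF


# Hypothetical sample image: CF (bit 0), ZF (bit 6) and OF (bit 11) set.
sample = 0x001 | 0x040 | 0x800
print(hex(clear_of(sample)))  # only bit 11 changes; CF and ZF stay set
```

Only bit 11 differs between input and output, which is why this route leaves every other condition code alone.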
Edit: Changed or al, al to test al, al per Peter Cordes' recommendation. (The effects are the same but the latter is better for performance reasons.)
test al,al is a better idiom for setting flags according to the contents of a register. It only writes flags, so it doesn't increase the length of the dependency chain involving al. +1 for the pushf/popf method, though. (Since sahf/lahf doesn't include OF). – Cockleshell
or and test; I updated my answer accordingly. Thanks! – Mudslinger
Provided:
- you have a register whose content you don't care about,
- you must preserve the CF flag
Best solution to clear OF (sar):
Say the register is al. (setc works only with byte registers, r8.)
; clear OF-Flag, preserve CF
setc al
sar al, 1
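To see why this pair preserves CF while clearing OF, here is a small Python model of the two instructions. It is a sketch of the documented flag behavior only (the function name setc_then_sar is an assumption for illustration), not a cycle-accurate emulation.

```python
def setc_then_sar(cf: int) -> tuple[int, int, int]:
    """Model `setc al` / `sar al, 1`; returns (AL, CF, OF) afterwards."""
    al = cf & 1                   # setc al: AL = 1 if CF is set, else 0
    new_cf = al & 1               # sar by 1 shifts the low bit out into CF
    al = (al >> 1) | (al & 0x80)  # arithmetic shift keeps the sign bit
    new_of = 0                    # a 1-bit SAR always clears OF
    return al, new_cf, new_of


for cf in (0, 1):
    print(cf, setc_then_sar(cf))
```

The carry is parked in bit 0 of AL and then shifted straight back out into CF, so its value survives the round trip.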
Note: this is good because it has no partial flag updates, which may cause stalls. (sar xx, 1 writes all flags, not leaving any unmodified, unlike inc/dec.) Cf. Intel Optimization Guide, 3.5.2.6: Partial Flag Register Stalls. But note that modern Intel CPUs don't have partial-flag stalls or flag-merging at all: instructions that read FLAGS just read either or both of CF or the SPAZO group as 2 separate inputs. (That's why cmovbe is still 2 uops on Broadwell and later: it needs CF and ZF. https://uops.info/)
Source: Intel Documentation, SAR, p. 1234
General solution (inc/dec):
Say the register is al. (Works with r8, r16, r32, r64.)
; set OF-Flag, preserve CF
mov al, 0x7F
inc al
; clear OF-Flag, preserve CF
mov al, 0x0
inc al
Source: Intel Documentation, INC, p. 551
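The inc trick relies on two facts: INC leaves CF alone, and an 8-bit increment overflows only when 0x7F becomes 0x80. A minimal Python model of that behavior (the helper name inc8 is hypothetical) might look like this:

```python
def inc8(al: int, cf: int) -> tuple[int, int, int]:
    """Model 8-bit INC: returns (result, CF, OF). INC never writes CF."""
    result = (al + 1) & 0xFF
    # Signed overflow happens only when +127 (0x7F) becomes -128 (0x80).
    of = 1 if result == 0x80 else 0
    return result, cf, of


for cf in (0, 1):
    # First tuple: the "set OF" case; second tuple: the "clear OF" case.
    print(inc8(0x7F, cf), inc8(0x00, cf))
```

In both cases the incoming CF value is returned unchanged, matching the "preserve CF" comments in the assembly.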
Alternatively (adox):
Different approach, if you can assume:
- an adx-enabled processor (you can check the CPU flags with grep adx /proc/cpuinfo)
Say the register is eax. (Needs r32/r64.)
; clear OF-Flag, preserve CF
mov eax, 0x0
adox eax, eax
; set OF-Flag, preserve CF
mov eax, 0xFFFFFFFF
adox eax, eax
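ADOX adds the two operands plus the incoming OF and writes the carry-out back into OF, leaving CF untouched. The Python sketch below models just that arithmetic for 32-bit operands to show that both sequences give a fixed OF regardless of the incoming flags; adox32 is an illustrative name, not a real API.

```python
MASK32 = 0xFFFFFFFF


def adox32(dest: int, src: int, of: int, cf: int) -> tuple[int, int, int]:
    """Model 32-bit ADOX: dest + src + OF; carry-out goes to OF, CF is kept."""
    total = (dest & MASK32) + (src & MASK32) + of
    return total & MASK32, total >> 32, cf  # (result, new OF, unchanged CF)


for of in (0, 1):
    for cf in (0, 1):
        cleared = adox32(0, 0, of, cf)            # mov eax, 0 / adox eax, eax
        raised = adox32(MASK32, MASK32, of, cf)   # mov eax, 0xFFFFFFFF / adox eax, eax
        print(of, cf, cleared[1], raised[1])
```

With 0 + 0 the sum can never carry out, and with 0xFFFFFFFF + 0xFFFFFFFF it always does, so the incoming OF does not matter in either case.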
Note: Don't try to replace mov with xor (or similar), since that will clear CF.
-1 beforehand. 0xFFFF... + 1 wraps, producing carry-out (which ADOX will output as OF=1). But yes, if you set a different value beforehand, that's an interesting way to set/clear just OF without touching other flags. – Cockleshell
adox cannot handle immediate values anyway. I edited the code. – Chronograph
popf is quite slow (like one per 20 cycles on Skylake); if you need to clear or set OF then ideally do it as a side-effect of an ALU instruction, especially one you were going to use anyway for a useful computation you know won't or will overflow. (One that will overflow is usually harder to find, unlike for CF where you can always just sub instead of add with a constant that wraps almost all the way around for all inputs except a very small range.)
If you need to set/clear just OF without affecting other condition-codes for some reason, then yes, pushf/popf is the way to go. lahf / sahf doesn't get OF, because OF is bit 11 in EFLAGS, outside the low 8.
test al,al (or any same,same register) clears OF and CF, just like comparing / subtracting zero. Other flags are usefully set according to the value.
xor eax,eax clears EAX, and clears OF/SF/CF, sets ZF/PF. You often need a zeroed register anyway, so if you need OF clear (e.g. for the start of an adox extended-precision chain), then kill 2 birds with one stone and arrange your code so the last flag-setting instruction is the xor-zeroing.
In x86-64, you can also trust that using add on a pointer + length doesn't cross over the middle of unsigned virtual address space, and thus clears OF. But that assumption could break on future CPUs with fully 64-bit virtual addresses, because then there'd be no hole in virtual address space around the signed-wraparound boundary, so a single contiguous array could span it. And that can already happen in 32-bit code, running under a 64-bit kernel or a 32-bit kernel that doesn't use a 2G:2G kernel:user split of virtual address space.
xor eax, eax / cmp al, -128 sets OF, and only takes 4 bytes of code. It's probably the cheapest way, and unlike sub or whatever, it doesn't write any partial registers (or any full registers). It still leaves EAX zeroed.
0 - -128 wraps to -128, i.e. signed OF. An 8-bit 2's complement integer can only represent values from -128..+127. The most-negative number is a special case, and has no proper inverse. It's its own absolute value / negative, or more properly those functions overflow. (Or you could treat the absolute value operation as having signed input and unsigned output, so the result is +128, i.e. 0x80. x86 doesn't have an integer abs instruction (prepare a -x, then test/cmov), but with SSSE3 it does have vector integer pabsb.)
For any known value in AL other than -1, there's a cmp al, imm8 that will set OF. For any value from 0..127, cmp al, -128 wraps. For any value from -2..-128, cmp al, +127 wraps and thus sets OF. For -1, subtracting 127 will only take you to -128. Subtracting -128 takes you up to +127. Unfortunately I don't think there's a single-instruction way to set OF without a known value in a register.
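The claim about -1 being the only exception can be checked by brute force. This Python sketch treats cmp al, imm8 as a signed 8-bit subtraction and flags overflow whenever the exact difference falls outside -128..127; the helper names are made up for this illustration.

```python
def cmp8_sets_of(al: int, imm: int) -> bool:
    """True if `cmp al, imm8` sets OF, i.e. al - imm leaves -128..127."""
    return not -128 <= al - imm <= 127


def of_setting_immediates(al: int) -> list[int]:
    """All signed imm8 values for which `cmp al, imm8` sets OF."""
    return [imm for imm in range(-128, 128) if cmp8_sets_of(al, imm)]


# AL values for which no immediate at all can produce signed overflow.
no_solution = [al for al in range(-128, 128) if not of_setting_immediates(al)]
print(no_solution)  # → [-1]
```

For AL = -1 the difference ranges from -128 (imm = 127) up to +127 (imm = -128), so it never leaves the representable range.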
It doesn't have to be al, but there's a 2-byte special encoding of cmp al,imm8. Other 8 or 32-bit registers can use the normal 3-byte encoding.
Without clobbering any registers, and no known constants, this is 6 bytes:
push rax
xor eax,eax
cmp al, -128
pop rax
This does clobber the other condition codes, but it's faster than pushf/popf. Normally you can clobber something, though, or else you can't clobber the stack.
Toggle OF
setno al        # OF=0 -> AL=1            OF=1 -> AL=0
cmp al, -127    # 1 - -127 = 128 = -128   0 - -127 = +127
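The toggle works because setno turns the current OF into 1 or 0, and only the value 1 overflows when -127 is subtracted from it. A short Python model of those two steps (toggle_of is a hypothetical name) makes the two cases explicit:

```python
def toggle_of(of: int) -> int:
    """Model `setno al` followed by `cmp al, -127`; returns the new OF."""
    al = 0 if of else 1    # setno: AL = 1 when OF is clear, else 0
    diff = al - (-127)     # cmp computes AL - imm8 and discards the result
    return 0 if -128 <= diff <= 127 else 1


print(toggle_of(0), toggle_of(1))
```

Applying the function twice returns the original flag value, as expected of a toggle.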
fstsw [bp-2] / pop ax / lahf. Or on 286 and later fstsw ax/lahf – Cockleshell
fstsw ax / SAHF (store AH into FLAGS) / ja or jb or jp. (The LAHF/SAHF mnemonics seem backwards to me; written in that order, my brain thinks load AH into FLAGS, not from FLAGS). Anyway, this is why the more recent fucomi and SSE ucomiss / ucomisd instructions set flags like an unsigned compare, with PF set on unordered (NaN); for compat with how the legacy method lined up the C3/C2/C0 bits in the x87 status word with bits in FLAGS. – Cockleshell
mov ax, 1; inc ax – Garlan