How can I set or clear overflow flag in x86 assembly?
I want to write simple code (or an algorithm) that sets or clears the overflow flag (OF). I know I can set OF with a signed operation that overflows, but how can I clear it?

Lucey answered 22/4, 2016 at 16:5 Comment(1)
Perform an operation that would never create an overflow? E.g. mov ax, 1; inc ax - Garlan

There are many possible solutions.

For instance, test al, al will clear the OF flag without affecting register contents.


Or, if you don't want to affect the other flags, you can modify the *FLAGS register directly. For example, in 32-bit mode, this would look like:

pushfd                   ; Push EFLAGS onto the stack
and dword [esp], ~0x800  ; Clear bit 11 (OF)
popfd                    ; Pop the modified result back into EFLAGS
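
The mask arithmetic in the snippet above can be checked outside assembly. This is a minimal Python sketch, assuming EFLAGS is treated as a plain 32-bit integer. The helper names clear_of and set_of are hypothetical, and set_of models the matching `or dword [esp], 0x800`, which the answer does not show.

```python
OF_MASK = 0x800  # bit 11 of EFLAGS is the overflow flag

def clear_of(eflags: int) -> int:
    """Model of `and dword [esp], ~0x800`: clear bit 11, keep every other bit."""
    return eflags & ~OF_MASK & 0xFFFFFFFF

def set_of(eflags: int) -> int:
    """Model of `or dword [esp], 0x800` (assumed counterpart): set bit 11 only."""
    return (eflags | OF_MASK) & 0xFFFFFFFF

flags = 0x00000A93              # example value with OF (bit 11) and CF (bit 0) set
print(hex(clear_of(flags)))     # 0x293: only bit 11 changed
print(hex(set_of(0x00000293)))  # 0xa93: only bit 11 changed
```

Only bit 11 changes in either direction, which is why this method leaves CF, ZF, SF and the rest untouched.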

Edit: Changed or al, al to test al, al per Peter Cordes' recommendation. (The effects are the same but the latter is better for performance reasons)

Mudslinger answered 22/4, 2016 at 17:5 Comment(2)
test al,al is a better idiom for setting flags according to the contents of a register. It only writes flags, so it doesn't increase the length of the dependency chain involving al. +1 for the pushf/popf method, though (since sahf/lahf doesn't include OF). - Cockleshell
Good point about or and test; I updated my answer accordingly. Thanks! - Mudslinger

Provided:

  • you have a register whose content you don't care about,
  • you must preserve the CF flag

Best solution to clear OF (sar):

Say the register is al (setc only takes a byte operand, r/m8).

; clear OF-Flag, preserve CF
setc al
sar al, 1

Note: this is good because it has no partial flag updates, which may cause stalls. (sar xx, 1 writes all flags, leaving none unmodified, unlike inc/dec.) Cf. Intel Optimization Guide, 3.5.2.6: Partial Flag Register Stalls. Note, however, that modern Intel CPUs don't have partial-flag stalls or flag-merging at all: instructions that read FLAGS just read CF, the SPAZO group, or both, as 2 separate inputs. (That's why cmovbe is still 2 uops on Broadwell and later: it needs CF and ZF. https://uops.info/)

Source: Intel Documentation SAR p.1234
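
To see why the setc/sar pair keeps CF while clearing OF, here is a small Python model of the two instructions. It is a sketch under stated assumptions: only AL, CF and OF are tracked (the real sar also rewrites ZF, SF and PF), and the function name setc_sar is hypothetical.

```python
def setc_sar(cf: int) -> dict:
    """Model `setc al` followed by `sar al, 1` for an incoming carry flag."""
    al = cf & 1        # setc al: AL = 1 if CF was set, else 0
    new_cf = al & 1    # sar al, 1 shifts bit 0 of AL out into CF
    al = al >> 1       # AL was 0 or 1, so the shifted result is 0
    new_of = 0         # a 1-bit SAR always clears OF
    return {"al": al, "cf": new_cf, "of": new_of}

for cf in (0, 1):
    print(cf, setc_sar(cf))
```

The carry is parked in bit 0 of AL and then shifted straight back out into CF, so CF ends up with its original value.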

General solution (inc/dec):

Say the register is al (works with r8, r16, r32, r64).

; set OF-Flag, preserve CF
mov al, 0x7F
inc al

; clear OF-Flag, preserve CF
mov al, 0x0
inc al

Source: Intel Documentation INC p.551
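
The inc variant relies on two properties: inc never writes CF, and an 8-bit inc overflows only when going from +127 to -128. The following Python sketch models just that. The helper name inc8 is hypothetical, and the other flags inc writes (ZF, SF, PF, AF) are left out.

```python
def inc8(al: int, cf: int) -> dict:
    """Model `inc al`: OF is set only on the +127 -> -128 wrap; CF passes through."""
    result = (al + 1) & 0xFF
    of = 1 if (al & 0xFF) == 0x7F else 0
    return {"al": result, "of": of, "cf": cf}

print(inc8(0x7F, cf=1))  # mov al, 0x7F / inc al -> OF set, CF kept
print(inc8(0x00, cf=1))  # mov al, 0x0  / inc al -> OF clear, CF kept
```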

Alternatively (adox):

Different approach, if you can assume:

  • an ADX-enabled processor (on Linux, you can check the CPU flags with grep adx /proc/cpuinfo)

Say the register is eax (adox needs r32/r64).

; clear OF-Flag, preserve CF
mov eax, 0x0
adox eax, eax

; set OF-Flag, preserve CF
mov eax, 0xFFFFFFFF
adox eax, eax 

Note: Don't try to replace mov with xor (or similar) since that will clear CF

Source: Intel Documentation ADOX p.150
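
adox adds the two operands plus the current OF, and writes the unsigned carry-out to OF instead of CF. This Python sketch models the 32-bit case and shows that both sequences above give the same OF whatever OF held beforehand. The function name adox32 is hypothetical.

```python
def adox32(dest: int, src: int, of: int, cf: int) -> dict:
    """Model `adox dest, src`: dest = dest + src + OF; carry-out goes to OF, CF untouched."""
    total = (dest & 0xFFFFFFFF) + (src & 0xFFFFFFFF) + (of & 1)
    return {"dest": total & 0xFFFFFFFF, "of": total >> 32, "cf": cf}

for of_in in (0, 1):
    cleared = adox32(0x0, 0x0, of_in, cf=1)                 # mov eax, 0x0 / adox eax, eax
    raised = adox32(0xFFFFFFFF, 0xFFFFFFFF, of_in, cf=1)    # mov eax, 0xFFFFFFFF / adox eax, eax
    print(of_in, cleared["of"], raised["of"])
```

With eax = 0 the sum is at most 1, so there is never a carry-out. With eax = 0xFFFFFFFF the sum always exceeds 32 bits, so OF ends up set in both cases.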

Chronograph answered 17/7, 2020 at 2:27 Comment(3)
You need the register not to be -1 beforehand. 0xFFFF... + 1 wraps, producing carry-out (which ADOX will output as OF=1). But yes, if you set a different value beforehand, that's an interesting way to set/clear just OF without touching other flags. - Cockleshell
Good point. Also, adox cannot handle immediate values anyway. I edited the code. - Chronograph
Modern Intel CPUs don't have partial-flag stalls or even partial-flag merging uops; see @Bee's answer on What is a Partial Flag Stall?. I edited your answer to include that, since you're discussing optimization. - Cockleshell

popf is quite slow (like one per 20 cycles on Skylake); if you need to clear or set OF then ideally do it as a side-effect of an ALU instruction, especially one you were going to use anyway for a useful computation you know won't or will overflow. (One that will overflow is usually harder to find, unlike for CF where you can always just sub instead of add with a constant that wraps almost all the way around for all inputs except a very small range).

If you need to set/clear just OF without affecting other condition-codes for some reason, then yes, pushf/popf is the way to go. lahf / sahf doesn't get OF, because OF is bit 11 in EFLAGS, outside the low 8.


test al,al (or any same,same register) clears OF and CF, just like comparing / subtracting zero. Other flags are usefully set according to the value.

xor eax,eax clears EAX, and clears OF/SF/CF, sets ZF/PF. You often need a zeroed register anyway, so if you need OF clear (e.g. for the start of an adox extended-precision chain), then kill 2 birds with one stone and arrange your code so the last flag-setting instruction is the xor-zeroing.

In x86-64, you can also trust that using add on a pointer + length doesn't cross over the middle of unsigned virtual address space, and thus clears OF. But that assumption could break on future CPUs with fully 64-bit virtual addresses, because then there'd be no hole in virtual address space around the signed-wraparound boundary, so a single contiguous array could span it. And that can already happen in 32-bit code, running under a 64-bit kernel or a 32-bit kernel that doesn't use a 2G:2G kernel:user split of virtual address space.


xor eax, eax / cmp al, -128 sets OF, and only takes 4 bytes of code. It's probably the cheapest way, and unlike sub or whatever, it doesn't write any partial registers (or any full registers). It still leaves EAX zeroed.

0 - -128 wraps to -128, i.e. signed OF. An 8-bit 2's complement integer can only represent values from -128..+127. The most-negative number is a special case, and has no proper inverse. It's its own absolute value / negative, or more properly those functions overflow. (Or you could treat the absolute value operation as having signed input and unsigned output, so the result is +128, i.e. 0x80. x86 doesn't have an integer abs instruction (prepare a -x, then test/cmov), but with SSSE3 it does have vector integer pabsb)

For any known value in AL other than -1, there's a cmp al, imm8 that will set OF. For any value from 0..127, cmp al, -128 wraps. For any value from -2..-128, cmp al, +127 wraps and thus sets OF. For -1, subtracting 127 will only take you to -128. Subtracting -128 takes you up to +127. Unfortunately I don't think there's a single-instruction way to set OF without a known value in a register.
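
The case analysis above can be checked by brute force. This Python sketch models the OF result of an 8-bit cmp as "the exact signed difference does not fit in -128..127", then searches every AL value for an immediate that overflows. The helper names to_signed8 and cmp8_sets_of are hypothetical.

```python
def to_signed8(x: int) -> int:
    """Interpret the low 8 bits of x as a two's complement value."""
    x &= 0xFF
    return x - 256 if x >= 128 else x

def cmp8_sets_of(al: int, imm: int) -> bool:
    """True if `cmp al, imm` sets OF, i.e. AL - imm leaves the range -128..127."""
    diff = to_signed8(al) - to_signed8(imm)
    return diff < -128 or diff > 127

# AL values for which no imm8 at all can make `cmp al, imm8` overflow
no_imm = [al for al in range(-128, 128)
          if not any(cmp8_sets_of(al, imm) for imm in range(-128, 128))]
print(no_imm)  # prints [-1]
```

Only -1 survives the search: subtracting any imm8 from it lands in -128..+127.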

It doesn't have to be al, but there's a 2-byte special encoding of cmp al,imm8. Other 8 or 32-bit registers can use the normal 3-byte encoding.


Without clobbering any registers, and no known constants, this is 6 bytes:

push   rax
xor    eax,eax
cmp    al, -128
pop    rax

This does clobber the other condition codes, but it's faster than pushf/popf. Normally you can clobber something, though, or else you can't clobber the stack.


Toggle OF

setno al              # OF=0 -> AL=1           OF=1 -> AL=0
cmp   al, -127        # 1 - -127 = 128 = -128     0 - -127 = +127
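
The toggle can be traced with the same kind of model. This is a Python sketch that tracks only OF and AL, with a hypothetical function name toggle_of.

```python
def toggle_of(of: int) -> int:
    """Model `setno al` / `cmp al, -127` and return the resulting OF."""
    al = 0 if of else 1      # setno al: AL = 1 when OF was clear, 0 when OF was set
    diff = al - (-127)       # cmp al, -127 computes AL - (-127) as a signed value
    return 1 if diff > 127 or diff < -128 else 0

print(toggle_of(0), toggle_of(1))  # prints 1 0
```

Like the other cmp-based sequences, the real instructions also overwrite the remaining condition codes.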
Cockleshell answered 27/3, 2018 at 3:35 Comment(3)
What's the point of lahf and sahf when they don't affect all the flags? Why do those instructions exist? - Mogador
@LưuVĩnhPhúc: I don't understand the 8086 design decision to put OF outside the flags affected by LAHF/SAHF, especially since there are still reserved bits in the low 8 of FLAGS. I think there was some intent to make it easy / mechanical to port 8080 asm source to 8086, so this may explain it. (You or I should ask on retrocomputing about this design decision.) One major use of LAHF is to set condition codes from the high byte of the x87 FP status word, e.g. fstsw [bp-2] / pop ax / lahf. Or on 286 and later fstsw ax/lahf - Cockleshell
@LưuVĩnhPhúc: Oops, that's fstsw ax / SAHF (store AH into FLAGS) / ja or jb or jp. (The LAHF/SAHF mnemonics seem backwards to me; written in that order, my brain thinks load AH into FLAGS, not from FLAGS.) Anyway, this is why the more recent fucomi and SSE ucomiss / ucomisd instructions set flags like an unsigned compare, with PF set on unordered (NaN): for compat with how the legacy method lined up the C3/C2/C0 bits in the x87 status word with bits in FLAGS. - Cockleshell
