The upper half is different, as mentioned in the comments. If you don't care about the upper half, you can use either `mul` or `imul`, in all of their forms (the one-operand forms produce the upper half, but in this scenario you would ignore it).
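To make that concrete, here is a small C model of the one-operand 8-bit forms. It is a sketch, not part of the original answer, and the helper names `mul8`, `imul8` and `low_halves_agree` are hypothetical. It assumes the usual two's-complement conversion to `int8_t` that mainstream compilers provide. Checking every pair of bytes shows that the low halves always agree. For example, 0xFB times 2 gives 0x01F6 with `mul` but 0xFFF6 with `imul`.

```c
#include <stdint.h>

/* Model of one-operand 8-bit mul: unsigned * unsigned, 16-bit result in AX. */
static uint16_t mul8(uint8_t a, uint8_t b)
{
    return (uint16_t)((unsigned)a * (unsigned)b);
}

/* Model of one-operand 8-bit imul: signed * signed, 16-bit result in AX.
   Assumes (int8_t) conversion of 0x80..0xFF yields the two's-complement value. */
static uint16_t imul8(uint8_t a, uint8_t b)
{
    return (uint16_t)((int)(int8_t)a * (int)(int8_t)b);
}

/* Returns 1 if the low bytes (AL) of both products match for all 65536 inputs. */
static int low_halves_agree(void)
{
    for (unsigned a = 0; a < 256; a++)
        for (unsigned b = 0; b < 256; b++)
            if ((mul8((uint8_t)a, (uint8_t)b) & 0xFF) !=
                (imul8((uint8_t)a, (uint8_t)b) & 0xFF))
                return 0;
    return 1;
}
```

Only the upper byte (AH) distinguishes the two instructions. That is why the low half alone cannot tell you which one was used.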
If you do care about the upper half, neither `mul` nor `imul` works by itself, since they only multiply unsigned*unsigned and signed*signed, but you can fix that fairly easily.
Consider that a signed byte has the bit weights -128, 64, 32, 16, 8, 4, 2, 1, while an unsigned byte has the bit weights +128, 64, 32, 16, 8, 4, 2, 1. So you can represent the unsigned value of $x$ in signed format (I know this is confusing, but it's the best I can do) as $x + 256 x_7$, where $x_7$ is bit 7 of $x$. The easiest way to see this is probably to split it: $x + 2 \cdot 128 \cdot x_7$. What's happening here is compensation for the -128 weight: adding $128 x_7$ once cancels that weight, and adding it a second time brings bit 7 all the way up to the +128 weight. Of course, both steps can be done at once.
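A quick way to convince yourself of the bit-weight argument is to check it for all 256 byte values. This C sketch computes the signed value directly from the weights -128, 64, ..., 1. It then adds $256 x_7$ and compares the result with the plain unsigned value. The helper names are hypothetical.

```c
#include <stdint.h>

/* Signed value of a byte, computed from the bit weights -128, 64, ..., 1. */
static int signed_value(uint8_t x)
{
    int bit7 = (x >> 7) & 1;
    return -128 * bit7 + (x & 0x7F);
}

/* Unsigned value recovered as x + 256 * x_7, where x is read as signed. */
static int unsigned_from_signed(uint8_t x)
{
    int bit7 = (x >> 7) & 1;
    return signed_value(x) + 256 * bit7;
}

/* Returns 1 if x + 256 * x_7 equals the unsigned value for every byte. */
static int identity_holds(void)
{
    for (int v = 0; v < 256; v++)
        if (unsigned_from_signed((uint8_t)v) != v)
            return 0;
    return 1;
}
```

For example, 0xFB reads as -5 when signed, and -5 + 256 = 251 is its unsigned value.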
Anyway, multiplying that by some signed number $y$ and working it out gives $(x + 256 x_7)\,y = 256 x_7 y + xy$, where $xy$ is the (double-width) result of `imul` and $256 x_7 y$ means "add $y$ to the upper half if the sign bit of $x$ is set", so a possible implementation is (not tested):
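```nasm
; al has some unsigned value, [signedByte] holds the signed byte y
mov dl, al
sar dl, 7                ; dl = 0xFF if bit 7 of al is set, else 0
and dl, [signedByte]     ; dl = y if bit 7 of al is set, else 0
imul BYTE [signedByte]   ; ax = (al read as signed) * y
add ah, dl               ; add y to the upper half if bit 7 of al was set
```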
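Since the sequence above is marked as untested, here is a hypothetical C model that maps one statement to each instruction. It compares the model against a plain `int` multiplication for every combination of an unsigned byte and a signed byte. It models the arithmetic only, not the assembly itself. It also assumes the two's-complement conversions to `int8_t`/`int16_t` that mainstream compilers provide.

```c
#include <stdint.h>

/* Hypothetical model: u is the unsigned byte in al, y the signed byte at
   [signedByte]. Returns the 16-bit value the sequence leaves in ax. */
static uint16_t mul_u8_s8(uint8_t u, int8_t y)
{
    /* mov dl, al / sar dl, 7: 0xFF if bit 7 of al is set, else 0 */
    uint8_t mask = (u & 0x80) ? 0xFF : 0x00;
    /* and dl, [signedByte]: y or 0 */
    uint8_t dl = mask & (uint8_t)y;
    /* imul BYTE [signedByte]: (al read as signed) * y, full 16-bit product */
    uint16_t ax = (uint16_t)((int)(int8_t)u * (int)y);
    /* add ah, dl: correct only the upper byte, wrapping mod 256 like ah */
    uint8_t ah = (uint8_t)((ax >> 8) + dl);
    return (uint16_t)((ah << 8) | (ax & 0xFF));
}

/* Returns 1 if the model equals the exact product for all 65536 inputs. */
static int matches_reference(void)
{
    for (int u = 0; u < 256; u++)
        for (int y = -128; y < 128; y++) {
            int16_t got = (int16_t)mul_u8_s8((uint8_t)u, (int8_t)y);
            if (got != u * y)
                return 0;
        }
    return 1;
}
```

For example, with al = 251 and y = 2, `imul` leaves 0xFFF6 (-10). Adding 2 to ah turns that into 0x01F6, which is 502.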
Naturally, you could also sign-extend one operand, zero-extend the other, and use a 16-bit multiplication (any form works, since the upper half of that product is not relevant this way).
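The extension approach can be sketched the same way. This hypothetical C helper zero-extends the unsigned byte, sign-extends the signed one, and keeps only the low 16 bits of the product. The low 16 bits are the same for a signed and an unsigned 16-bit multiply, so the choice of instruction does not matter. Every product of an unsigned byte and a signed byte lies between -32640 and 32385, so those 16 bits hold the exact result. As before, the sketch assumes two's-complement conversion to `int16_t`.

```c
#include <stdint.h>

/* Hypothetical helper: extend both bytes to 16 bits, multiply, and keep only
   the low 16 bits of the product (the part both 16-bit mul and imul agree on). */
static int16_t mul_extend16(uint8_t u, int8_t y)
{
    uint16_t a = (uint16_t)u;            /* zero-extend, like movzx */
    uint16_t b = (uint16_t)(int16_t)y;   /* sign-extend, like movsx */
    return (int16_t)(uint16_t)(a * b);   /* truncate to the low 16 bits */
}
```

The extreme cases are worth checking. `mul_extend16(255, -128)` yields -32640 and `mul_extend16(255, 127)` yields 32385, so neither end overflows 16 bits.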
`mul` should produce `502` as the result in `ax`. – Servitor
`ax` is `0x01F6` (`502`) when using `mul`, and `0xFFF6` (`-10`) when using `imul`. – Everard

Use `imul reg, r/m32` or `imul reg, r/m32, imm` if you don't need the high-half result; it's more efficient on modern CPUs (1 uop) because it doesn't have to write the high half anywhere. agner.org/optimize – Beet