To multiply a number by any power of 2, I shift it left that many times.
Is there a similar technique to multiply a number by 10 in fewer cycles?
The 80286 did not have a barrel shifter; that was introduced with the 80386. According to the timing tables in the Microsoft Macro Assembler 5.0 documentation (1987), SHL reg, immed8 takes 5+n cycles, whereas SHL reg, 1 takes 2 cycles. ADD reg, reg takes 2 cycles, as does MOV reg, reg. IMUL reg16, immed takes 21 cycles. Therefore, the fastest way to multiply by ten would appear to be:
; // cycles
shl ax, 1 ; *2 // 2
mov bx, ax ; *2 // 4
shl ax, 1 ; *4 // 6
shl ax, 1 ; *8 // 8
add ax, bx ; *10 // 10
or, alternatively:
; // cycles
mov bx, ax ; *1 // 2
shl ax, 1 ; *2 // 4
shl ax, 1 ; *4 // 6
add ax, bx ; *5 // 8
shl ax, 1 ; *10 // 10
Ten cycles either way.
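As a sanity check on the arithmetic (not the timing), both sequences can be modelled in Python with 16-bit wraparound. This is only a sketch: the names mul10_a and mul10_b are made up for illustration, and masking with 0xFFFF is an assumption standing in for the width of AX and BX.

```python
MASK16 = 0xFFFF  # assumption: AX and BX are modelled as 16-bit values

def mul10_a(x):
    """First sequence: save x*2 in BX, then compute x*8 + x*2."""
    ax = x & MASK16
    ax = (ax << 1) & MASK16   # shl ax, 1   ; *2
    bx = ax                   # mov bx, ax  ; BX = x*2
    ax = (ax << 1) & MASK16   # shl ax, 1   ; *4
    ax = (ax << 1) & MASK16   # shl ax, 1   ; *8
    ax = (ax + bx) & MASK16   # add ax, bx  ; *10
    return ax

def mul10_b(x):
    """Second sequence: save x in BX, then compute (x*4 + x) * 2."""
    ax = x & MASK16
    bx = ax                   # mov bx, ax  ; BX = x
    ax = (ax << 1) & MASK16   # shl ax, 1   ; *2
    ax = (ax << 1) & MASK16   # shl ax, 1   ; *4
    ax = (ax + bx) & MASK16   # add ax, bx  ; *5
    ax = (ax << 1) & MASK16   # shl ax, 1   ; *10
    return ax

# Both sequences agree with x*10 modulo 2**16 for every 16-bit input.
assert all(
    mul10_a(x) == mul10_b(x) == (x * 10) & MASK16
    for x in range(0x10000)
)
```

Note that the result wraps for inputs above 6553, just as it would in AX; the model says nothing about the flags.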
... imul reg,reg,10 is slow, and 32-bit addressing modes like lea ax, [eax + eax*4] aren't available for cheap x * 5? Do you care about performance of the code on any later or earlier CPUs, in case something that's optimal for 286 isn't optimal elsewhere? Do you have a link for 80286 instruction timings? – Alby
10*x = (4*x + x) * 2 = ((x << 2) + x) << 1. This is the same way you do "long multiplication" by hand. – Burbot
... mov bx, ax ; shl ax, 2 ; add ax, bx ; shl ax, 1. – Burbot
... add same,same faster or slower than shl reg,1 on 286 for that last step? It probably doesn't matter what order you do anything in; 286 can't exploit the ILP in x*2 + x*8, and I think we need 1 mov. Unless you happened to already have the value in SI|DI and BX|BP, then you could lea ax, [bx + si] or something to start with x*2. – Alby
... mul by a constant is at least a few set bits even on P5 Pentium; 10 only has 2 set bits. On modern Nehalem or later, yes better than 1-operand mul, but not better than imul ax, bx, 10. (3 cycle latency, 1/clock throughput, 1 uop) – Alby
... mov bx, ax ; add ax, ax ; add ax, ax ; add ax, bx ; add ax, ax. – Burbot
... gcc -O3 -march=pentium. Or even -march=i386. godbolt.org/z/qjD-a3. Oh, you could compile for MIPS to limit GCC to just using shifts and add/sub, not x86 LEA. Or maybe MSP430 as a 2-operand machine. – Alby
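To compare the variants suggested in the comments, the cycle counts quoted in the answer can simply be added up. The sketch below assumes those quoted figures are the whole story (2 cycles each for mov reg,reg, add reg,reg and shl reg,1; 5+n for shl reg,immed8; 21 for imul reg16,immed) and ignores prefetch and memory effects. The dictionary keys are labels invented here for illustration.

```python
# Cycle counts as quoted in the answer (MASM 5.0 timing tables for the 80286).
MOV = ADD = SHL1 = 2      # mov reg,reg / add reg,reg / shl reg,1
IMUL_IMM = 21             # imul reg16, immed

def shl_imm(n):
    """shl reg, immed8 is quoted as 5+n cycles."""
    return 5 + n

# Hypothetical labels for the sequences discussed above.
variants = {
    "shl_by_1": MOV + SHL1 + SHL1 + ADD + SHL1,   # sequences from the answer
    "shl_imm8": MOV + shl_imm(2) + ADD + SHL1,    # mov; shl ax,2; add; shl ax,1
    "add_only": MOV + 4 * ADD,                    # mov; add; add; add; add
    "imul":     IMUL_IMM,                         # single imul by 10
}

for name, cycles in variants.items():
    print(f"{name:10} {cycles:2d} cycles")
```

Under these assumptions the add-only sequence ties with the answer's shift-by-one sequences, while the shl ax, 2 form pays for the immediate-count shift.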