Why does int addition through pointers take one fewer x86 instruction than int multiplication through pointers?

I have the following C/C++ code (compiler explorer link):

void update_mul(int *x, int *amount) { 
    *x *= *amount; 
}

void update_add(int *x, int *amount) { 
    *x += *amount; 
}

Under both clang and gcc, compiling as C or as C++ with at least -O1 enabled, the above translates to this assembly:

update_mul:                             # @update_mul
        mov     eax, dword ptr [rdi]
        imul    eax, dword ptr [rsi]
        mov     dword ptr [rdi], eax
        ret
update_add:                             # @update_add
        mov     eax, dword ptr [rsi]
        add     dword ptr [rdi], eax
        ret

It seems like for add it's doing something like:

register = *amount;
*x += register;

But for multiply it's doing:

register = *x;
register *= *amount;
*x = register;

Why does the multiplication need one more instruction than the addition? Is the extra instruction actually required, or is it not required but just faster?

Subak answered 11/8, 2021 at 15:49 Comment(2)
fwiw, you don't need pointers to see the extra mov: godbolt.org/z/YTfTKe75o - Andry
Note also that since instructions can be executed in parallel, counting instructions (or cycles per instruction) is not a good metric of performance. So it is possible that the speed of both functions is indistinguishable. In this simple case it should be fine. - Luehrmann

The IA-32 architecture specification (alternative single-page link) shows that there is simply no encoding for IMUL where the destination (the first operand) is a memory operand:

Encoding               | Meaning
IMUL r/m8*             | AX ← AL ∗ r/m byte.
IMUL r/m16             | DX:AX ← AX ∗ r/m word.
IMUL r/m32             | EDX:EAX ← EAX ∗ r/m32.
IMUL r/m64             | RDX:RAX ← RAX ∗ r/m64.
IMUL r16, r/m16        | word register ← word register ∗ r/m16.
IMUL r32, r/m32        | doubleword register ← doubleword register ∗ r/m32.
IMUL r64, r/m64        | Quadword register ← Quadword register ∗ r/m64.
IMUL r16, r/m16, imm8  | word register ← r/m16 ∗ sign-extended immediate byte.
IMUL r32, r/m32, imm8  | doubleword register ← r/m32 ∗ sign-extended immediate byte.
IMUL r64, r/m64, imm8  | Quadword register ← r/m64 ∗ sign-extended immediate byte.
IMUL r16, r/m16, imm16 | word register ← r/m16 ∗ immediate word.
IMUL r32, r/m32, imm32 | doubleword register ← r/m32 ∗ immediate doubleword.
IMUL r64, r/m64, imm32 | Quadword register ← r/m64 ∗ immediate doubleword.
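
Read against the table, the difference comes down to which operand may be memory. IMUL r32, r/m32 can read its second operand from memory but must write its result to a register, whereas ADD also has an r/m32, r32 form that writes straight back to memory. As a rough C-level sketch of the two sequences from the question (the `_steps` names and `check_steps` are made up for illustration, and the instruction comments simply mirror the assembly quoted in the question):

```c
#include <assert.h>

/* Multiply: the product must land in a register, so the load of *x,
 * the multiply, and the store back to *x are three separate steps. */
void update_mul_steps(int *x, int *amount) {
    int reg = *x;      /* mov  eax, dword ptr [rdi] */
    reg *= *amount;    /* imul eax, dword ptr [rsi] */
    *x = reg;          /* mov  dword ptr [rdi], eax */
}

/* Add: the destination may be memory, so only *amount needs an
 * explicit load; the add itself reads and writes *x. */
void update_add_steps(int *x, int *amount) {
    int reg = *amount; /* mov  eax, dword ptr [rsi] */
    *x += reg;         /* add  dword ptr [rdi], eax */
}

/* Quick self-check with small values (no overflow involved). */
void check_steps(void) {
    int a = 6, b = 6, amount = 7;
    update_mul_steps(&a, &amount);
    update_add_steps(&b, &amount);
    assert(a == 42);
    assert(b == 13);
}
```

This only restates the quoted compiler output in C; a compiler is free to emit a different sequence for these functions.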

Indocile answered 11/8, 2021 at 15:56 Comment(1)
Historical reason: the multi-operand forms of imul were new with the 186 (immediate) and 386 (r, r/m). This is unlike add, which is one of the ALU instructions from the original 8086 and thus has opcodes for both the r, r/m and r/m, r forms. Unlike some other limitations / design choices, not having a memory-destination multiply is not a noticeable problem for x86. In real life you'd always want to inline tiny functions like these anyway, and often at least one operand will already be in a register. - Roderich
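
To illustrate the inlining point from the comment above, here is a minimal sketch, assuming a hypothetical caller `scale_all` that is not part of the original question. Once `update_mul` is inlined into the loop, an optimizing compiler can typically keep `factor` in a register for the whole loop, so the extra load of `*amount` per call need not appear. The exact output depends on the compiler and flags, and the loop may well be vectorized instead.

```c
#include <assert.h>
#include <stddef.h>

/* Same body as in the question, marked static inline so the compiler
 * can fold it into its callers. */
static inline void update_mul(int *x, int *amount) {
    *x *= *amount;
}

/* Hypothetical caller: multiply every element by the same factor. */
void scale_all(int *values, size_t count, int factor) {
    assert(values != NULL || count == 0);
    for (size_t i = 0; i < count; ++i) {
        update_mul(&values[i], &factor);
    }
}
```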
