You don't need to use legacy x87 stack registers in x86-64 code, because SSE2 is baseline, a required part of the x86-64 ISA. You can and should do your scalar FP math using addsd
, mulsd
, sqrtsd
and so on, on XMM registers. (Or addss
for float)
The Windows x64 calling convention passes float/double FP args in XMM0..3, if they're one of the first four args to the function. (i.e. the 3rd total arg goes in xmm2 if it's FP, rather than the 3rd FP arg going in xmm2.) It returns FP values in XMM0.
Only use x87 if you actually need 80-bit precision inside your function. (Instructions like fsin
and fyl2x
are not fast, and can usually be done just as well by normal math libraries using SSE/SSE2 instructions.
function times2(X:Double):Double;
asm
addsd xmm0, xmm0 // upper 8 bytes of XMM0 are ignored
ret
end
Storing to memory and reloading into an x87 register costs you about 10 cycles of latency for no benefit. SSE/SSE2 scalar instructions are just as fast, or faster, than their x87 equivalents, and easier to program for and optimize because you never need fxch
; it's a flat register design instead of stack-based. (https://agner.org/optimize/). Also, you have 15 XMM registers.
Of course, you usually don't need inline asm at all. It could be useful for manually-vectorizing if the compiler doesn't do that for you.
Extended
type in Win64. And that shows that Delphi Win64 does not use FPU (x86). It uses SSE instead. Thus using FPU instructions is problematic. Also be careful when using BAsm x64 - there are bugs that destroy data or even inverse program control flow. – Rockribbed