Best way to load a 64-bit integer to a double precision SSE2 register?

What is the best/fastest way to load a 64-bit integer value in an xmm SSE2 register in 32-bit mode?

In 64-bit mode, cvtsi2sd can be used, but in 32-bit mode, it supports only 32-bit integers.

So far I haven't found much beyond:

  • use fild, fstp to stack then movsd to xmm register
  • load the high 32 bits, multiply by 2^32, then add the low 32 bits

The first solution is slow; the second might introduce precision loss (edit: and it is slow anyway, since the low 32 bits have to be converted as unsigned...)

Any better approach?

Stacy answered 22/3, 2013 at 11:16 Comment(9)
Multiplying the top 32 bits by 2**32 in floating-point isn't going to truncate/round them. It's only when you add the low 32 bits to them that the sum gets rounded/truncated, and that's what you'll get with the first method anyway. Unless I'm missing something, these two methods are equivalent (except for performance). - Erek
FWIW gcc seems to use the first approach (fild, fst, movsd). - Compliance
The 2nd option is actually slow. I mistakenly used cvtsi2sd for the low 32 bits, but that was incorrect: they need to be converted as unsigned, for which no CPU instruction exists, so it is slow... - Stacy
There is a trick with the internal representation of IEEE doubles and magic constants, for example: software.intel.com/en-us/forums/topic/301988, but I don't know about speed. - Asbestosis
And a better explanation (for unsigned) here: #13734691 - Asbestosis
There is indeed a very fast method that I use. It's related to MBo's suggestion. But it's very hacky and only works for a range of numbers. - Martainn
@Martainn Yes, I know them, but in case of overflow, I want to preserve the high order bits (as an fild does), and not the first 52 low order bits. - Stacy
Ah ic. Then you're probably out of luck. At least I'm not aware of anything else. - Martainn
Yet another reason why 32-bit is obsolete. BTW, for vector integer<->double, AVX512 will finally introduce packed 64-bit integer <-> double conversions. Until then, even in 64-bit mode, there's just been CVTDQ2PD xmm1, xmm2/m64 which converts a pair of 32-bit integers. - Spotlight

Your second option can be made to work, though it's a little unwieldy. I'll assume that your 64-bit number is initially in edx:eax.

cvtsi2sd xmm0, edx              // (double)(signed high word) = high part * 2**-32
mulsd    xmm0, [2**32 from mem] // high part (exact)
movsd    xmm2, [2**52 from mem]
movd     xmm1, eax              // low word in the low 32 bits, upper bits zeroed
orpd     xmm1, xmm2             // (double)(2**52 + low part as unsigned)
subsd    xmm1, xmm2             // (double)(low part as unsigned)
addsd    xmm0, xmm1             // (double)(high part + low part as unsigned)

All of the operations except for possibly the final one are exact, so this is correctly rounded. It should be noted that this conversion produces -0.0 when the input is 0 and the mxcsr is set to round-to-minus-infinity. This would need to be addressed if it were being used in a runtime library for a compiler aiming to provide IEEE-754 conformance, but is not an issue for most usage.
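
For checking the sequence against a reference, here is a minimal portable C sketch of the same steps. It is a restatement for testing, not the assembly itself: the function name int64_to_double_split is hypothetical, and it assumes IEEE-754 binary64 doubles and the default round-to-nearest mode.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: mirrors the assembly above step by step.
   Assumes IEEE-754 binary64 doubles and round-to-nearest. */
static double int64_to_double_split(int64_t v)
{
    int32_t  hi = (int32_t)(uint32_t)((uint64_t)v >> 32); /* edx: signed high word */
    uint32_t lo = (uint32_t)v;                            /* eax: unsigned low word */

    /* cvtsi2sd + mulsd: exact, hi has at most 32 significant bits. */
    double high = (double)hi * 4294967296.0;              /* hi * 2**32 */

    /* movd + orpd: drop lo into the mantissa of 2**52
       (bit pattern 0x4330000000000000), giving exactly 2**52 + lo. */
    uint64_t bits = UINT64_C(0x4330000000000000) | lo;
    double low;
    memcpy(&low, &bits, sizeof low);

    /* subsd: exact, leaves (double)lo. */
    low -= 4503599627370496.0;                            /* 2**52 */

    /* addsd: the only step that can round. */
    return high + low;
}
```

On a compiler with a native 64-bit conversion, comparing the result with (double)v over a range of inputs is a simple way to confirm that only the final addition rounds.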

Recriminate answered 24/3, 2013 at 13:8 Comment(0)
