Best way to load a 64-bit integer to a double precision SSE2 register?


Asked 22/3, 2013 at 11:16 Answered 24/3, 2013 at 13:8

Solved assembly double sse sse2 int64

What is the best/fastest way to load a 64-bit integer value into an xmm register as a double in 32-bit mode, using SSE2?

In 64-bit mode, cvtsi2sd can be used, but in 32-bit mode, it supports only 32-bit integers.

So far I haven't found much beyond:

1. Use fild, then fstp to the stack, then movsd to the xmm register.
2. Load the high 32-bit portion, multiply by 2^32, then add the low 32 bits.

The first solution is slow; the second might introduce precision loss (edit: and it is slow anyway, since the low 32 bits have to be converted as unsigned...)

Any better approach?

Stacy answered 22/3, 2013 at 11:16 Comment(9)

Multiplying the top 32 bits by 2**32 in floating-point isn't going to truncate/round them. It's only when you add the low 32 bits to them the sum gets rounded/truncated and that's what you'll get with the first method anyway. Unless I'm missing something, these two methods are equivalent (except for performance). – Erek 22/3, 2013 at 12:16

FWIW gcc seems to use the first approach (fild, fst, movsd). – Compliance 22/3, 2013 at 12:31

The 2nd option is actually slow. I mistakenly used cvtsi2sd for the low 32 bits, but that was incorrect: they need to be converted as unsigned, for which no CPU instruction exists, so it is slow... – Stacy 22/3, 2013 at 12:59

There is a trick with internal representation of IEEE doubles and magic constants, for example: software.intel.com/en-us/forums/topic/301988, but don't know about speed – Asbestosis 22/3, 2013 at 13:36

And better explanation (for unsigned) here: #13734691 – Asbestosis 22/3, 2013 at 13:54

There is indeed a very fast method that I use. It's related to MBo's suggestion. But it's very hacky and only works for a range of numbers. – Martainn 22/3, 2013 at 17:19

@Martainn Yes, I know them, but in case of overflow, I want to preserve the high order bits (as an fild does), and not the first 52 low order bits. – Stacy 22/3, 2013 at 20:24

Ah, I see. Then you're probably out of luck. At least I'm not aware of anything else. – Martainn 22/3, 2013 at 20:37

Yet another reason why 32-bit is obsolete. BTW, for vector integer<->double, AVX512 will finally introduce packed 64-bit integer <-> double conversions. Until then, even in 64-bit mode, there's just been CVTDQ2PD xmm1, xmm2/m64 which converts a pair of 32-bit integers. – Spotlight 18/9, 2016 at 6:37

Your second option can be made to work, though it's a little unwieldy. I'll assume that your 64-bit number is initially in edx:eax.

cvtsi2sd xmm0, edx              // high dword as a signed int, i.e. high part * 2**-32
mulsd    xmm0, [2**32 from mem] // high part (exact: only the exponent changes)
movsd    xmm2, [2**52 from mem]
movd     xmm1, eax
orpd     xmm1, xmm2             // (double)(2**52 + low part as unsigned)
subsd    xmm1, xmm2             // (double)(low part as unsigned)
addsd    xmm0, xmm1             // (double)(high part + low part as unsigned)

All of the operations except for possibly the final one are exact, so this is correctly rounded. It should be noted that this conversion produces -0.0 when the input is 0 and the mxcsr is set to round-to-minus-infinity. This would need to be addressed if it were being used in a runtime library for a compiler aiming to provide IEEE-754 conformance, but is not an issue for most usage.
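
For readers who want to check the arithmetic without writing assembly, here is a minimal scalar C sketch that mirrors the sequence above step by step. It is an illustration, not the answer's code: the function name int64_to_double_split is hypothetical, it assumes IEEE-754 binary64 doubles and the default round-to-nearest mode, and it relies on the usual two's-complement behaviour when narrowing to int32_t.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: scalar mirror of the cvtsi2sd/mulsd/orpd/subsd/addsd
 * sequence. Assumes IEEE-754 binary64 doubles. */
double int64_to_double_split(int64_t value)
{
    assert(sizeof(double) == sizeof(uint64_t));

    uint64_t bits = (uint64_t)value;
    /* edx: high half, interpreted as signed (two's-complement narrowing
     * is assumed here, as on all common compilers). */
    int32_t high = (int32_t)(uint32_t)(bits >> 32);
    /* eax: low half, interpreted as unsigned. */
    uint32_t low = (uint32_t)bits;

    /* cvtsi2sd + mulsd: exact, scaling by 2**32 only changes the exponent. */
    double high_part = (double)high * 4294967296.0;

    /* movsd + movd + orpd: put the low half into the mantissa of 2**52.
     * 0x4330000000000000 is the bit pattern of 2**52 as a double. */
    uint64_t low_bits = UINT64_C(0x4330000000000000) | (uint64_t)low;
    double low_part;
    memcpy(&low_part, &low_bits, sizeof low_part);

    /* subsd: (2**52 + low) - 2**52 is exact and leaves low as a double. */
    low_part -= 4503599627370496.0;

    /* addsd: the only step that can round. */
    return high_part + low_part;
}
```

Because only the last addition can round, the result should match a plain (double) cast of the same int64_t under round-to-nearest. The -0.0 case mentioned above only shows up when the rounding mode is changed, which this sketch does not do.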

Recriminate answered 24/3, 2013 at 13:8 Comment(0)
