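If you got a linker error, you're probably ignoring a compiler warning about an implicitly-declared intrinsic function: the compiler treats the unknown name as an ordinary external function, which the linker then can't find.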
Your current code has a high risk of compiling to terrible asm. If it compiled to a vector shift and an OR, it would already be sub-optimal. (Update: that's not what it compiles to; IDK where you got that idea.)
Use two _mm_cvtpd_epi32 to get two __m128i vectors, each holding the ints you want in its low 2 elements. Then use _mm_unpacklo_epi64 to combine those two low halves into one vector with all 4 elements you want.
Compiler output from clang 3.8.1 on the Godbolt compiler explorer (Xcode uses clang by default, I think):
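#include <immintrin.h>
// the good version: pure SSE2
__m128i pack_double_to_int(__m128d a, __m128d b) {
    // each _mm_cvtpd_epi32 leaves 2 int32 results in the low 64 bits and zeros the high 64
    // _mm_unpacklo_epi64 then concatenates the two low halves: { a0, a1, b0, b1 }
    return _mm_unpacklo_epi64(_mm_cvtpd_epi32(a), _mm_cvtpd_epi32(b));
}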
cvtpd2dq xmm0, xmm0
cvtpd2dq xmm1, xmm1
punpcklqdq xmm0, xmm1 # xmm0 = xmm0[0],xmm1[0]
ret
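To check the lane order of the good version, here's a small harness, a sketch assuming an x86 target with SSE2 (the baseline on x86-64). `store_epi32` is a hypothetical helper name of mine, not an intrinsic.

```c
#include <immintrin.h>

// the good version from above: pure SSE2
__m128i pack_double_to_int(__m128d a, __m128d b) {
    return _mm_unpacklo_epi64(_mm_cvtpd_epi32(a), _mm_cvtpd_epi32(b));
}

// hypothetical helper (not an intrinsic): spill the 4 int32 lanes to an array,
// element 0 first, so they can be printed or checked
void store_epi32(__m128i v, int out[4]) {
    _mm_storeu_si128((__m128i *)out, v);  // unaligned store, so any int array works
}
```

For example, with a = _mm_set_pd(2.5, 1.0) and b = _mm_set_pd(-4.0, 3.7) (note that _mm_set_pd takes the high element first), store_epi32 writes {1, 2, 4, -4}: a's two results land in the low half, and 2.5 becomes 2 because cvtpd2dq uses the current MXCSR rounding mode, which is round-to-nearest-even by default. Use _mm_cvttpd_epi32 instead if you want C-style truncation.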
// the original, using MMX (__m64) intermediates
__m128i pack_double_to_int_badMMX(__m128d a, __m128d b) {
return _mm_set_epi64(_mm_cvtpd_pi32(b), _mm_cvtpd_pi32(a));
}
cvtpd2pi mm0, xmm1
cvtpd2pi mm1, xmm0
movq2dq xmm1, mm0
movq2dq xmm0, mm1
punpcklqdq xmm0, xmm1 # xmm0 = xmm0[0],xmm1[0]
# note the lack of EMMS, since the source never calls _mm_empty() (the intrinsic for it)
ret
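For completeness, here's roughly what the MMX version would need to be safe: an _mm_empty() (EMMS) after the last MMX instruction, because the MMX registers alias the x87 register stack and compilers don't emit EMMS for you. This is a sketch assuming gcc or clang on x86; the function name is my own. It's still worse than the SSE2 version, so treat it as an illustration, not a recommendation.

```c
#include <immintrin.h>

// MMX variant of the original, with the missing EMMS added (name is hypothetical).
// Still sub-optimal: the results round-trip through MMX registers.
__m128i pack_double_to_int_mmx_emms(__m128d a, __m128d b) {
    __m64 lo = _mm_cvtpd_pi32(a);            // cvtpd2pi: a's 2 doubles -> 2 int32
    __m64 hi = _mm_cvtpd_pi32(b);            // same for b
    __m128i result = _mm_set_epi64(hi, lo);  // lo goes in the low 64 bits
    _mm_empty();                             // EMMS: mark the x87 register stack empty again
    return result;
}
```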
MMX is almost totally useless when SSE2 or later is available; just avoid it. See the sse tag wiki for some guides.