sse2 Questions

3

I want to clamp 32-bit unsigned ints to a fixed value (0x10000) using only SSE2 instructions. Basically, this C code: if (c>0x10000) c=0x10000; The code below works, but I'm wondering if it can b...
Franzen asked 2/2, 2024 at 17:46
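SSE2 has no unsigned 32-bit minimum (_mm_min_epu32 only arrived with SSE4.1), so one common workaround is to flip the sign bit of both operands and use the signed compare. The following is a minimal sketch of that idea, not the asker's code; the names clamp_epu32_0x10000 and clamp4_u32 are hypothetical helpers introduced here for illustration.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Clamp four unsigned 32-bit lanes to at most 0x10000 using SSE2 only.
   (Hypothetical helper name, chosen for this sketch.) */
static inline __m128i clamp_epu32_0x10000(__m128i v)
{
    const __m128i sign  = _mm_set1_epi32(INT32_MIN);  /* 0x80000000 in each lane */
    const __m128i limit = _mm_set1_epi32(0x10000);

    /* Flipping the sign bit makes the signed compare order lanes as unsigned. */
    __m128i gt = _mm_cmpgt_epi32(_mm_xor_si128(v, sign),
                                 _mm_xor_si128(limit, sign));

    /* Take limit where v > limit, keep v everywhere else. */
    return _mm_or_si128(_mm_and_si128(gt, limit),
                        _mm_andnot_si128(gt, v));
}

/* Convenience wrapper over plain arrays (also hypothetical). */
static inline void clamp4_u32(const uint32_t in[4], uint32_t out[4])
{
    __m128i v = _mm_loadu_si128((const __m128i *)in);
    _mm_storeu_si128((__m128i *)out, clamp_epu32_0x10000(v));
}
```

Because the limit is a constant, the XOR on the limit side folds at compile time, leaving one XOR, one compare and the three-instruction blend per vector.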

1

Solved

I'm looking for the fastest way to divide an __m256i of packed 32-bit integers by two (aka shift right by one) using AVX. I don't have access to AVX2. As far as I know, my options are: Drop down t...
Aforethought asked 30/4, 2022 at 22:46

1

Solved

The Intel intrinsic functions have the subtype of the vector built into their names. For example, _mm_set1_ps is a ps, which is packed single-precision, a.k.a. float. Although the meaning of most ...
Infix asked 30/1, 2022 at 4:35

1

Solved

I am starting to use functions like _mm_clflush, _mm_clflushopt, and _mm_clwb. Say I have defined a struct named mystruct whose size is 256 bytes. My cache line size is 64 bytes. Now I want ...
Mental asked 26/2, 2021 at 8:36

3

Solved

PCMPGTQ was introduced in SSE4.2, and it provides a signed greater-than comparison for 64-bit numbers that yields a mask. How does one support this functionality on instruction sets predating SSE4...
Smalltime asked 6/12, 2020 at 8:36
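One widely used SSE2-only emulation compares the high 32-bit halves with a signed compare and, when the high halves are equal, uses the borrow from the 64-bit subtraction b - a to decide on the low halves. This is a sketch under that approach; the function names cmpgt_epi64_sse2 and cmpgt2_i64 are hypothetical and not taken from the question.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Signed 64-bit a > b per lane, yielding an all-ones or all-zero mask.
   (Hypothetical name; SSE2-only stand-in for PCMPGTQ.) */
static inline __m128i cmpgt_epi64_sse2(__m128i a, __m128i b)
{
    /* Where the high dwords are equal, the high dword of (b - a) is
       all ones exactly when the low dword of a is (unsigned) greater. */
    __m128i r = _mm_and_si128(_mm_cmpeq_epi32(a, b), _mm_sub_epi64(b, a));

    /* Where the high dwords differ, the signed compare of them decides. */
    r = _mm_or_si128(r, _mm_cmpgt_epi32(a, b));

    /* Broadcast each lane's high dword over the whole 64-bit lane. */
    return _mm_shuffle_epi32(r, _MM_SHUFFLE(3, 3, 1, 1));
}

/* Convenience wrapper over plain arrays (also hypothetical). */
static inline void cmpgt2_i64(const int64_t a[2], const int64_t b[2],
                              int64_t out[2])
{
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    _mm_storeu_si128((__m128i *)out, cmpgt_epi64_sse2(va, vb));
}
```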

1

Solved

The orpd instruction is a "bitwise logical OR of packed double precision floating point values". Doesn't this do exactly the same thing as por ("bitwise logical OR")? If so, what's the point of hav...
Weka asked 31/5, 2020 at 5:28

5

Solved

I do image processing. I need to divide a 16-bit integer SSE vector by 255. I can't use a shift operator like _mm_srli_epi16(), because 255 is not a power of 2. I know of course th...
Guanajuato asked 9/2, 2016 at 6:28
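For unsigned 16-bit lanes, division by 255 can be replaced by a multiply-high with the constant 0x8081 followed by a right shift by 7, which is the same reciprocal trick compilers emit for a scalar uint16_t division by 255. The sketch below assumes unsigned inputs; div255_epu16 and div255_u16x8 are hypothetical names.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Divide eight unsigned 16-bit lanes by 255: (x * 0x8081) >> 23.
   (Hypothetical helper name, chosen for this sketch.) */
static inline __m128i div255_epu16(__m128i x)
{
    const __m128i magic = _mm_set1_epi16((short)0x8081);
    /* mulhi keeps bits 16..31 of each product; shift the remaining 7. */
    return _mm_srli_epi16(_mm_mulhi_epu16(x, magic), 7);
}

/* Convenience wrapper over plain arrays (also hypothetical). */
static inline void div255_u16x8(const uint16_t in[8], uint16_t out[8])
{
    __m128i v = _mm_loadu_si128((const __m128i *)in);
    _mm_storeu_si128((__m128i *)out, div255_epu16(v));
}
```

This truncates toward zero like C integer division; a rounding variant would add 127 or 128 to the input first, provided that addition cannot overflow 16 bits.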

0

I have been using the following "trick" in C code with SSE2 for single precision floats for a while now: static inline __m128 SSEI_m128shift(__m128 data) { return (__m128)_mm_srli_si128(_mm_castp...
Unclassical asked 23/5, 2020 at 11:52

3

I am trying to find the sum reduction of 32 elements (each 1 byte data) on an Intel i3 processor. I did this: s=0; for (i=0; i<32; i++) { s = s + a[i]; } However, it's taking more time, since m...
Ericaericaceous asked 7/6, 2012 at 13:13

4

Solved

How can I check in code whether SSE/SSE2 is enabled or not by the Visual Studio compiler? I have tried #ifdef __SSE__ but it didn't work.
Reedreedbird asked 1/9, 2013 at 23:38
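MSVC does not define __SSE__ or __SSE2__. On 32-bit x86 it exposes the /arch setting through _M_IX86_FP (0, 1 for SSE, 2 for SSE2 or higher), and on x64 targets (_M_X64 / _M_AMD64) SSE2 is always part of the baseline. A minimal sketch of a portable compile-time check follows; the BUILD_HAS_SSE and BUILD_HAS_SSE2 macro names and the build_has_sse2 function are made up for this example.

```c
/* Compile-time detection of SSE/SSE2 code generation.
   BUILD_HAS_SSE / BUILD_HAS_SSE2 are hypothetical names for this sketch. */

#if defined(__SSE2__) || defined(_M_X64) || defined(_M_AMD64) || \
    (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#define BUILD_HAS_SSE2 1   /* GCC/Clang macro, x64 MSVC, or /arch:SSE2+ */
#else
#define BUILD_HAS_SSE2 0
#endif

#if defined(__SSE__) || defined(_M_X64) || defined(_M_AMD64) || \
    (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#define BUILD_HAS_SSE 1    /* GCC/Clang macro, x64 MSVC, or /arch:SSE+ */
#else
#define BUILD_HAS_SSE 0
#endif

/* Reports what the compiler was allowed to emit, not what the CPU
   running the program supports. */
static inline int build_has_sse2(void)
{
    return BUILD_HAS_SSE2;
}
```

This only answers the compile-time half of the question; whether the machine running the binary supports SSE2 still needs a CPUID-based check at run time.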

2

Solved

The following documentation is provided in the Intel Instruction Reference for the COMISD instruction: Compares the double-precision floating-point values in the low quadwords of operand 1 (fir...
Susquehanna asked 24/7, 2019 at 17:25

3

Solved

The code I want to optimize is basically a simple but large arithmetic formula, it should be fairly simple to analyze the code automatically to compute the independent multiplications/additions in ...
Acidify asked 19/9, 2012 at 13:13

1

Solved

In float, it seems pretty easy to floor() and then int(), such as: float z = floor(LOG2EF * x + 0.5f); const int32_t n = int32_t(z); become: __m128 z = _mm_add_ps(_mm_mul_ps(log2ef, x), half);...
Drucie asked 28/1, 2019 at 16:17

1

Solved

Here's the code I'm trying to convert: the double version of VDT's Pade Exp fast_ex() approx (here's the old repo resource): inline double fast_exp(double initial_x){ double x = initial_x; doubl...
Tricyclic asked 25/1, 2019 at 11:44

3

How to multiply two 64-bit integers by another 2 64-bit integers? I didn't find any instruction which can do it.
Langobardic asked 25/7, 2013 at 16:14
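SSE2 has no packed 64x64-bit multiply, but the low 64 bits of each product can be assembled from three 32x32->64 multiplies with _mm_mul_epu32. The sketch below assumes only the low 64 bits of each product are wanted (the question does not say whether a full 128-bit result is needed); mullo_epi64_sse2 and mul2_u64 are hypothetical names.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Low 64 bits of a*b in each of the two 64-bit lanes, SSE2 only.
   (Hypothetical helper name, chosen for this sketch.) */
static inline __m128i mullo_epi64_sse2(__m128i a, __m128i b)
{
    __m128i a_hi = _mm_srli_epi64(a, 32);          /* high dwords of a */
    __m128i b_hi = _mm_srli_epi64(b, 32);          /* high dwords of b */

    __m128i lo_lo = _mm_mul_epu32(a, b);           /* a_lo * b_lo */
    __m128i cross = _mm_add_epi64(_mm_mul_epu32(a, b_hi),   /* a_lo * b_hi */
                                  _mm_mul_epu32(a_hi, b));  /* a_hi * b_lo */

    /* a_hi * b_hi would only affect bits 64 and above, so it is dropped. */
    return _mm_add_epi64(lo_lo, _mm_slli_epi64(cross, 32));
}

/* Convenience wrapper over plain arrays (also hypothetical). */
static inline void mul2_u64(const uint64_t a[2], const uint64_t b[2],
                            uint64_t out[2])
{
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    _mm_storeu_si128((__m128i *)out, mullo_epi64_sse2(va, vb));
}
```

The result is the same for signed and unsigned inputs, since the low 64 bits of a product do not depend on signedness.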

3

Solved

I'm writing code that essentially takes advantage of SSE2 to optimize this code: double *pA = a; double *pB = b[voiceIndex]; double *pC = c[voiceIndex]; for (int sampleIndex = 0; sampleIndex &...
Lizzielizzy asked 20/12, 2018 at 16:10

1

Solved

I really can't get what "keyword" like __m128d is in C++. I'm using MSVC, and it says: The __m128d data type, for use with the Streaming SIMD Extensions 2 instructions intrinsics, is defined in &l...
Paddle asked 13/12, 2018 at 8:19

3

Solved

I've added an x64 configuration to my C++ project to compile a 64-bit version of my app. Everything looks fine, but the compiler gives the following warning: `cl : Command line warning D9002 : ignoring unk...
Macaulay asked 1/7, 2009 at 6:53

2

Solved

I'm working on a port of SSE2 to NEON. The port is at an early stage and it's producing incorrect results. Part of the reason for the incorrect results is _mm_shuffle_epi32 and the NEON instructions I se...
Pegg asked 7/5, 2016 at 4:4

1

I need to build a single-precision floating-point inner product routine for mixed single/double-precision floating-point vectors, exploiting the AVX instruction set for SIMD registers with 256 bits...
Vercelli asked 21/3, 2018 at 18:40

1

Solved

I have some data that isn't stored as structure of arrays. What is the best practice for loading the data in registers? __m128 _mm_set_ps (float e3, float e2, float e1, float e0) // or __m128 _mm...
Symphonize asked 13/3, 2018 at 20:50

4

Solved

I should count the number of set bits of a __m128i register. In particular, I should write two functions that are able to count the number of bits of the register, using the following ways. The t...
Illumination asked 27/6, 2013 at 23:37

1

I'm doing a project where I do RGB to luma conversions, and I have some rounding issues with the -mno-sse2 flag: Here's the test code: #include <stdio.h> #include <stdint.h> static d...
Argo asked 28/1, 2016 at 18:26

2

Solved

I have the following problem which I need to solve using anything other than AVX2. I have 3 values stored in a m128i variable (the 4th value is not needed ) and need to shift those values by 4,3,5...
Epistemology asked 28/10, 2017 at 20:7

5

Solved

Actually I have 2 questions: Is SSE2 Compatibility a CPU issue or Compiler issue? How to check if your CPU or Compiler support SSE2? I am using GCC Version: gcc (GCC) 4.5.1 When I tried to c...
Collincolline asked 17/11, 2010 at 9:54
