sse2 - McMap

3

Clamp unsigned int to 0x10000 using SSE2

I want to clamp 32-bit unsigned ints to a fixed value (0x10000) using only SSE2 instructions. Basically, this C code: if (c>0x10000) c=0x10000; This code below works, but I'm wondering if it can b...

assembly x86 simd sse2 clamp

Franzen asked 2/2, 2024 at 17:46
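SSE2 has no unsigned 32-bit compare or min, so a common workaround is to flip the sign bit of both operands, which maps unsigned order onto signed order for pcmpgtd, and then blend. This is a minimal sketch of that sign-bias technique, not the asker's code. The function name clamp_u32_to_0x10000 is hypothetical.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Clamp four unsigned 32-bit lanes to at most 0x10000 (hypothetical helper).
   XOR with 0x80000000 turns an unsigned comparison into a signed one,
   which _mm_cmpgt_epi32 (pcmpgtd) can perform. */
static __m128i clamp_u32_to_0x10000(__m128i c)
{
    const __m128i bias  = _mm_set1_epi32(INT32_MIN);   /* 0x80000000 */
    const __m128i limit = _mm_set1_epi32(0x10000);
    /* all-ones in lanes where c > 0x10000 as unsigned */
    __m128i gt = _mm_cmpgt_epi32(_mm_xor_si128(c, bias),
                                 _mm_xor_si128(limit, bias));
    /* select: (gt & limit) | (~gt & c) */
    return _mm_or_si128(_mm_and_si128(gt, limit), _mm_andnot_si128(gt, c));
}
```

Because the limit is a constant, the biased limit folds to a constant too, so the loop body costs one XOR, one compare and the three-instruction blend.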

1

Solved

AVX divide __m256i packed 32-bit integers by two (no AVX2)

I'm looking for the fastest way to divide an __m256i of packed 32-bit integers by two (aka shift right by one) using AVX. I don't have access to AVX2. As far as I know, my options are: Drop down t...

c++ simd sse avx sse2

Aforethought asked 30/4, 2022 at 22:46

1

Solved

What are the names and meanings of the intrinsic vector element types, like epi64x or pi32?

The Intel intrinsic functions have the subtype of the vector built into their names. For example, _mm_set1_ps is a ps, which is a packed single-precision a.k.a. a float. Although the meaning of most ...

intel sse intrinsics sse2 mmx

Infix asked 30/1, 2022 at 4:35
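A small illustration of the most common suffixes, as a sketch: ps is packed single (4 floats in __m128), pd is packed double (2 doubles in __m128d), epi32/epi64 are packed integers of that width in __m128i ("epi" is commonly read as "extended packed integer"), and pi32 refers to the older MMX __m64 type. The trailing x in epi64x marks the variant that takes a 64-bit scalar (long long) instead of an __m64. The function naming_demo is hypothetical.

```c
#include <emmintrin.h>  /* SSE2 */
#include <stdint.h>

/* Fill each output array from a broadcast of the matching vector type. */
static void naming_demo(float f[4], double d[2], int32_t i32[4], int64_t i64[2])
{
    _mm_storeu_ps(f, _mm_set1_ps(1.5f));                         /* _ps: 4 x float   */
    _mm_storeu_pd(d, _mm_set1_pd(2.5));                          /* _pd: 2 x double  */
    _mm_storeu_si128((__m128i *)i32, _mm_set1_epi32(-7));        /* _epi32: 4 x int32 */
    _mm_storeu_si128((__m128i *)i64,
                     _mm_set1_epi64x(INT64_C(1) << 40));         /* _epi64x: 2 x int64 from a long long */
}
```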

1

Solved

The right way to use function _mm_clflush to flush a large struct

I am starting to use functions like _mm_clflush, _mm_clflushopt, and _mm_clwb. Say I have defined a struct named mystruct whose size is 256 bytes. My cache line size is 64 bytes. Now I want ...

c x86 cpu-cache sse2 clflush

Mental asked 26/2, 2021 at 8:36
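One way to flush an object larger than a cache line is to issue one clflush per line that overlaps it, starting from the line-aligned address so that misaligned structs are fully covered. This sketch assumes 64-byte lines (as in the question) and uses a hypothetical 256-byte mystruct and helper flush_range; it uses only _mm_clflush and _mm_mfence from SSE2.

```c
#include <emmintrin.h>  /* _mm_clflush, _mm_mfence (SSE2) */
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE 64  /* assumption: 64-byte cache lines */

/* Hypothetical stand-in for the question's 256-byte struct. */
typedef struct { unsigned char bytes[256]; } mystruct;

/* Flush every cache line overlapping [p, p + size). */
static void flush_range(const void *p, size_t size)
{
    uintptr_t start = (uintptr_t)p & ~(uintptr_t)(CACHE_LINE - 1);  /* round down to a line */
    uintptr_t end   = (uintptr_t)p + size;
    for (uintptr_t line = start; line < end; line += CACHE_LINE)
        _mm_clflush((const void *)line);
    _mm_mfence();  /* order the flushes before later memory operations */
}
```

For _mm_clflushopt and _mm_clwb the same per-line loop applies; those instructions are more weakly ordered and are usually followed by a store fence (_mm_sfence) rather than relying on program order.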

3

Solved

How to simulate pcmpgtq on sse2?

PCMPGTQ was introduced in SSE4.2, and it provides a signed greater-than comparison for 64-bit numbers that yields a mask. How does one support this functionality on instruction sets predating sse4...

assembly sse simd sse2 sse4

Smalltime asked 6/12, 2020 at 8:36
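A signed 64-bit a > b holds when the high dwords compare greater (signed), or when they are equal and the low dwords compare greater (unsigned). The unsigned low-dword test can be read off the borrow of a 64-bit subtraction. This is a sketch of that decomposition; the name cmpgt_epi64_sse2 is hypothetical.

```c
#include <emmintrin.h>  /* SSE2 */

/* Per-lane signed 64-bit a > b, returning all-ones or zero, SSE2 only. */
static __m128i cmpgt_epi64_sse2(__m128i a, __m128i b)
{
    /* Where the high dwords are equal, the high dword of b - a is all-ones
       exactly when a.lo > b.lo (unsigned), because the low half borrows. */
    __m128i hi_eq_lo_gt = _mm_and_si128(_mm_cmpeq_epi32(a, b), _mm_sub_epi64(b, a));
    /* Signed compare of the high dwords; low dwords of r are don't-care. */
    __m128i r = _mm_or_si128(_mm_cmpgt_epi32(a, b), hi_eq_lo_gt);
    /* Broadcast each lane's high-dword result across the full 64-bit lane. */
    return _mm_shuffle_epi32(r, _MM_SHUFFLE(3, 3, 1, 1));
}
```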

1

Solved

What is the point of SSE2 instructions such as orpd?

The orpd instruction is a "bitwise logical OR of packed double precision floating point values". Doesn't this do exactly the same thing as por ("bitwise logical OR")? If so, what's the point of hav...

assembly x86 sse instruction-set sse2

Weka asked 31/5, 2020 at 5:28
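To make the comparison concrete: orpd and por produce identical bits; orpd simply operates on the __m128d view, so no cast is needed, and on some microarchitectures staying in the floating-point domain avoids a bypass delay. This sketch uses OR with the sign bit (x | -0.0, which yields -|x|) as an example; both helper names are hypothetical.

```c
#include <emmintrin.h>  /* SSE2 */

/* -|x| via orpd: set the sign bit of each double. */
static __m128d neg_abs_orpd(__m128d x)
{
    return _mm_or_pd(x, _mm_set1_pd(-0.0));
}

/* Same bits via por, which needs casts to and from the integer view. */
static __m128d neg_abs_por(__m128d x)
{
    __m128i bits = _mm_or_si128(_mm_castpd_si128(x),
                                _mm_castpd_si128(_mm_set1_pd(-0.0)));
    return _mm_castsi128_pd(bits);
}
```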

5

Solved

How to divide 16-bit integer by 255 using SSE?

I work on image processing. I need to divide a 16-bit integer SSE vector by 255. I can't use a shift operator like _mm_srli_epi16(), because 255 is not a power of 2. I know of course th...

c++ image-processing sse simd sse2

Guanajuato asked 9/2, 2016 at 6:28
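Division by a constant can be replaced by a multiply-high and a shift. For unsigned 16-bit x, x / 255 equals (x * 0x8081) >> 23 exactly: 255 * 0x8081 - 2^23 = 127, and 65535 * 127 < 2^23, so the rounding error never reaches the next integer. _mm_mulhi_epu16 provides the >> 16 and a shift by 7 the rest. This is one possible technique, sketched with a hypothetical function name, not necessarily what the answers propose.

```c
#include <emmintrin.h>  /* SSE2 */

/* Exact x / 255 for eight unsigned 16-bit lanes. */
static __m128i div255_epu16(__m128i x)
{
    const __m128i magic = _mm_set1_epi16((short)0x8081);
    /* (x * 0x8081) >> 16, then >> 7, gives (x * 0x8081) >> 23 */
    return _mm_srli_epi16(_mm_mulhi_epu16(x, magic), 7);
}
```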

0

Left-shift (of float32 array) with AVX2 and filling up with a zero

I have been using the following "trick" in C code with SSE2 for single precision floats for a while now: static inline __m128 SSEI_m128shift(__m128 data) { return (__m128)_mm_srli_si128(_mm_castp...

c gcc intrinsics avx2 sse2

Unclassical asked 23/5, 2020 at 11:52

3

Sum reduction of unsigned bytes without overflow, using SSE2 on Intel

I am trying to find sum reduction of 32 elements (each 1 byte data) on an Intel i3 processor. I did this: s=0; for (i=0; i<32; i++) { s = s + a[i]; } However, it's taking more time, since m...

x86 sse simd sse2 sse3

Ericaericaceous asked 7/6, 2012 at 13:13
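A standard SSE2 way to add bytes without 8-bit overflow is psadbw against zero: it sums each group of 8 unsigned bytes into a 16-bit result in each 64-bit lane. This is a sketch for exactly 32 bytes; the helper name sum32_u8 is hypothetical.

```c
#include <emmintrin.h>  /* SSE2 */
#include <stdint.h>

/* Sum 32 unsigned bytes using _mm_sad_epu8 (psadbw) against zero. */
static unsigned sum32_u8(const uint8_t a[32])
{
    const __m128i zero = _mm_setzero_si128();
    __m128i s0 = _mm_sad_epu8(_mm_loadu_si128((const __m128i *)a), zero);
    __m128i s1 = _mm_sad_epu8(_mm_loadu_si128((const __m128i *)(a + 16)), zero);
    __m128i s = _mm_add_epi64(s0, s1);               /* two partial sums, one per lane */
    s = _mm_add_epi64(s, _mm_unpackhi_epi64(s, s));  /* fold the high lane onto the low */
    return (unsigned)_mm_cvtsi128_si32(s);
}
```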

4

Solved

Detect the availability of SSE/SSE2 instruction set in Visual Studio

How can I check in code whether SSE/SSE2 is enabled or not by the Visual Studio compiler? I have tried #ifdef __SSE__ but it didn't work.

c++ visual-studio x86 sse sse2

Reedreedbird asked 1/9, 2013 at 23:38
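MSVC does not define __SSE__ or __SSE2__. It defines _M_IX86_FP (2 when /arch:SSE2 or higher is in effect on 32-bit x86), and SSE2 is always available when targeting x64 (_M_X64 / _M_AMD64). GCC and Clang define __SSE2__ instead. A sketch combining these checks; the macro HAVE_SSE2_COMPILE_TIME and the wrapper function are hypothetical names.

```c
/* Compile-time SSE2 detection across MSVC, GCC and Clang. */
#if defined(_M_X64) || defined(_M_AMD64) ||          \
    (defined(_M_IX86_FP) && _M_IX86_FP >= 2) ||      \
    defined(__SSE2__)
#define HAVE_SSE2_COMPILE_TIME 1
#else
#define HAVE_SSE2_COMPILE_TIME 0
#endif

/* Exposes the result at run time, e.g. for logging. */
static int sse2_enabled_at_compile_time(void)
{
    return HAVE_SSE2_COMPILE_TIME;
}
```

This only tells you what the compiler was allowed to emit; whether the running CPU supports SSE2 is a separate, run-time question.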

2

Solved

Why do x86 FP compares set CF like unsigned integers, instead of using signed conditions?

The following documentation is provided in the Intel Instruction Reference for the COMISD instruction: Compares the double-precision floating-point values in the low quadwords of operand 1 (fir...

assembly x86 sse sse2 x87

Susquehanna asked 24/7, 2019 at 17:25

3

Solved

How to store the contents of a __m128d simd vector as doubles without accessing it as a union?

The code I want to optimize is basically a simple but large arithmetic formula, it should be fairly simple to analyze the code automatically to compute the independent multiplications/additions in ...

c x86 simd intrinsics sse2

Acidify asked 19/9, 2012 at 13:13
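Two union-free ways to get the doubles out of a __m128d are an explicit store to a double array, or extracting the low lane with _mm_cvtsd_f64 after moving the high lane down with _mm_unpackhi_pd. A short sketch; the helper names are hypothetical.

```c
#include <emmintrin.h>  /* SSE2 */

/* Store both lanes; _mm_store_pd would require a 16-byte-aligned pointer. */
static void store_both(__m128d v, double out[2])
{
    _mm_storeu_pd(out, v);
}

/* Extract single lanes without going through memory explicitly. */
static double low_lane(__m128d v)  { return _mm_cvtsd_f64(v); }
static double high_lane(__m128d v) { return _mm_cvtsd_f64(_mm_unpackhi_pd(v, v)); }
```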

1

Solved

How to floor/int in double using only SSE2?

In float, it seems pretty easy to floor() and then int(), such as: float z = floor(LOG2EF * x + 0.5f); const int32_t n = int32_t(z); become: __m128 z = _mm_add_ps(_mm_mul_ps(log2ef, x), half);...

c++ simd truncate intrinsics sse2

Drucie asked 28/1, 2019 at 16:17
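Without SSE4.1's roundpd, a common SSE2 approach is to truncate through int32 and subtract 1 wherever the truncated value ended up above the input (negative non-integers). This sketch assumes |x| < 2^31, since the truncation goes through 32-bit integers; the function names are hypothetical.

```c
#include <emmintrin.h>  /* SSE2 */

/* floor() for two doubles, valid while |x| < 2^31. */
static __m128d floor_pd_sse2(__m128d x)
{
    __m128d t = _mm_cvtepi32_pd(_mm_cvttpd_epi32(x));               /* truncate toward zero */
    __m128d fix = _mm_and_pd(_mm_cmpgt_pd(t, x), _mm_set1_pd(1.0)); /* 1.0 where t > x */
    return _mm_sub_pd(t, fix);
}

/* Floored values are integral, so truncating them to int32 is exact.
   The two results occupy the low two 32-bit lanes. */
static __m128i floor_epi32_sse2(__m128d x)
{
    return _mm_cvttpd_epi32(floor_pd_sse2(x));
}
```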

1

Solved

How to convert scalar code of the double version of VDT's Pade Exp fast_ex() approx into SSE2?

Here's the code I'm trying to convert: the double version of VDT's Pade Exp fast_ex() approx (here's the old repo resource): inline double fast_exp(double initial_x){ double x = initial_x; doubl...

c++ sse intrinsics sse2 exp

Tricyclic asked 25/1, 2019 at 11:44

3

SSE multiplication of 2 64-bit integers

How can I multiply two 64-bit integers by another two 64-bit integers? I didn't find any instruction that can do it.

x86 sse simd multiplication sse2

Langobardic asked 25/7, 2013 at 16:14
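SSE2 only has a 32x32->64 unsigned multiply (_mm_mul_epu32, pmuludq), but the low 64 bits of a 64x64 product can be built from three of them: lo*lo + ((hi_a*lo_b + lo_a*hi_b) << 32). The result is the wrapping product, identical for signed and unsigned inputs. A sketch with a hypothetical function name:

```c
#include <emmintrin.h>  /* SSE2 */
#include <stdint.h>

/* Low 64 bits of a * b for each 64-bit lane. */
static __m128i mullo_epi64_sse2(__m128i a, __m128i b)
{
    __m128i lo_lo = _mm_mul_epu32(a, b);                   /* a.lo * b.lo */
    __m128i hi_lo = _mm_mul_epu32(_mm_srli_epi64(a, 32), b); /* a.hi * b.lo */
    __m128i lo_hi = _mm_mul_epu32(a, _mm_srli_epi64(b, 32)); /* a.lo * b.hi */
    /* Only the low 32 bits of the cross terms survive the shift. */
    __m128i cross = _mm_slli_epi64(_mm_add_epi64(hi_lo, lo_hi), 32);
    return _mm_add_epi64(lo_lo, cross);
}
```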

3

Solved

How do you process exp() with SSE2?

I'm writing code that essentially takes advantage of SSE2 to optimize this code: double *pA = a; double *pB = b[voiceIndex]; double *pC = c[voiceIndex]; for (int sampleIndex = 0; sampleIndex &...

c++ simd intrinsics sse2 exp

Lizzielizzy asked 20/12, 2018 at 16:10

1

Solved

What is __m128d?

I really can't understand what a "keyword" like __m128d is in C++. I'm using MSVC, and it says: The __m128d data type, for use with the Streaming SIMD Extensions 2 instructions intrinsics, is defined in &l...

c++ intel intrinsics sse2

Paddle asked 13/12, 2018 at 8:19
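__m128d is not a language keyword but a type provided by the compiler through <emmintrin.h>: a 16-byte, 16-byte-aligned vector holding two doubles. A tiny sketch that makes its size and alignment visible; the helper names are hypothetical, and _Alignof assumes a C11 compiler.

```c
#include <emmintrin.h>  /* declares __m128d */
#include <stddef.h>

static size_t m128d_size(void)      { return sizeof(__m128d); }   /* two doubles */
static size_t m128d_alignment(void) { return _Alignof(__m128d); } /* vector alignment */
```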

3

Solved

SSE2 option in Visual C++ (x64)

I've added an x64 configuration to my C++ project to compile a 64-bit version of my app. Everything looks fine, but the compiler gives the following warning: `cl : Command line warning D9002 : ignoring unk...

c++ visual-studio-2008 optimization 64-bit sse2

Macaulay asked 1/7, 2009 at 6:53

2

Solved

Convert _mm_shuffle_epi32 to C expression for the permutation?

I'm working on a port of SSE2 to NEON. The port is early stage and it's producing incorrect results. Part of the reason for the incorrect results is _mm_shuffle_epi32 and the NEON instructions I se...

x86 x86-64 sse shuffle sse2

Pegg asked 7/5, 2016 at 4:4
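When porting, a plain-C model of the shuffle is a useful reference to test a NEON version against: destination lane i takes source lane (imm >> (2*i)) & 3. This sketch shows only that reference model, not a NEON implementation; the name shuffle_epi32_ref is hypothetical, and dst must not alias src.

```c
#include <emmintrin.h>  /* _mm_shuffle_epi32, _MM_SHUFFLE */
#include <stdint.h>

/* Scalar model of _mm_shuffle_epi32(src, imm). */
static void shuffle_epi32_ref(const int32_t src[4], int imm, int32_t dst[4])
{
    for (int i = 0; i < 4; ++i)
        dst[i] = src[(imm >> (2 * i)) & 3];  /* 2-bit selector per lane */
}
```

For example, _MM_SHUFFLE(0, 1, 2, 3) reverses the four lanes.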

1

Fastest way to perform AVX inner product operations with mixed (float, double) input vectors

I need to build a single-precision floating-point inner product routine for mixed single/double-precision floating-point vectors, exploiting the AVX instruction set for SIMD registers with 256 bits...

c++ vectorization simd avx sse2

Vercelli asked 21/3, 2018 at 18:40

1

Solved

What is the difference between loadu_ps and set_ps when using unformatted data?

I have some data that isn't stored as structure of arrays. What is the best practice for loading the data in registers? __m128 _mm_set_ps (float e3, float e2, float e1, float e0) // or __m128 _mm...

sse simd intrinsics sse2

Symphonize asked 13/3, 2018 at 20:50
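For contiguous floats both intrinsics build the same vector, but _mm_set_ps takes its arguments highest lane first. _mm_loadu_ps is a single unaligned load; when the scalars are scattered in memory, _mm_set_ps usually has to be assembled from separate loads and shuffles. A sketch with hypothetical wrapper names:

```c
#include <xmmintrin.h>  /* SSE: _mm_loadu_ps, _mm_set_ps */

/* Lane i receives p[i] in both cases. */
static __m128 via_loadu(const float *p) { return _mm_loadu_ps(p); }
static __m128 via_set(const float *p)   { return _mm_set_ps(p[3], p[2], p[1], p[0]); }
```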

4

Solved

Fast counting the number of set bits in __m128i register

I need to count the number of set bits of an __m128i register. In particular, I should write two functions that are able to count the number of bits of the register, using the following ways. The t...

c sse simd sse2 hammingweight

Illumination asked 27/6, 2013 at 23:37
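One SSE2-only approach is the classic SWAR popcount applied per byte, followed by psadbw to add the 16 byte counts. The 16-bit shifts are safe because the masks discard bits shifted in from the neighbouring byte. This is a sketch of that one method; the function name is hypothetical.

```c
#include <emmintrin.h>  /* SSE2 */

/* Count set bits in a 128-bit vector. */
static int popcount_m128i_sse2(__m128i v)
{
    const __m128i m1 = _mm_set1_epi8(0x55);
    const __m128i m2 = _mm_set1_epi8(0x33);
    const __m128i m4 = _mm_set1_epi8(0x0F);
    v = _mm_sub_epi8(v, _mm_and_si128(_mm_srli_epi16(v, 1), m1));          /* 2-bit counts */
    v = _mm_add_epi8(_mm_and_si128(v, m2),
                     _mm_and_si128(_mm_srli_epi16(v, 2), m2));              /* 4-bit counts */
    v = _mm_and_si128(_mm_add_epi8(v, _mm_srli_epi16(v, 4)), m4);          /* per-byte counts */
    __m128i sums = _mm_sad_epu8(v, _mm_setzero_si128());                    /* two 64-bit sums */
    return _mm_cvtsi128_si32(sums) + _mm_extract_epi16(sums, 4);
}
```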

1

gcc -mno-sse2 rounding

I'm doing a project where I do RGB to luma conversions, and I have some rounding issues with the -mno-sse2 flag: Here's the test code: #include <stdio.h> #include <stdint.h> static d...

c gcc compilation rounding sse2

Argo asked 28/1, 2016 at 18:26

2

Solved

Shifting xmm integer register values using non-AVX instructions on Intel x86 architecture

I have the following problem which I need to solve using anything other than AVX2. I have 3 values stored in an __m128i variable (the 4th value is not needed) and need to shift those values by 4,3,5...

c++ x86 simd intrinsics sse2

Epistemology asked 28/10, 2017 at 20:7

5

Solved

How to test if your Linux Support SSE2

Actually I have 2 questions: Is SSE2 compatibility a CPU issue or a compiler issue? How can I check if my CPU or compiler supports SSE2? I am using GCC Version: gcc (GCC) 4.5.1 When I tried to c...

linux unix compiler-construction sse2 itanium

Collincolline asked 17/11, 2010 at 9:54
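It is both: the CPU must implement SSE2, and the compiler must be allowed to emit it (GCC defines __SSE2__ when it is, e.g. with -msse2, which is the default on x86-64). For the CPU side on Linux, the flags line of /proc/cpuinfo lists sse2, or a program can query CPUID leaf 1, where EDX bit 26 reports SSE2. A sketch using GCC/Clang's <cpuid.h> helper, x86 only (it does not apply to Itanium); the function name cpu_has_sse2 is hypothetical.

```c
#include <cpuid.h>  /* GCC/Clang: __get_cpuid, bit_SSE2 */

/* Run-time check of the CPU's SSE2 feature bit. */
static int cpu_has_sse2(void)
{
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;                   /* CPUID leaf 1 not available */
    return (edx & bit_SSE2) != 0;   /* EDX bit 26 */
}
```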

sse2 Questions
