sse2 Questions
3
1
Solved
I'm looking for the fastest way to divide an __m256i of packed 32-bit integers by two (aka shift right by one) using AVX. I don't have access to AVX2.
As far as I know, my options are:
Drop down t...
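A minimal sketch of the split-into-halves approach, assuming an AVX-only target as the question describes. AVX1 has no 256-bit integer shifts, so the vector is split into two 128-bit halves, each half is shifted with SSE2's _mm_srai_epi32, and the halves are reassembled. The helper name halve_epi32_avx1 is hypothetical, and __attribute__((target("avx"))) is a GCC/Clang extension (MSVC accepts AVX intrinsics without it).

```c
#include <immintrin.h>
#include <stdint.h>

/* Hypothetical helper: halve eight packed int32 values using only AVX1 + SSE2. */
__attribute__((target("avx")))
static void halve_epi32_avx1(const int32_t in[8], int32_t out[8])
{
    __m256i v  = _mm256_loadu_si256((const __m256i *)in);
    __m128i lo = _mm256_castsi256_si128(v);       /* lanes 0..3, no instruction needed */
    __m128i hi = _mm256_extractf128_si256(v, 1);  /* lanes 4..7 */
    lo = _mm_srai_epi32(lo, 1);  /* arithmetic shift: rounds toward -infinity */
    hi = _mm_srai_epi32(hi, 1);
    __m256i r = _mm256_insertf128_si256(_mm256_castsi128_si256(lo), hi, 1);
    _mm256_storeu_si256((__m256i *)out, r);
}
```

An arithmetic shift rounds negative odd values toward negative infinity (-3 becomes -2), unlike C's integer division. Adding the sign bit first (_mm_srli_epi32(x, 31)) before the shift gives truncating division, and _mm_srli_epi32 is the right shift for unsigned data.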
1
Solved
The Intel intrinsic functions have the subtype of the vector built into their names. For example, _mm_set1_ps ends in ps, which stands for packed single precision, i.e. float. Although the meaning of most ...
Infix asked 30/1, 2022 at 4:35
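The naming convention can be illustrated with one set1 call per suffix. In this sketch (the function name suffix_demo is hypothetical), _ps means packed single (four floats in a __m128), _pd packed double (two doubles in a __m128d), _epi32 packed 32-bit integers, and _si128 the whole 128-bit integer register (__m128i).

```c
#include <emmintrin.h>  /* SSE2; also includes SSE's <xmmintrin.h> */
#include <stdint.h>

/* Hypothetical demo: fill three vector types and store them back to memory. */
static void suffix_demo(float f[4], double d[2], int32_t i[4])
{
    __m128  vf = _mm_set1_ps(1.5f);   /* _ps: four packed floats */
    __m128d vd = _mm_set1_pd(2.5);    /* _pd: two packed doubles */
    __m128i vi = _mm_set1_epi32(7);   /* _epi32: four packed int32 */
    _mm_storeu_ps(f, vf);
    _mm_storeu_pd(d, vd);
    _mm_storeu_si128((__m128i *)i, vi); /* whole-register integer store uses _si128 */
}
```

Scalar variants ending in _ss and _sd operate only on the lowest element.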
1
Solved
I am starting to use functions like _mm_clflush, _mm_clflushopt, and _mm_clwb.
Say I have defined a struct named mystruct whose size is 256 bytes. My cache-line size is 64 bytes. Now I want ...
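A minimal sketch, assuming a 64-byte cache line and a struct aligned to it so that its 256 bytes span exactly four lines (the struct layout and the helper flush_range are hypothetical): step through the object one line at a time and issue one flush per line. It uses _mm_clflush because _mm_clflushopt and _mm_clwb additionally need compiler flags (-mclflushopt and -mclwb with GCC) and CPU support; the loop is the same with either. The trailing _mm_mfence is a conservative choice; _mm_sfence is the fence usually paired with clflushopt and clwb.

```c
#include <emmintrin.h>  /* _mm_clflush, _mm_mfence (SSE2) */
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

#define CACHELINE 64  /* assumed cache-line size, as in the question */

/* Hypothetical 256-byte struct, aligned so it covers exactly four lines. */
struct mystruct {
    alignas(CACHELINE) unsigned char payload[256];
};

/* Flush every cache line touched by [p, p + len); returns the number of lines. */
static size_t flush_range(const void *p, size_t len)
{
    uintptr_t addr = (uintptr_t)p & ~(uintptr_t)(CACHELINE - 1); /* round down to line start */
    uintptr_t end  = (uintptr_t)p + len;
    size_t lines = 0;
    for (; addr < end; addr += CACHELINE) {
        _mm_clflush((const void *)addr);
        ++lines;
    }
    _mm_mfence(); /* order the flushes before later loads and stores */
    return lines;
}
```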
3
Solved
1
Solved
The orpd instruction is a "bitwise logical OR of packed double precision floating point values". Doesn't this do exactly the same thing as por ("bitwise logical OR")? If so, what's the point of hav...
Weka asked 31/5, 2020 at 5:28
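The two instructions produce identical bits. The sketch checks this for a pair of inputs by computing the OR once with _mm_or_pd (orpd) and once with _mm_or_si128 (por) on reinterpreted copies; the helper name is hypothetical. The usual explanation for having both opcodes is the execution domain: on some microarchitectures, feeding a floating-point result into an integer-domain instruction (or vice versa) adds a bypass delay, so staying in the FP domain with orpd can be cheaper for double data.

```c
#include <emmintrin.h>
#include <string.h>

/* Returns 1 if orpd and por give bit-identical results for a and b. */
static int or_pd_matches_por(__m128d a, __m128d b)
{
    __m128d fp = _mm_or_pd(a, b);                                          /* orpd */
    __m128i in = _mm_or_si128(_mm_castpd_si128(a), _mm_castpd_si128(b));   /* por  */
    double x[2], y[2];
    _mm_storeu_pd(x, fp);
    _mm_storeu_pd(y, _mm_castsi128_pd(in));  /* casts only reinterpret the bits */
    return memcmp(x, y, sizeof x) == 0;
}
```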
5
Solved
I work on image processing.
I need to divide a 16-bit integer SSE vector by 255.
I can't use a shift intrinsic like _mm_srli_epi16(), because 255 is not a power of 2.
I know of course th...
Guanajuato asked 9/2, 2016 at 6:28
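One common approach is the multiply-by-reciprocal trick that compilers use for constant division. A minimal sketch, assuming unsigned 16-bit lanes (the usual pixel case): 0x8081 = 32897 = ceil(2^23 / 255), and floor(x * 32897 / 2^23) equals floor(x / 255) for every x in [0, 65535], because the approximation error stays below 1/255 over that range. The function name div255_epu16 is hypothetical.

```c
#include <emmintrin.h>
#include <stdint.h>

/* Exact floor(x / 255) for eight unsigned 16-bit lanes. */
static __m128i div255_epu16(__m128i x)
{
    /* _mm_mulhi_epu16 keeps the upper 16 bits of x * 32897, i.e. (x * 32897) >> 16. */
    __m128i hi = _mm_mulhi_epu16(x, _mm_set1_epi16((short)0x8081));
    return _mm_srli_epi16(hi, 7);  /* total shift of 23 bits */
}
```

The division costs one multiply and one shift per eight pixels; signed 16-bit inputs would need separate handling.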
0
I have been using the following "trick" in C code with SSE2 for single precision floats for a while now:
static inline __m128 SSEI_m128shift(__m128 data)
{
return (__m128)_mm_srli_si128(_mm_castp...
Unclassical asked 23/5, 2020 at 11:52
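The excerpt's C-style (__m128) cast relies on the GCC/Clang vector extensions and is not accepted by MSVC. A portable sketch of the same trick uses _mm_castsi128_ps for the return conversion; the casts only reinterpret the register and generate no instructions. The 4-byte shift amount (one float) and the function name are assumptions, since the excerpt is cut off.

```c
#include <emmintrin.h>

/* Shift a __m128 down by one float (4 bytes), shifting in zeros. */
static inline __m128 shift_ps_right_one(__m128 data)
{
    return _mm_castsi128_ps(_mm_srli_si128(_mm_castps_si128(data), 4));
}
```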
3
I am trying to compute the sum reduction of 32 elements (1 byte each) on an Intel i3 processor. I did this:
s=0;
for (i=0; i<32; i++)
{
s = s + a[i];
}
However, it's taking more time, since m...
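A common SSE2 approach, assuming the 32 elements are unsigned bytes (uint8_t): PSADBW (_mm_sad_epu8) against a zero vector sums each group of eight bytes into a 64-bit lane, so two loads, two SADs and a couple of adds replace the 32-iteration loop. The function name sum32_u8 is hypothetical.

```c
#include <emmintrin.h>
#include <stdint.h>

/* Sum 32 unsigned bytes starting at a. */
static uint32_t sum32_u8(const uint8_t *a)
{
    const __m128i zero = _mm_setzero_si128();
    __m128i lo = _mm_loadu_si128((const __m128i *)a);         /* bytes 0..15  */
    __m128i hi = _mm_loadu_si128((const __m128i *)(a + 16));  /* bytes 16..31 */
    /* Each PSADBW yields two 64-bit lanes, each holding the sum of 8 bytes. */
    __m128i s = _mm_add_epi64(_mm_sad_epu8(lo, zero), _mm_sad_epu8(hi, zero));
    /* Fold the upper 64-bit partial sum into the lower one. */
    s = _mm_add_epi64(s, _mm_unpackhi_epi64(s, s));
    return (uint32_t)_mm_cvtsi128_si32(s);  /* total <= 32 * 255, fits in 32 bits */
}
```

For signed bytes, XOR each byte with 0x80 (mapping -128..127 to 0..255) before the SAD and subtract 32 * 128 = 4096 from the result.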
4
Solved
How can I check in code whether SSE/SSE2 is enabled or not by the Visual Studio compiler?
I have tried #ifdef __SSE__ but it didn't work.
Reedreedbird asked 1/9, 2013 at 23:38
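MSVC has traditionally not defined __SSE__ or __SSE2__ (those are GCC/Clang macros). It reports the /arch setting through _M_IX86_FP on 32-bit x86 (1 for /arch:SSE, 2 for /arch:SSE2 and above), and on x64 SSE2 is always available. A sketch covering both compiler families; the macro names HAVE_SSE and HAVE_SSE2 and the function compiled_with_sse2 are hypothetical.

```c
/* Compile-time checks: these report what the compiler may emit,
   not what the CPU running the program supports. */
#if defined(__SSE2__) || defined(_M_X64) || defined(_M_AMD64) || \
    (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#  define HAVE_SSE2 1
#else
#  define HAVE_SSE2 0
#endif

#if defined(__SSE__) || defined(_M_X64) || defined(_M_AMD64) || \
    (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#  define HAVE_SSE 1
#else
#  define HAVE_SSE 0
#endif

static int compiled_with_sse2(void) { return HAVE_SSE2; }
```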
2
Solved
The following documentation is provided in the Intel Instruction Reference for the COMISD instruction:
Compares the double-precision floating-point values in the low
quadwords of operand 1 (fir...
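The intrinsics that compile to COMISD are the _mm_comi*_sd family. Each compares only the low doubles of its two operands and returns 0 or 1, ignoring the upper elements. A sketch with hypothetical wrapper names; the test shows that the high lanes do not affect the result.

```c
#include <emmintrin.h>

/* Both wrappers compare only element 0 of a and b. */
static int low_less_than(__m128d a, __m128d b) { return _mm_comilt_sd(a, b); }
static int low_equal(__m128d a, __m128d b)     { return _mm_comieq_sd(a, b); }
```

The _mm_ucomi*_sd variants map to UCOMISD, which raises the invalid-operation exception only for signaling NaNs, whereas COMISD also raises it for quiet NaNs.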
3
Solved
The code I want to optimize is basically a simple but large arithmetic formula; it should be fairly simple to analyze the code automatically to compute the independent multiplications/additions in ...
Acidify asked 19/9, 2012 at 13:13
1
Solved
For float, it seems pretty easy to floor() and then int(), for example:
float z = floor(LOG2EF * x + 0.5f);
const int32_t n = int32_t(z);
becomes:
__m128 z = _mm_add_ps(_mm_mul_ps(log2ef, x), half);...
Drucie asked 28/1, 2019 at 16:17
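SSE2 has no packed floor (ROUNDPS arrives with SSE4.1), so a common pattern is to truncate with _mm_cvttps_epi32 and subtract 1 in the lanes where truncation rounded upward, which happens only for negative non-integers. A sketch assuming all lanes fit in int32; the helper names are hypothetical, and LOG2EF is assumed to be log2(e) ≈ 1.4426950, as in the usual exp range reduction.

```c
#include <emmintrin.h>
#include <stdint.h>

/* floor() then convert to int32, per lane, using only SSE2. */
static __m128i floor_ps_to_epi32(__m128 v)
{
    __m128i t  = _mm_cvttps_epi32(v);              /* truncate toward zero */
    __m128  tf = _mm_cvtepi32_ps(t);
    __m128  up = _mm_cmpgt_ps(tf, v);              /* all-ones where truncation rounded up */
    return _mm_add_epi32(t, _mm_castps_si128(up)); /* all-ones is -1: subtract 1 there */
}

/* n = int32(floor(LOG2EF * x + 0.5f)) for four lanes, mirroring the scalar code. */
static __m128i exp_reduce_n(__m128 x)
{
    const __m128 log2ef = _mm_set1_ps(1.44269504088896341f); /* assumed value of LOG2EF */
    const __m128 half   = _mm_set1_ps(0.5f);
    return floor_ps_to_epi32(_mm_add_ps(_mm_mul_ps(log2ef, x), half));
}
```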
1
Solved
Here's the code I'm trying to convert: the double version of VDT's Padé exp fast_exp() approximation (here's the old repo resource):
inline double fast_exp(double initial_x){
double x = initial_x;
doubl...
Tricyclic asked 25/1, 2019 at 11:44
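The excerpt cuts off before the function body, so this only sketches one step that exp approximations of this kind typically need once n has been computed: scaling by 2^n. With SSE2, the two doubles 2^n0 and 2^n1 can be built directly by writing n + 1023 into each IEEE-754 exponent field. The helper name is hypothetical, and the sketch assumes -1022 <= n <= 1023 (no overflow or subnormal handling).

```c
#include <emmintrin.h>
#include <stdint.h>

/* Build {2^n0, 2^n1} as doubles from int32 exponents in the low two lanes of n. */
static __m128d pow2_epi32_pd(__m128i n)
{
    __m128i biased = _mm_add_epi32(n, _mm_set1_epi32(1023));
    /* Widen {b0, b1, x, x} to {b0, 0, b1, 0}: one exponent per 64-bit lane. */
    __m128i wide = _mm_unpacklo_epi32(biased, _mm_setzero_si128());
    /* Move each biased exponent into bits 52..62; mantissa and sign stay zero. */
    return _mm_castsi128_pd(_mm_slli_epi64(wide, 52));
}
```

The rational part computed for each lane would then be multiplied by this vector with _mm_mul_pd.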
3
How do I multiply two 64-bit integers by another two 64-bit integers?
I couldn't find any instruction that does it.
Langobardic asked 25/7, 2013 at 16:14
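SSE2 has no 64 x 64-bit multiply, but _mm_mul_epu32 multiplies the low 32 bits of each 64-bit lane into a full 64-bit product. A common emulation combines three such partial products; the high x high term is dropped because it only affects bits 64 and above. The result is the low 64 bits of each product, which is correct for both signed and unsigned inputs. The function name is hypothetical.

```c
#include <emmintrin.h>
#include <stdint.h>

/* Lane-wise low 64 bits of a * b, using only SSE2. */
static __m128i mullo_epi64_sse2(__m128i a, __m128i b)
{
    __m128i a_hi  = _mm_srli_epi64(a, 32);
    __m128i b_hi  = _mm_srli_epi64(b, 32);
    __m128i lo_lo = _mm_mul_epu32(a, b);      /* a_lo * b_lo */
    __m128i lo_hi = _mm_mul_epu32(a, b_hi);   /* a_lo * b_hi */
    __m128i hi_lo = _mm_mul_epu32(a_hi, b);   /* a_hi * b_lo */
    /* Cross terms contribute only to the upper 32 bits of the result. */
    __m128i cross = _mm_slli_epi64(_mm_add_epi64(lo_hi, hi_lo), 32);
    return _mm_add_epi64(lo_lo, cross);
}
```

AVX-512DQ adds a native instruction for this (_mm_mullo_epi64).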
3
Solved
I'm writing code that uses SSE2 to optimize this code:
double *pA = a;
double *pB = b[voiceIndex];
double *pC = c[voiceIndex];
for (int sampleIndex = 0; sampleIndex <...
Lizzielizzy asked 20/12, 2018 at 16:10
1
Solved
I really don't understand what a "keyword" like __m128d is in C++.
I'm using MSVC, and it says: The __m128d data type, for use with the Streaming SIMD Extensions 2 instructions intrinsics, is defined in <...
Paddle asked 13/12, 2018 at 8:19
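__m128d is not a C++ keyword but a type declared by the compiler's intrinsics header (<emmintrin.h> for SSE2): it names one 128-bit XMM value interpreted as two doubles. Its internal definition is compiler-specific, so portable code moves data in and out through intrinsics. A small sketch with a hypothetical function name:

```c
#include <emmintrin.h>  /* declares __m128d and the SSE2 _pd intrinsics */

/* Add two pairs of doubles with one SSE2 instruction. */
static void add_two_doubles(const double a[2], const double b[2], double out[2])
{
    __m128d va = _mm_loadu_pd(a);  /* unaligned load of two doubles */
    __m128d vb = _mm_loadu_pd(b);
    _mm_storeu_pd(out, _mm_add_pd(va, vb));
}
```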
3
Solved
I've added an x64 configuration to my C++ project to compile a 64-bit version of my app. Everything looks fine, but the compiler gives the following warning:
`cl : Command line warning D9002 : ignoring unk...
Macaulay asked 1/7, 2009 at 6:53
2
Solved
I'm working on a port of SSE2 to NEON. The port is early stage and it's producing incorrect results. Part of the reason for the incorrect results is _mm_shuffle_epi32 and the NEON instructions I se...
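When porting, a scalar reference model of _mm_shuffle_epi32 is a handy oracle for unit-testing the NEON replacement: lane i of the result is source lane (imm >> (2*i)) & 3. Note that _MM_SHUFFLE(z, y, x, w) lists the selectors from lane 3 down to lane 0, the reverse of memory order. A sketch with a hypothetical function name:

```c
#include <emmintrin.h>
#include <stdint.h>

/* Scalar model of _mm_shuffle_epi32(src, imm). */
static void shuffle_epi32_ref(const int32_t src[4], int imm, int32_t dst[4])
{
    for (int i = 0; i < 4; ++i)
        dst[i] = src[(imm >> (2 * i)) & 3];  /* two selector bits per output lane */
}
```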
1
I need to build a single-precision floating-point inner product routine for mixed single/double-precision floating-point vectors, exploiting the AVX instruction set for SIMD registers with 256 bits...
Vercelli asked 21/3, 2018 at 18:40
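One possible structure, assuming x holds floats and y holds doubles, with products accumulated in double precision and only the final sum rounded to float (the excerpt does not say which precision the accumulation should use). _mm256_cvtps_pd widens four floats to a __m256d, so each iteration processes four element pairs, and a scalar loop handles the tail. The function name is hypothetical, and target("avx") is a GCC/Clang attribute.

```c
#include <immintrin.h>
#include <stddef.h>

/* Inner product of a float vector and a double vector using AVX (no FMA/AVX2). */
__attribute__((target("avx")))
static float dot_f32_f64(const float *x, const double *y, size_t n)
{
    __m256d acc = _mm256_setzero_pd();
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        __m256d xd = _mm256_cvtps_pd(_mm_loadu_ps(x + i)); /* widen 4 floats */
        __m256d yd = _mm256_loadu_pd(y + i);
        acc = _mm256_add_pd(acc, _mm256_mul_pd(xd, yd));
    }
    double lanes[4];
    _mm256_storeu_pd(lanes, acc);
    double sum = lanes[0] + lanes[1] + lanes[2] + lanes[3];
    for (; i < n; ++i)  /* scalar tail */
        sum += (double)x[i] * y[i];
    return (float)sum;
}
```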
1
Solved
I have some data that isn't stored as a structure of arrays. What is the best practice for loading the data into registers?
__m128 _mm_set_ps (float e3, float e2, float e1, float e0)
// or
__m128 _mm...
Symphonize asked 13/3, 2018 at 20:50
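_mm_set_ps takes its arguments from the highest element (e3) down to the lowest (e0), the reverse of memory order, while _mm_setr_ps takes them in memory order. When the values are already contiguous, a single _mm_loadu_ps (or _mm_load_ps for 16-byte-aligned data) is typically the cheapest option; a set from scattered scalars generally compiles to several moves and shuffles. A sketch with a hypothetical function name showing that all three build the same vector:

```c
#include <xmmintrin.h>

/* Build {1, 2, 3, 4} three ways and store each result. */
static void set_vs_setr(float out_set[4], float out_setr[4], float out_load[4])
{
    const float mem[4] = {1.0f, 2.0f, 3.0f, 4.0f};
    _mm_storeu_ps(out_set,  _mm_set_ps(4.0f, 3.0f, 2.0f, 1.0f));  /* e3 first */
    _mm_storeu_ps(out_setr, _mm_setr_ps(1.0f, 2.0f, 3.0f, 4.0f)); /* memory order */
    _mm_storeu_ps(out_load, _mm_loadu_ps(mem));                   /* contiguous: one load */
}
```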
4
Solved
I need to count the number of set bits in a __m128i register.
Specifically, I need to write two functions that count the set bits of the register in the following ways.
The t...
Illumination asked 27/6, 2013 at 23:37
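The two required methods are cut off in the excerpt, so this shows one SSE2-only possibility: the classic SWAR bit count applied per byte, followed by _mm_sad_epu8 against zero to add up the 16 byte counts. The function name is hypothetical.

```c
#include <emmintrin.h>
#include <stdint.h>

/* Number of set bits among all 128 bits of v, using only SSE2. */
static int popcount_m128i(__m128i v)
{
    const __m128i m1 = _mm_set1_epi8(0x55);
    const __m128i m2 = _mm_set1_epi8(0x33);
    const __m128i m4 = _mm_set1_epi8(0x0f);
    /* Masks discard bits that the 64-bit shifts carry in from neighbouring bytes. */
    v = _mm_sub_epi8(v, _mm_and_si128(_mm_srli_epi64(v, 1), m1));   /* 2-bit counts */
    v = _mm_add_epi8(_mm_and_si128(v, m2),
                     _mm_and_si128(_mm_srli_epi64(v, 2), m2));      /* 4-bit counts */
    v = _mm_and_si128(_mm_add_epi8(v, _mm_srli_epi64(v, 4)), m4);   /* 8-bit counts */
    /* PSADBW against zero sums the byte counts into two 64-bit lanes. */
    v = _mm_sad_epu8(v, _mm_setzero_si128());
    v = _mm_add_epi64(v, _mm_unpackhi_epi64(v, v));
    return _mm_cvtsi128_si32(v);
}
```

On CPUs with the POPCNT instruction, extracting the two 64-bit halves and counting each with a scalar popcount is a common alternative.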
1
I'm doing a project where I do RGB to luma conversions, and I have some rounding issues with the -mno-sse2 flag:
Here's the test code:
#include <stdio.h>
#include <stdint.h>
static d...
Argo asked 28/1, 2016 at 18:26
2
Solved
I have the following problem which I need to solve using anything other than AVX2.
I have 3 values stored in an __m128i variable (the 4th value is not needed) and I need to shift those values by 4,3,5...
Epistemology asked 28/10, 2017 at 20:7
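Per-lane variable shifts (_mm_sllv_epi32) arrive with AVX2. With constant amounts and only SSE2, one option is to shift the whole vector once per amount and pick each lane with an AND mask. This sketch assumes a left shift with lane 0 shifted by 4, lane 1 by 3 and lane 2 by 5, and clears the unused lane 3; the excerpt is truncated, so the direction and lane mapping are guesses, and _mm_srli_epi32 or _mm_srai_epi32 would replace the left shifts for a right shift. The function name is hypothetical.

```c
#include <emmintrin.h>
#include <stdint.h>

/* Assumed reading: lane0 <<= 4, lane1 <<= 3, lane2 <<= 5, lane3 = 0. */
static __m128i shift_lanes_4_3_5(__m128i v)
{
    const __m128i lane0 = _mm_setr_epi32(-1, 0, 0, 0);
    const __m128i lane1 = _mm_setr_epi32(0, -1, 0, 0);
    const __m128i lane2 = _mm_setr_epi32(0, 0, -1, 0);
    __m128i r0 = _mm_and_si128(_mm_slli_epi32(v, 4), lane0);
    __m128i r1 = _mm_and_si128(_mm_slli_epi32(v, 3), lane1);
    __m128i r2 = _mm_and_si128(_mm_slli_epi32(v, 5), lane2);
    return _mm_or_si128(_mm_or_si128(r0, r1), r2);  /* merge the selected lanes */
}
```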
5
Solved
Actually, I have two questions:
Is SSE2 compatibility a CPU issue or a compiler issue?
How can I check whether my CPU or compiler supports SSE2?
I am using GCC version:
gcc (GCC) 4.5.1
When I tried to c...
Collincolline asked 17/11, 2010 at 9:54
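It is both: the compiler must be allowed to emit SSE2 instructions (GCC enables them with -msse2, they are on by default for x86-64, and GCC defines __SSE2__ when they are enabled), and the CPU running the program must implement them. A sketch of the runtime half using GCC/Clang's <cpuid.h>; CPUID leaf 1 reports SSE2 in bit 26 of EDX. The function name is hypothetical.

```c
#include <cpuid.h>  /* GCC/Clang wrapper around the CPUID instruction */

/* Runtime check: does the CPU executing this code report SSE2? */
static int cpu_has_sse2(void)
{
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;                   /* CPUID leaf 1 not available */
    return (int)((edx >> 26) & 1u); /* EDX bit 26 = SSE2 */
}
```

For the compile-time half, gcc -msse2 -dM -E - < /dev/null | grep SSE lists the SSE macros the compiler defines for a given set of flags.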
© 2022 - 2025 — McMap. All rights reserved.