sse2 Questions

2

Solved

I am considering vectorizing some floor() calls using sse2 intrinsics, then measuring the performance gain. But ultimately the binary is going to be run on a virtual machine which I have no access ...
Melodic asked 18/1, 2017 at 22:48

1

Solved

What is the best way to load and store general purpose registers to/from SIMD registers? So far I have been using the stack as a temporary. For example, mov [rsp + 0x00], r8 mov [rsp + 0x08], r9 mov ...
Left asked 16/11, 2016 at 3:52
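
For readers who work with intrinsics rather than raw assembly, one way to avoid the stack round trip can be sketched as follows. This is a minimal sketch that assumes an x86-64 target (the 64-bit MOVQ forms are not available in 32-bit mode); the helper names pack_two_u64 and unpack_two_u64 are made up for illustration, and the instruction names in the comments are what compilers typically emit, not a guarantee.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Hypothetical helper: packs two 64-bit integer values into one XMM
   register without going through memory. lo lands in bits 0..63,
   hi in bits 64..127. */
__m128i pack_two_u64(long long lo, long long hi)
{
    __m128i l = _mm_cvtsi64_si128(lo);  /* typically MOVQ xmm, r64 */
    __m128i h = _mm_cvtsi64_si128(hi);  /* typically MOVQ xmm, r64 */
    return _mm_unpacklo_epi64(l, h);    /* typically PUNPCKLQDQ */
}

/* Hypothetical helper: extracts both 64-bit halves back into integers. */
void unpack_two_u64(__m128i v, long long *lo, long long *hi)
{
    *lo = _mm_cvtsi128_si64(v);                         /* low half */
    *hi = _mm_cvtsi128_si64(_mm_unpackhi_epi64(v, v));  /* high half moved down first */
}
```

Whether this beats a store and reload depends on the microarchitecture, so it is worth measuring on the target CPU.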

4

Solved

I was reading today about researchers discovering that NVidia's Phys-X libraries use x87 FP vs. SSE2. Obviously this will be suboptimal for parallel datasets where speed trumps precision. However, ...
Supranational asked 8/7, 2010 at 16:57

8

Solved

I've tried to compile this program on an x64 computer: #include <cstring> int main(int argc, char* argv[]) { return ::std::strcmp(argv[0], "really really really really really really reall...
Kimberelykimberlee asked 27/10, 2014 at 10:59

2

Solved

I need to load and rearrange 12 bytes into 16 (or 24 into 32) following the pattern below: ABC DEF GHI JKL becomes ABBC DEEF GHHI JKKL Can you suggest efficient ways to achieve this using the...
Umbilical asked 31/3, 2016 at 7:40

4

Solved

I wrote the function int compare_16bytes(__m128i lhs, __m128i rhs) in order to compare two 16 byte numbers using SSE instructions: this function returns how many bytes are equal after perform...
Kattegat asked 9/3, 2013 at 17:20

1

Solved

Trying to upgrade Node.js from 0.10.x to 0.12.0. The first thing noticed is that I am getting an error that SSE2 instructions are not supported by my CPU (indeed they are not). Tried to compile No...
Fourteen asked 15/3, 2015 at 22:5

3

Solved

I am migrating vectorized code written using SSE2 intrinsics to AVX2 intrinsics. Much to my disappointment, I discover that the shift instructions _mm256_slli_si256 and _mm256_srli_si256 operate o...
Capua asked 11/8, 2014 at 17:14

4

Solved

I have many functions which use the same constant __m128i values. For example: const __m128i K8 = _mm_setr_epi8(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16); const __m128i K16 = _mm_setr_...
Labour asked 8/2, 2016 at 11:1

2

I need to shift a __m128i variable (say v) by m bits, in such a way that bits move through the whole variable (so the resulting variable represents v*2^m). What is the best way to do this? No...
Exuberance asked 27/12, 2015 at 7:1

1

Solved

I have a strange issue with some SSE2 and AVX code I have been working on. I am building my application using GCC with runtime CPU feature detection. The object files are built with separate...
Adelaideadelaja asked 15/10, 2015 at 13:16

2

Solved

I want to calculate y = ax + b, where x and y are pixel values [i.e., bytes in the range 0~255], while a and b are floats. Since I need to apply this formula for each pixel in the image, in additi...
Rid asked 29/8, 2015 at 8:26

4

I’m currently coding highly optimised versions of some C99 standard library string functions, like strlen(), memset(), etc, using x86-64 assembly with SSE-2 instructions. So far I’ve managed to ge...
Delitescent asked 7/8, 2013 at 21:18

3

Solved

Is there any difference between logical SSE intrinsics for different types? For example if we take OR operation, there are three intrinsics: _mm_or_ps, _mm_or_pd and _mm_or_si128 all of which do th...
Proudhon asked 10/5, 2010 at 17:32
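
The claim in the excerpt above, that the three OR intrinsics produce the same bits, can be checked with a small sketch. The function name or_variants_agree is made up for illustration. As far as I know the practical difference between the variants is not in the result but in which execution domain the instruction belongs to, which can cost a small bypass delay on some CPUs; treat that as background, not something this code measures.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <string.h>     /* memcmp */

/* Hypothetical helper: returns 1 if _mm_or_si128, _mm_or_ps and _mm_or_pd
   give bit-identical results for the two 16-byte inputs, 0 otherwise. */
int or_variants_agree(const unsigned char a[16], const unsigned char b[16])
{
    __m128i ai = _mm_loadu_si128((const __m128i *)a);
    __m128i bi = _mm_loadu_si128((const __m128i *)b);

    /* The casts only reinterpret the register; they do not convert values. */
    __m128i r_int = _mm_or_si128(ai, bi);
    __m128i r_ps  = _mm_castps_si128(
        _mm_or_ps(_mm_castsi128_ps(ai), _mm_castsi128_ps(bi)));
    __m128i r_pd  = _mm_castpd_si128(
        _mm_or_pd(_mm_castsi128_pd(ai), _mm_castsi128_pd(bi)));

    unsigned char o_int[16], o_ps[16], o_pd[16];
    _mm_storeu_si128((__m128i *)o_int, r_int);
    _mm_storeu_si128((__m128i *)o_ps, r_ps);
    _mm_storeu_si128((__m128i *)o_pd, r_pd);

    return memcmp(o_int, o_ps, 16) == 0 && memcmp(o_int, o_pd, 16) == 0;
}
```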

1

Solved

I want to convert a vector of double precision values to char. I have to make two distinct approaches, one for SSE2 and the other for AVX2. I started with AVX2. __m128i sub_proc(__m256d& in) ...
Sherwood asked 15/6, 2015 at 19:48

2

Solved

I'm just playing around with gcc (g++) and the compilerflags -msse and -msse2. I have a little test program which looks like that: #include <iostream> int main(int argc, char **argv) { flo...
Pileous asked 26/4, 2015 at 7:50

2

Solved

Intel provides several SIMD commands, which all seem to perform bitwise XOR on 128-bit data: _mm_xor_pd(__m128d, __m128d) _mm_xor_ps(__m128, __m128) _mm_xor_si128(__m128i, __m128i) Isn't ...
Onrush asked 18/3, 2015 at 13:4

2

Solved

Signed integer overflow is undefined in C and C++. But what about signed integer overflow within the individual fields of an __m128i? In other words, is this behavior defined in the Intel standards...
Degraded asked 22/10, 2014 at 21:2
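
A small sketch of what the hardware does inside the lanes may help here. The underlying PADDD instruction is documented as wrapping around, while the saturating variants such as PADDSW clamp instead. The helper names below are made up for illustration, and the sketch only demonstrates the behaviour of the intrinsics; it says nothing about plain C arithmetic on int, which remains undefined on overflow.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical helper: adds x and y in a 32-bit lane with _mm_add_epi32
   (PADDD) and returns the lowest lane. The lane wraps around on overflow. */
int32_t add_epi32_lane0(int32_t x, int32_t y)
{
    __m128i r = _mm_add_epi32(_mm_set1_epi32(x), _mm_set1_epi32(y));
    return _mm_cvtsi128_si32(r);
}

/* Hypothetical helper: adds x and y in a 16-bit lane with _mm_adds_epi16
   (PADDSW) and returns the lowest lane. The lane saturates instead of wrapping. */
int16_t adds_epi16_lane0(int16_t x, int16_t y)
{
    __m128i r = _mm_adds_epi16(_mm_set1_epi16(x), _mm_set1_epi16(y));
    return (int16_t)_mm_extract_epi16(r, 0);
}
```

With these helpers, add_epi32_lane0(INT32_MAX, 1) yields INT32_MIN, and adds_epi16_lane0(INT16_MAX, 1) stays at INT16_MAX.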

3

I am having trouble with this error: "SSE instruction set not enabled". How can I figure this out? I have an Acer i7 with Ubuntu 11.10. Can anyone help me? Any help will be appreciated! Also...
Constitute asked 4/2, 2012 at 21:6

2

Solved

I'm trying to implement some inline assembler (in Visual Studio 2012 C++ code) to take advantage of SSE. I want to add 7 numbers 1e9 times, so I placed them from RAM into the xmm0 to xmm6 registers of...
Waxy asked 11/3, 2013 at 21:46

1

Solved

I'm looking to understand SSE2's capabilities a little more, and would like to know if one could make a 128-bit wide integer that supports addition, subtraction, XOR and multiplication?
Demicanton asked 30/8, 2012 at 15:45

1

Solved

I have a __m128i variable and I need to shift its 128 bit value by n bits, i.e. like _mm_srli_si128 and _mm_slli_si128 work, but on bits instead of bytes. What is the most efficient way of doing th...
Bats asked 12/7, 2013 at 8:29
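
One possible approach to the bit-granular shift asked about above, sketched for the left-shift direction only. SSE2 has no single instruction for this, so the sketch shifts both 64-bit halves and then ORs in the bits that cross from the low half to the high half. The function name shl_128 is made up for illustration, and this is not claimed to be the fastest variant.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Hypothetical helper: shifts the full 128-bit value left by n bits,
   for 0 <= n < 128. */
__m128i shl_128(__m128i v, int n)
{
    assert(n >= 0 && n < 128);

    /* Low 64 bits moved into the high half, low half zeroed. */
    __m128i low_to_high = _mm_slli_si128(v, 8);

    if (n >= 64) {
        /* Only bits from the original low half survive. */
        return _mm_sll_epi64(low_to_high, _mm_cvtsi32_si128(n - 64));
    }

    /* Shift each 64-bit half independently. */
    __m128i shifted = _mm_sll_epi64(v, _mm_cvtsi32_si128(n));

    /* Bits that cross from the low half into the high half. For n == 0 the
       shift count is 64, for which the instruction yields zero. */
    __m128i carry = _mm_srl_epi64(low_to_high, _mm_cvtsi32_si128(64 - n));

    return _mm_or_si128(shifted, carry);
}
```

A right shift can be built the same way by mirroring the byte shift and swapping the two 64-bit shift directions.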

6

Solved

I have the following bottleneck function. typedef unsigned char byte; void CompareArrays(const byte * p1Start, const byte * p1End, const byte * p2, byte * p3) { const byte b1 = 128-30; const by...
Chaldean asked 21/10, 2010 at 11:40

2

Solved

A and B are vectors of length N, where N could be in the range 20 to 200 say. I want to calculate the square of the distance between these vectors, i.e. d^2 = ||A-B||^2. So far I have: float* a =...
Contraposition asked 8/6, 2013 at 14:25
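
The formula d^2 = ||A-B||^2 from the excerpt above can be sketched with SSE intrinsics as follows. This is not the asker's code, which is cut off in the excerpt; the function name squared_distance and the use of unaligned loads are assumptions made for illustration. Note that summing in four lanes changes the order of floating point additions, so the result can differ slightly from a plain scalar loop.

```c
#include <emmintrin.h>  /* pulls in the SSE float intrinsics as well */
#include <stddef.h>

/* Hypothetical helper: returns the sum over i of (a[i] - b[i])^2. */
float squared_distance(const float *a, const float *b, size_t n)
{
    __m128 acc = _mm_setzero_ps();
    size_t i = 0;

    /* Main loop: four differences squared and accumulated per iteration. */
    for (; i + 4 <= n; i += 4) {
        __m128 d = _mm_sub_ps(_mm_loadu_ps(a + i), _mm_loadu_ps(b + i));
        acc = _mm_add_ps(acc, _mm_mul_ps(d, d));
    }

    /* Horizontal sum of the four partial sums. */
    float lanes[4];
    _mm_storeu_ps(lanes, acc);
    float sum = (lanes[0] + lanes[1]) + (lanes[2] + lanes[3]);

    /* Scalar tail for lengths that are not a multiple of four. */
    for (; i < n; i++) {
        float d = a[i] - b[i];
        sum += d * d;
    }
    return sum;
}
```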

2

Solved

Is there any SSE2 instruction to load a 128 bit int vector register from an int buffer, in reverse order ?
Mingy asked 16/5, 2013 at 10:4
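
As far as I know SSE2 has no single load instruction that reverses element order, but a normal load followed by one shuffle does the job. A minimal sketch, assuming four 32-bit ints and an unaligned buffer; the function name load_reversed_epi32 is made up for illustration.

```c
#include <emmintrin.h>  /* SSE2 intrinsics, also provides _MM_SHUFFLE */

/* Hypothetical helper: loads p[0..3] and returns them in reverse order,
   so that lane 0 holds p[3] and lane 3 holds p[0]. */
__m128i load_reversed_epi32(const int *p)
{
    __m128i v = _mm_loadu_si128((const __m128i *)p);
    /* _MM_SHUFFLE(0, 1, 2, 3) selects source lanes 3, 2, 1, 0
       for destination lanes 0, 1, 2, 3. */
    return _mm_shuffle_epi32(v, _MM_SHUFFLE(0, 1, 2, 3));
}
```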
