sse2 - 2 - McMap

2

Solved

I am considering vectorizing some floor() calls using sse2 intrinsics, then measuring the performance gain. But ultimately the binary is going to be run on a virtual machine which I have no access ...

c++ virtual-machine vectorization sse2

Melodic asked 18/1, 2017 at 22:48

1

Solved

Best way to load/store from/to general purpose registers to/from xmm/ymm register

What is the best way to load and store general purpose registers to/from SIMD registers? So far I have been using the stack as a temporary. For example, mov [rsp + 0x00], r8 mov [rsp + 0x08], r9 mov ...

assembly x86 simd sse2 avx2

Left asked 16/11, 2016 at 3:52

4

Solved

Extended (80-bit) double floating point in x87, not SSE2 - we don't miss it?

I was reading today about researchers discovering that NVidia's Phys-X libraries use x87 FP vs. SSE2. Obviously this will be suboptimal for parallel datasets where speed trumps precision. However, ...

floating-point sse2 x87

Supranational asked 8/7, 2010 at 16:57

8

Solved

Why is strcmp not SIMD optimized?

I've tried to compile this program on an x64 computer: #include <cstring> int main(int argc, char* argv[]) { return ::std::strcmp(argv[0], "really really really really really really reall...

c++ sse simd strcmp sse2

Kimberelykimberlee asked 27/10, 2014 at 10:59

2

Solved

Complex data reorganization with vector instructions

I need to load and rearrange 12 bytes into 16 (or 24 into 32) following the pattern below: ABC DEF GHI JKL becomes ABBC DEEF GHHI JKKL Can you suggest efficient ways to achieve this using the...

x86 vectorization simd sse2 avx2

Umbilical asked 31/3, 2016 at 7:40

4

Solved

Fast counting the number of equal bytes between two arrays [duplicate]

I wrote the function int compare_16bytes(__m128i lhs, __m128i rhs) in order to compare two 16 byte numbers using SSE instructions: this function returns how many bytes are equal after perform...

c++ c sse simd sse2

Kattegat asked 9/3, 2013 at 17:20

1

Solved

Why does V8 in Node.js 0.12.0 release require SSE2 CPU instructions?

Trying to upgrade Node.js from 0.10.x to 0.12.0. The first thing noticed is that I am getting an error that SSE2 instructions are not supported by my CPU (indeed they are not). Tried to compile No...

node.js v8 sse2

Fourteen asked 15/3, 2015 at 22:5

3

Solved

Emulating shifts on 32 bytes with AVX

I am migrating vectorized code written using SSE2 intrinsics to AVX2 intrinsics. Much to my disappointment, I discover that the shift instructions _mm256_slli_si256 and _mm256_srli_si256 operate o...

c++ simd intrinsics sse2 avx2

Capua asked 11/8, 2014 at 17:14

4

Solved

How can I set __m128i without using of any SSE instruction?

I have many functions which use the same constant __m128i values. For example: const __m128i K8 = _mm_setr_epi8(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16); const __m128i K16 = _mm_setr_...

c++ constants sse simd sse2

Labour asked 8/2, 2016 at 11:1

2

The best way to shift a __m128i?

I need to shift a __m128i variable (say v) by m bits, in such a way that bits move through the whole variable (so the resulting variable represents v*2^m). What is the best way to do this? No...

c bitwise-operators sse bit-shift sse2

Exuberance asked 27/12, 2015 at 7:1

1

Solved

SSE runs slow after using AVX [duplicate]

I have a strange issue with some SSE2 and AVX code I have been working on. I am building my application using GCC with runtime CPU feature detection. The object files are built with separate...

c++ gcc x86 avx sse2

Adelaideadelaja asked 15/10, 2015 at 13:16

2

Solved

Scaling byte pixel values (y=ax+b) with SSE2 (as floats)?

I want to calculate y = ax + b, where x and y are pixel values [i.e., bytes in the range 0~255], while a and b are floats. Since I need to apply this formula for each pixel in image, in additi...

c++ visual-studio x86 simd sse2

Rid asked 29/8, 2015 at 8:26

4

Performance optimisations of x86-64 assembly - Alignment and branch prediction

I’m currently coding highly optimised versions of some C99 standard library string functions, like strlen(), memset(), etc, using x86-64 assembly with SSE-2 instructions. So far I’ve managed to ge...

performance assembly x86-64 sse2 branch-prediction

Delitescent asked 7/8, 2013 at 21:18

3

Solved

What's the difference between logical SSE intrinsics?

Is there any difference between logical SSE intrinsics for different types? For example if we take OR operation, there are three intrinsics: _mm_or_ps, _mm_or_pd and _mm_or_si128 all of which do th...

c sse simd intrinsics sse2

Proudhon asked 10/5, 2010 at 17:32

1

Solved

SSE - AVX conversion from double to char

I want to convert a vector of double precision values to char. I have to make two distinct approaches, one for SSE2 and the other for AVX2. I started with AVX2. __m128i sub_proc(__m256d& in) ...

c++ simd avx sse2 avx2

Sherwood asked 15/6, 2015 at 19:48

2

Solved

C/C++: -msse and -msse2 Flags do not have any effect on the binaries?

I'm just playing around with gcc (g++) and the compiler flags -msse and -msse2. I have a little test program which looks like this: #include <iostream> int main(int argc, char **argv) { flo...

c++ gcc sse sse2

Pileous asked 26/4, 2015 at 7:50

2

Solved

What is the difference between these 128bit SIMD xor operations [duplicate]

Intel provides several SIMD intrinsics which all seem to perform a bitwise XOR on 128-bit data: _mm_xor_pd(__m128d, __m128d) _mm_xor_ps(__m128, __m128) _mm_xor_si128(__m128i, __m128i) Isn't ...

simd sse intrinsics sse2

Onrush asked 18/3, 2015 at 13:4

2

Solved

Is SSE2 signed integer overflow undefined?

Signed integer overflow is undefined in C and C++. But what about signed integer overflow within the individual fields of an __m128i? In other words, is this behavior defined in the Intel standards...

c language-lawyer undefined-behavior sse2

Degraded asked 22/10, 2014 at 21:2

3

SSE instruction set not enabled

I am having trouble with this error: "SSE instruction set not enabled". How can I figure this out? I have an Acer i7 with Ubuntu 11.10. Can anyone help me? Any help will be appreciated! Also...

c++ intrinsics sse2 sse3

Constitute asked 4/2, 2012 at 21:6

2

Solved

Using XMM0 register and memory fetches (C++ code) is twice as fast as ASM only using XMM registers - Why?

I'm trying to implement some inline assembler (in Visual Studio 2012 C++ code) to take advantage of SSE. I want to add 7 numbers 1e9 times, so I placed them from RAM into the xmm0 to xmm6 registers of...

c++ performance optimization assembly sse2

Waxy asked 11/3, 2013 at 21:46

1

Solved

Is it possible to use SSE and SSE2 to make a 128-bit wide integer?

I'm looking to understand SSE2's capabilities a little more, and would like to know if one could make a 128-bit wide integer that supports addition, subtraction, XOR and multiplication?

assembly sse sse2

Demicanton asked 30/8, 2012 at 15:45

1

Solved

Shift a __m128i of n bits

I have a __m128i variable and I need to shift its 128-bit value by n bits, i.e. the way _mm_srli_si128 and _mm_slli_si128 work, but on bits instead of bytes. What is the most efficient way of doing th...

c x86 sse simd sse2

Bats asked 12/7, 2013 at 8:29
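
As a rough illustration of what this question is asking for (not the accepted answer from the original thread), a left shift of the full 128-bit value can be sketched with SSE2 by shifting each 64-bit lane and OR-ing in the bits that cross the lane boundary. The names `shl128_small` and `check_shl128_small` are hypothetical, and the sketch assumes 0 <= n < 64.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics
#include <cassert>
#include <cstdint>

// Hypothetical helper: shift the whole 128-bit value left by n bits.
// Assumption: 0 <= n < 64. Larger counts would need an extra byte shift first.
inline __m128i shl128_small(__m128i v, int n) {
    const __m128i cnt  = _mm_cvtsi32_si128(n);       // shift count for the lanes
    const __m128i rcnt = _mm_cvtsi32_si128(64 - n);  // count for the carried bits
    // Shift both 64-bit lanes left independently.
    const __m128i shifted = _mm_sll_epi64(v, cnt);
    // Move the low lane into the high lane, then keep only its top n bits.
    // A count of 64 (n == 0) yields zero, so no carry is added in that case.
    const __m128i carry = _mm_srl_epi64(_mm_slli_si128(v, 8), rcnt);
    return _mm_or_si128(shifted, carry);
}

// Small self-check: bit 63 of the low half must move into bit 0 of the high half.
inline void check_shl128_small() {
    const std::uint64_t in[2] = {0x8000000000000001ULL, 0};  // {low, high}
    std::uint64_t out[2];
    const __m128i v = _mm_loadu_si128(reinterpret_cast<const __m128i*>(in));
    _mm_storeu_si128(reinterpret_cast<__m128i*>(out), shl128_small(v, 1));
    assert(out[0] == 2 && out[1] == 1);
}
```

A right shift would follow the same pattern with the shift directions swapped.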

6

Solved

How to optimize a cycle?

I have the following bottleneck function. typedef unsigned char byte; void CompareArrays(const byte * p1Start, const byte * p1End, const byte * p2, byte * p3) { const byte b1 = 128-30; const by...

c++ optimization assembly intrinsics sse2

Chaldean asked 21/10, 2010 at 11:40

2

Solved

How to vectorize a distance calculation using SSE2

A and B are vectors of length N, where N could be in the range 20 to 200, say. I want to calculate the square of the distance between these vectors, i.e. d^2 = ||A-B||^2. So far I have: float* a =...

c++ visual-c++ optimization vectorization sse2

Contraposition asked 8/6, 2013 at 14:25

2

Solved

SSE2 instruction to load integers in reverse order

Is there any SSE2 instruction to load a 128 bit int vector register from an int buffer, in reverse order ?

x86 sse simd sse2

Mingy asked 16/5, 2013 at 10:4
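
For context, SSE2 has no single instruction that loads four ints in reverse order, but a plain load followed by one shuffle gives the same result. The following is a minimal sketch under that assumption, not taken from the original thread. The names `load_reversed_epi32` and `check_load_reversed` are hypothetical, and the input is assumed to hold at least four 32-bit ints.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics
#include <cassert>
#include <cstdint>

// Hypothetical helper: load four 32-bit ints from p (no alignment required)
// and return them with their order reversed.
inline __m128i load_reversed_epi32(const std::int32_t* p) {
    const __m128i v = _mm_loadu_si128(reinterpret_cast<const __m128i*>(p));
    // _MM_SHUFFLE(0, 1, 2, 3) selects source element 3 for position 0,
    // element 2 for position 1, and so on.
    return _mm_shuffle_epi32(v, _MM_SHUFFLE(0, 1, 2, 3));
}

// Small self-check: {1, 2, 3, 4} should come back as {4, 3, 2, 1}.
inline void check_load_reversed() {
    const std::int32_t in[4] = {1, 2, 3, 4};
    std::int32_t out[4];
    _mm_storeu_si128(reinterpret_cast<__m128i*>(out), load_reversed_epi32(in));
    assert(out[0] == 4 && out[1] == 3 && out[2] == 2 && out[3] == 1);
}
```

If the buffer is known to be 16-byte aligned, `_mm_load_si128` could replace the unaligned load.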
