sse2 Questions
2
Solved
I am considering vectorizing some floor() calls using sse2 intrinsics, then measuring the performance gain. But ultimately the binary is going to be run on a virtual machine which I have no access ...
Melodic asked 18/1, 2017 at 22:48
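For reference, one common way to get a floor() out of plain SSE2 is to truncate and then correct the lanes that were rounded up. The sketch below is only an illustration: the name floor_ps_sse2 is made up here, and it assumes every input fits in the int32 range and is not NaN.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Floor of four floats using only SSE2 (hypothetical helper).
   Assumes each input fits in int32 and is not NaN. */
static __m128 floor_ps_sse2(__m128 x)
{
    /* Truncate toward zero, then convert back to float. */
    __m128 t = _mm_cvtepi32_ps(_mm_cvttps_epi32(x));
    /* Truncation rounds negative non-integers up; flag those lanes. */
    __m128 too_big = _mm_cmpgt_ps(t, x);
    /* Subtract 1.0f only in the flagged lanes. */
    return _mm_sub_ps(t, _mm_and_ps(too_big, _mm_set1_ps(1.0f)));
}
```

Whether this beats the scalar floor() on a given virtual machine still has to be measured there.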
1
Solved
What is the best way to load and store general-purpose registers to/from SIMD registers? So far I have been using the stack as a temporary. For example,
mov [rsp + 0x00], r8
mov [rsp + 0x08], r9
mov ...
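As a rough illustration in C rather than assembly, SSE2 on x86-64 can move 64-bit values directly between general-purpose and XMM registers (movq), so the stack is not strictly required. The helper names below are hypothetical, and this is a sketch rather than a measured recommendation.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Pack two 64-bit integers into one XMM register without using the stack.
   x86-64 only. Helper names are made up for this sketch. */
static __m128i pack_gpr_pair(int64_t lo, int64_t hi)
{
    __m128i vlo = _mm_cvtsi64_si128(lo);   /* movq xmm, r64 */
    __m128i vhi = _mm_cvtsi64_si128(hi);   /* movq xmm, r64 */
    return _mm_unpacklo_epi64(vlo, vhi);   /* punpcklqdq: low = lo, high = hi */
}

/* Read the low 64 bits back into a general-purpose register. */
static int64_t low_qword(__m128i v)
{
    return _mm_cvtsi128_si64(v);           /* movq r64, xmm */
}

/* Read the high 64 bits by first moving them into the low half. */
static int64_t high_qword(__m128i v)
{
    return _mm_cvtsi128_si64(_mm_unpackhi_epi64(v, v));
}
```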
4
Solved
I was reading today about researchers discovering that NVIDIA's PhysX libraries use x87 FP instead of SSE2. Obviously this will be suboptimal for parallel datasets where speed trumps precision. However, ...
Supranational asked 8/7, 2010 at 16:57
2
Solved
I need to load and rearrange 12 bytes into 16 (or 24 into 32) following the pattern below:
ABC DEF GHI JKL
becomes
ABBC DEEF GHHI JKKL
Can you suggest efficient ways to achieve this using the...
Umbilical asked 31/3, 2016 at 7:40
4
Solved
I wrote the function int compare_16bytes(__m128i lhs, __m128i rhs) in order to compare two 16 byte numbers using SSE instructions: this function returns how many bytes are equal after perform...
1
Solved
I am trying to upgrade Node.js from 0.10.x to 0.12.0. The first thing I noticed is that I am getting an error that SSE2 instructions are not supported by my CPU (indeed they are not).
Tried to compile No...
3
Solved
I am migrating vectorized code written using SSE2 intrinsics to AVX2 intrinsics.
Much to my disappointment, I discovered that the shift instructions _mm256_slli_si256 and _mm256_srli_si256 operate o...
Capua asked 11/8, 2014 at 17:14
4
Solved
I have many functions that use the same constant __m128i values.
For example:
const __m128i K8 = _mm_setr_epi8(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16);
const __m128i K16 = _mm_setr_...
2
I need to shift a __m128i variable (say v) by m bits, in such a way that bits move through the whole variable (so the resulting variable represents v*2^m).
What is the best way to do this?
No...
Exuberance asked 27/12, 2015 at 7:1
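One way this is often done with SSE2 alone is to shift both 64-bit halves and then OR in the bits that cross from the low half to the high half. The sketch below uses a made-up name, shl_128, and assumes 0 <= m < 64; larger counts would need an extra byte shift first.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Shift the whole 128-bit value left by m bits, assuming 0 <= m < 64.
   Bits leaving the low 64-bit half are carried into the high half. */
static __m128i shl_128(__m128i v, int m)
{
    __m128i cnt     = _mm_cvtsi32_si128(m);
    __m128i rcnt    = _mm_cvtsi32_si128(64 - m);
    /* Shift each 64-bit half independently. */
    __m128i shifted = _mm_sll_epi64(v, cnt);
    /* Move the low half into the high half (byte shift by 8). */
    __m128i carry   = _mm_slli_si128(v, 8);
    /* Keep only its top m bits; a count of 64 (m == 0) yields zero. */
    carry = _mm_srl_epi64(carry, rcnt);
    return _mm_or_si128(shifted, carry);
}
```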
2
Solved
I want to calculate y = ax + b, where x and y are pixel values (i.e., bytes in the range 0 to 255), while a and b are floats.
Since I need to apply this formula for each pixel in the image, in additi...
Rid asked 29/8, 2015 at 8:26
4
I’m currently coding highly optimised versions of some C99 standard library string functions, like strlen(), memset(), etc., using x86-64 assembly with SSE2 instructions.
So far I’ve managed to ge...
Delitescent asked 7/8, 2013 at 21:18
3
Solved
Is there any difference between logical SSE intrinsics for different types? For example, if we take the OR operation, there are three intrinsics: _mm_or_ps, _mm_or_pd and _mm_or_si128 all of which do th...
Proudhon asked 10/5, 2010 at 17:32
2
Solved
I'm just playing around with gcc (g++) and the compiler flags -msse and -msse2. I have a little test program which looks like this:
#include <iostream>
int main(int argc, char **argv) {
flo...
2
Solved
Intel provides several SIMD commands that all seem to perform bitwise XOR on 128-bit data:
_mm_xor_pd(__m128d, __m128d)
_mm_xor_ps(__m128, __m128)
_mm_xor_si128(__m128i, __m128i)
Isn't ...
Onrush asked 18/3, 2015 at 13:4
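A small experiment can confirm that the variants produce the same bits. The sketch below flips the sign bit of four floats once with _mm_xor_ps and once with _mm_xor_si128 (through casts) and compares the results; the function name is hypothetical. It says nothing about timing, which can differ between the float and integer forms on some CPUs.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <string.h>

/* Returns 1 if the float and integer XOR give identical bit patterns. */
static int xor_variants_agree(const float in[4])
{
    __m128 x    = _mm_loadu_ps(in);
    __m128 mask = _mm_set1_ps(-0.0f);  /* only the sign bit is set */

    /* Same operation, expressed on two different vector types. */
    __m128  a = _mm_xor_ps(x, mask);
    __m128i b = _mm_xor_si128(_mm_castps_si128(x), _mm_castps_si128(mask));

    float ra[4], rb[4];
    _mm_storeu_ps(ra, a);
    _mm_storeu_ps(rb, _mm_castsi128_ps(b));
    return memcmp(ra, rb, sizeof ra) == 0;
}
```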
2
Solved
Signed integer overflow is undefined in C and C++. But what about signed integer overflow within the individual fields of an __m128i? In other words, is this behavior defined in the Intel standards...
Degraded asked 22/10, 2014 at 21:2
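As a concrete illustration, the packed add instructions behind these intrinsics are specified at the instruction level: paddw (_mm_add_epi16) wraps modulo 2^16, while paddsw (_mm_adds_epi16) saturates. The helper names below are made up for this sketch.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Add two 16-bit values in a vector lane with wrap-around (paddw). */
static int16_t add_wrap_lane0(int16_t a, int16_t b)
{
    __m128i r = _mm_add_epi16(_mm_set1_epi16(a), _mm_set1_epi16(b));
    /* pextrw zero-extends; the cast narrows back to a signed 16-bit value. */
    return (int16_t)_mm_extract_epi16(r, 0);
}

/* Add two 16-bit values in a vector lane with signed saturation (paddsw). */
static int16_t add_sat_lane0(int16_t a, int16_t b)
{
    __m128i r = _mm_adds_epi16(_mm_set1_epi16(a), _mm_set1_epi16(b));
    return (int16_t)_mm_extract_epi16(r, 0);
}
```

With inputs 32767 and 1, the wrapping version gives -32768 and the saturating version stays at 32767.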
3
I am having trouble with this error: "SSE instruction set not enabled". How can I fix this?
I have an Acer i7 running Ubuntu 11.10. Can anyone please help me?
Any help will be appreciated!
Also...
Constitute asked 4/2, 2012 at 21:6
2
Solved
I'm trying to implement some inline assembler (in Visual Studio 2012 C++ code) to take advantage of SSE.
I want to add 7 numbers 1e9 times, so I placed them from RAM into the xmm0 to xmm6 registers of...
Waxy asked 11/3, 2013 at 21:46
1
Solved
I'm looking to understand SSE2's capabilities a little more, and would like to know if one could make a 128-bit wide integer that supports addition, subtraction, XOR and multiplication?
6
Solved
I have the following bottleneck function.
typedef unsigned char byte;
void CompareArrays(const byte * p1Start, const byte * p1End, const byte * p2, byte * p3)
{
const byte b1 = 128-30;
const by...
Chaldean asked 21/10, 2010 at 11:40
2
Solved
A and B are vectors of length N, where N could be in the range 20 to 200, say.
I want to calculate the square of the distance between these vectors,
i.e. d^2 = ||A-B||^2.
So far I have:
float* a =...
Contraposition asked 8/6, 2013 at 14:25
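For comparison, a straightforward SSE version of d^2 = ||A-B||^2 processes four floats per iteration and finishes the leftover elements in scalar code. This is a minimal sketch with a hypothetical name, dist_sq, using unaligned loads so it makes no assumption about how the arrays are allocated.

```c
#include <emmintrin.h>  /* SSE2 intrinsics (also pulls in the SSE float ops) */
#include <stddef.h>

/* Squared Euclidean distance between a[0..n) and b[0..n). */
static float dist_sq(const float *a, const float *b, size_t n)
{
    __m128 acc = _mm_setzero_ps();
    size_t i = 0;

    /* Four differences per iteration, squared and accumulated per lane. */
    for (; i + 4 <= n; i += 4) {
        __m128 d = _mm_sub_ps(_mm_loadu_ps(a + i), _mm_loadu_ps(b + i));
        acc = _mm_add_ps(acc, _mm_mul_ps(d, d));
    }

    /* Horizontal sum of the four partial sums. */
    float lanes[4];
    _mm_storeu_ps(lanes, acc);
    float sum = (lanes[0] + lanes[1]) + (lanes[2] + lanes[3]);

    /* Scalar tail for the remaining n % 4 elements. */
    for (; i < n; ++i) {
        float d = a[i] - b[i];
        sum += d * d;
    }
    return sum;
}
```

Because the lanes are summed in a different order than a scalar loop, results can differ slightly in the last bits for general inputs.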
2
Solved
Is there any SSE2 instruction to load a 128-bit int vector register from an int buffer, in reverse order?
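SSE2 has no single reversing load as far as I know, but a normal load followed by one shuffle gives the same effect for 32-bit elements. The sketch below uses a hypothetical helper name and an unaligned load so the buffer needs no special alignment.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Load four 32-bit ints from p and return them in reverse element order. */
static __m128i load_reversed_epi32(const int *p)
{
    __m128i v = _mm_loadu_si128((const __m128i *)p);
    /* pshufd: result element 0 takes source element 3, and so on. */
    return _mm_shuffle_epi32(v, _MM_SHUFFLE(0, 1, 2, 3));
}
```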
© 2022 - 2024 — McMap. All rights reserved.