sse4 Questions
4
I am using the BRIEF descriptor in OpenCV in Visual C++ 2010 to match points in two images.
The paper on the BRIEF descriptor says that it is possible to speed things up:
"The BRIEF...
Giuditta asked 17/4, 2012 at 9:5
2
Solved
I am trying to enable support for different SIMD instruction sets in MSVC.
There is a page that covers enabling some of them, such as SSE2, AVX, and AVX2:
https://learn.microsoft.com/en-us/cpp/build/reference/arch-x86?red...
Landslide asked 24/9, 2020 at 19:59
1
Solved
So, one of the purposes of Docker is to easily deploy an environment to test software, right? Can anybody tell me how to compile a TensorFlow binary that uses SSE4.1 and SSE4.2 in a Dockerfile? Can any...
Oruntha asked 29/1, 2018 at 15:44
3
Solved
I tried to run the following program on my computer (Fedora 17, 32-bit). How can I enable my system to support the popcnt instruction for fast population count?
#include <stdio.h>
#include <...
Finny asked 11/11, 2012 at 15:5
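A minimal sketch related to the question above, assuming GCC or Clang (the excerpt is truncated, so the asker's actual program is not shown). `__builtin_popcount` is a compiler builtin; when the file is compiled with `-mpopcnt` on a CPU that has the instruction, the compiler is free to emit `popcnt` for it, and otherwise it falls back to a software routine. The names `popcount32` and `popcount32_portable` are hypothetical, chosen only for illustration.

```c
#include <stdint.h>

/* Assumes GCC or Clang: __builtin_popcount is a compiler builtin.
 * Built with -mpopcnt it can compile to a single popcnt instruction;
 * without that flag the compiler uses a software fallback. */
static unsigned popcount32(uint32_t x)
{
    return (unsigned)__builtin_popcount(x);
}

/* Portable bit-twiddling version for compilers without the builtin. */
static unsigned popcount32_portable(uint32_t x)
{
    x = x - ((x >> 1) & 0x55555555u);                 /* 2-bit partial sums */
    x = (x & 0x33333333u) + ((x >> 2) & 0x33333333u); /* 4-bit partial sums */
    x = (x + (x >> 4)) & 0x0F0F0F0Fu;                 /* 8-bit partial sums */
    return (x * 0x01010101u) >> 24;                   /* add the four bytes */
}
```

Both functions return the same count for any input; whether the hardware instruction is actually used depends on the compiler flags and the CPU, not on the source.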
1
Here is the assembly for my code.
Can you embed it in C++ and compare its speed against SSE4?
I would very much like to see how stepped into the development of SSE4. Or is not worried about him at all? L...
Bearer asked 16/10, 2017 at 4:7
1
Solved
I'm experimenting with SSE4.2 and the STTNI instructions and got a strange result: PcmpEstrM (which works with explicit-length strings) runs twice as slow as PcmpIstrM (implicit-length strings).
On my...
Lyndel asked 5/1, 2014 at 16:7
2
Solved
I have written a library in which I use CMake to verify the presence of headers for MMX, SSE, SSE2, SSE4, AVX, AVX2, and AVX-512. In addition to this, I check for the presence of the instructions...
1
Solved
What can you do with SSE4.1 ptest other than testing if a single register is all-zero?
Can you use a combination of SF and CF to test anything useful about two unknown input registers?
What is PT...
Eck asked 30/4, 2017 at 23:3
1
The Intel Xeon Phi "Knights Landing" processor will be the first to support AVX-512, but it will only support "F" (like SSE without SSE2, or AVX without AVX2), so floating-point stuff mainly.
I'm...
3
2
I have a simple test program that loads an xmm register with the
movdqu instruction accessing data across a page boundary (OS = Linux).
If the following page is mapped, this works just fine. If it...
3
Solved
Using SSE4, I want to multiply a __m128i holding 16 unsigned 8-bit integers, but I could only find an intrinsic for multiplying 16-bit integers. Is there nothing such as _mm_mult_epi8?
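As far as I know there is no packed 8-bit multiply in SSE, so one common workaround is to build it from the 16-bit multiply. The following is a sketch under that assumption, not a definitive answer: it keeps only the low 8 bits of each product and needs nothing beyond SSE2. The names `mullo_epu8` and `mul_u8x16` are hypothetical helpers, not real intrinsics.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical helper: multiply 16 unsigned 8-bit lanes,
 * keeping the low 8 bits of each product. */
static __m128i mullo_epu8(__m128i a, __m128i b)
{
    /* Even bytes: the low byte of each 16-bit product depends only on
     * the low bytes of the two inputs. */
    __m128i even = _mm_mullo_epi16(a, b);
    /* Odd bytes: move them into the low byte of each 16-bit lane first. */
    __m128i odd = _mm_mullo_epi16(_mm_srli_epi16(a, 8), _mm_srli_epi16(b, 8));
    /* Put the odd results back in the high bytes and merge. */
    return _mm_or_si128(_mm_slli_epi16(odd, 8),
                        _mm_and_si128(even, _mm_set1_epi16(0x00FF)));
}

/* Convenience wrapper over plain arrays of 16 bytes (unaligned is fine). */
static void mul_u8x16(const uint8_t *a, const uint8_t *b, uint8_t *out)
{
    assert(a && b && out);
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    _mm_storeu_si128((__m128i *)out, mullo_epu8(va, vb));
}
```

If the full 16-bit products are needed instead of the truncated ones, the inputs would have to be widened to 16-bit lanes first, which this sketch does not do.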
1
Solved
Assuming I have SSE through SSE4.1, but not AVX(2), what is the fastest way to load a packed memory layout like this (all 32-bit integers):
a0 b0 c0 d0 a1 b1 c1 d1 a2 b2 c2 d2 a3 b3 c3 d3
Into four v...
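The excerpt is cut off, so the following assumes the goal is four vectors holding a0..a3, b0..b3, c0..c3, and d0..d3. Under that assumption, one straightforward approach is a 4x4 transpose built from unpack instructions, which needs only SSE2. This is a sketch and makes no claim to being the fastest option; `load_aos4` is a hypothetical name.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical helper: load 16 interleaved 32-bit integers
 * (a0 b0 c0 d0 a1 b1 c1 d1 ...) and split them into four vectors. */
static void load_aos4(const int32_t *src,
                      __m128i *a, __m128i *b, __m128i *c, __m128i *d)
{
    assert(src && a && b && c && d);
    __m128i r0 = _mm_loadu_si128((const __m128i *)(src + 0));   /* a0 b0 c0 d0 */
    __m128i r1 = _mm_loadu_si128((const __m128i *)(src + 4));   /* a1 b1 c1 d1 */
    __m128i r2 = _mm_loadu_si128((const __m128i *)(src + 8));   /* a2 b2 c2 d2 */
    __m128i r3 = _mm_loadu_si128((const __m128i *)(src + 12));  /* a3 b3 c3 d3 */

    __m128i t0 = _mm_unpacklo_epi32(r0, r1);  /* a0 a1 b0 b1 */
    __m128i t1 = _mm_unpackhi_epi32(r0, r1);  /* c0 c1 d0 d1 */
    __m128i t2 = _mm_unpacklo_epi32(r2, r3);  /* a2 a3 b2 b3 */
    __m128i t3 = _mm_unpackhi_epi32(r2, r3);  /* c2 c3 d2 d3 */

    *a = _mm_unpacklo_epi64(t0, t2);          /* a0 a1 a2 a3 */
    *b = _mm_unpackhi_epi64(t0, t2);          /* b0 b1 b2 b3 */
    *c = _mm_unpacklo_epi64(t1, t3);          /* c0 c1 c2 c3 */
    *d = _mm_unpackhi_epi64(t1, t3);          /* d0 d1 d2 d3 */
}
```

The cost is four loads and eight unpacks per 16 integers; for float data the same pattern is available as the `_MM_TRANSPOSE4_PS` macro in `<xmmintrin.h>`.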
3
Solved
I need to quickly compare two strings on a machine with SSE4 support. How can I do it without writing inline assembly?
Some wrappers like long long bitmask = strcmp(char* a, char* b) would be p...
1
Solved
Is it possible to compare more than a pair of numbers in one instruction using SSE4?
Intel Reference says the following about PCMPGTQ
PCMPGTQ — Compare Packed Data for Greater Than
Performs ...
Lowman asked 24/9, 2012 at 4:7
1
Solved
MS Visual C++ supports 2 flavors of the popcnt instruction on CPUs with SSE4.2:
__popcnt()
_mm_popcnt_u32()
The only difference I found was that the docs for __popcnt() are marked as "Microsoft...
Jasperjaspers asked 20/6, 2012 at 6:32
1
Solved
I'm implementing a fast x888 -> 565 pixel conversion function in pixman according to the algorithm described by Intel [pdf]. Their code converts x888 -> 555 while I want to convert to 565. Un...
Traipse asked 13/6, 2012 at 23:14
2
Solved
This is my very first time working with SSE intrinsics. I am trying to convert a simple piece of code into a faster version using Intel SSE intrinsics (up to SSE4.2). I seem to encounter a number of...