sse4 Questions

4

I am using the BRIEF descriptor in OpenCV with Visual C++ 2010 to match points in two images. The paper on the BRIEF descriptor says that it is possible to speed things up: "The BRIEF...
Giuditta asked 17/4, 2012 at 9:5

2

Solved

I am trying to enable support for different SIMD instruction sets using MSVC. There is a page that covers enabling some of them, such as SSE2, AVX, and AVX2: https://learn.microsoft.com/en-us/cpp/build/reference/arch-x86?red...
Landslide asked 24/9, 2020 at 19:59

3

Solved

PCMPGTQ was introduced in SSE4.2, and it provides a signed greater-than comparison for 64-bit numbers that yields a mask. How does one support this functionality on instruction sets predating SSE4...
Smalltime asked 6/12, 2020 at 8:36
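
For illustration only (not taken from the thread): one commonly used way to get a PCMPGTQ-style mask with nothing newer than SSE2 is to compare the high 32-bit halves and, where they are equal, use the sign of the 64-bit difference. The function names `cmpgt_epi64_sse2` and `cmpgt_epi64_lanes` are hypothetical, and this is a sketch under those assumptions rather than a tuned implementation.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical helper: signed 64-bit a > b per lane, SSE2 only. */
static __m128i cmpgt_epi64_sse2(__m128i a, __m128i b)
{
    /* Where the 32-bit halves are equal, keep the bits of (b - a).
       If the high halves match, the high dword of (b - a) is all-ones
       exactly when the low half of a is (unsigned) greater than b's. */
    __m128i r = _mm_and_si128(_mm_cmpeq_epi32(a, b), _mm_sub_epi64(b, a));
    /* A signed greater-than on the high halves decides the rest. */
    r = _mm_or_si128(r, _mm_cmpgt_epi32(a, b));
    /* Broadcast each lane's high dword over the whole 64-bit lane. */
    return _mm_shuffle_epi32(r, _MM_SHUFFLE(3, 3, 1, 1));
}

/* Hypothetical scalar-friendly wrapper: out[i] is -1 if a_i > b_i, else 0. */
static void cmpgt_epi64_lanes(int64_t a0, int64_t a1,
                              int64_t b0, int64_t b1, int64_t out[2])
{
    __m128i a = _mm_set_epi64x(a1, a0);
    __m128i b = _mm_set_epi64x(b1, b0);
    _mm_storeu_si128((__m128i *)out, cmpgt_epi64_sse2(a, b));
}
```

Calling `cmpgt_epi64_lanes(5, -1, 3, 0, out)` fills `out` with one mask per lane, all-ones where the first operand is greater.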

1

Solved

So, one of the purposes of Docker is to easily deploy an environment to test software, right? Can anybody tell me how to compile a TensorFlow binary to use SSE4.1 and SSE4.2 in a Dockerfile? Can any...
Oruntha asked 29/1, 2018 at 15:44

3

Solved

I think I heard about that, but I don't know where. Update: I was talking about JIT.
Lully asked 27/5, 2012 at 14:55

3

Solved

I tried to run the following program on my computer (Fedora 17 32-bit). How can I enable my system to support the popcnt instruction for fast population count? #include <stdio.h> #include <...
Finny asked 11/11, 2012 at 15:5
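
As a hedged sketch (not from the thread): with GCC or Clang, `__builtin_popcount` is typically compiled to the `popcnt` instruction when the build enables it (for example with `-mpopcnt` or `-msse4.2`), and falls back to a software routine otherwise. The function names `popcount32` and `popcount32_swar` are hypothetical; the second is a portable bit-twiddling fallback.

```c
#include <stdint.h>

/* Portable fallback: classic SWAR bit count, no special instructions. */
static unsigned popcount32_swar(uint32_t x)
{
    x = x - ((x >> 1) & 0x55555555u);                 /* 2-bit sums  */
    x = (x & 0x33333333u) + ((x >> 2) & 0x33333333u); /* 4-bit sums  */
    x = (x + (x >> 4)) & 0x0F0F0F0Fu;                 /* 8-bit sums  */
    return (x * 0x01010101u) >> 24;                   /* add 4 bytes */
}

/* Uses the compiler builtin where available, else the fallback above. */
static unsigned popcount32(uint32_t x)
{
#if defined(__GNUC__) || defined(__clang__)
    return (unsigned)__builtin_popcount(x);
#else
    return popcount32_swar(x);
#endif
}
```

Whether the hardware instruction is actually emitted depends on the compiler flags and on the CPU the binary runs on, so checking the generated assembly is the reliable test.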

1

Here is the assembly for my code. Can you embed it in C++ and compare it against SSE4? I would very much like to see how much the development of SSE4 has improved speed. Or is it not worth worrying about at all? L...
Bearer asked 16/10, 2017 at 4:7

1

Solved

I'm experimenting with SSE4.2 and the STTNI instructions and got a strange result: PcmpEstrM (which works with explicit-length strings) runs twice as slow as PcmpIstrM (implicit-length strings). On my...
Lyndel asked 5/1, 2014 at 16:7

2

Solved

I have written a library, where I use CMake for verifying the presence of headers for MMX, SSE, SSE2, SSE4, AVX, AVX2, and AVX-512. In addition to this, I check for the presence of the instructions...
Multicolor asked 10/6, 2017 at 23:35

1

Solved

What can you do with SSE4.1 ptest other than testing if a single register is all-zero? Can you use a combination of ZF and CF to test anything useful about two unknown input registers? What is PT...
Eck asked 30/4, 2017 at 23:3

1

The Intel Xeon Phi "Knights Landing" processor will be the first to support AVX-512, but it will only support "F" (like SSE without SSE2, or AVX without AVX2), so floating-point stuff mainly. I'm...
Merrillmerrily asked 8/6, 2016 at 21:56

1

Solved

As you know, the first two are AVX-specific intrinsics and the second is an SSE4.1 intrinsic. Both sets of intrinsics can be used to check for equality of 2 floating-point vectors. My specific use c...
Persia asked 4/3, 2016 at 7:50

3

I'm trying to find the most efficient way of performing 8-bit unsigned compares using SSE (up to SSE 4.2). The most common case I'm working on is comparing for > 0U, e.g. _mm_cmpgt_epu8(v, _mm_setzero_si1...
Ponton asked 20/11, 2015 at 10:26
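
For context, a sketch that is not taken from the thread: SSE has no unsigned 8-bit greater-than, so `_mm_cmpgt_epu8` as written in the excerpt is not a real intrinsic. A common workaround flips the top bit of both operands and then uses the signed compare. The names `cmpgt_epu8_sketch` and `cmpgt_u8_16` are hypothetical.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical helper: unsigned 8-bit a > b, built from the signed compare.
   XOR with 0x80 maps 0..255 onto -128..127 while preserving order. */
static __m128i cmpgt_epu8_sketch(__m128i a, __m128i b)
{
    const __m128i bias = _mm_set1_epi8((char)0x80);
    return _mm_cmpgt_epi8(_mm_xor_si128(a, bias), _mm_xor_si128(b, bias));
}

/* Hypothetical array wrapper: out[i] is 0xFF where a[i] > b[i], else 0. */
static void cmpgt_u8_16(const uint8_t a[16], const uint8_t b[16],
                        uint8_t out[16])
{
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    _mm_storeu_si128((__m128i *)out, cmpgt_epu8_sketch(va, vb));
}
```

For the special case of comparing against zero, `_mm_cmpeq_epi8(v, _mm_setzero_si128())` yields the inverted mask in a single instruction, which may be enough depending on how the mask is consumed.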

2

I have a simple test program that loads an xmm register with the movdqu instruction accessing data across a page boundary (OS = Linux). If the following page is mapped, this works just fine. If it...
Carmichael asked 11/2, 2014 at 22:49

3

Solved

I want to use SSE4 to multiply a __m128i object holding 16 unsigned 8-bit integers, but I could only find an intrinsic for multiplying 16-bit integers. Is there nothing such as _mm_mult_epi8?
Searching asked 19/11, 2011 at 11:3
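
A minimal sketch, assuming only the low 8 bits of each product are wanted (results wrap modulo 256); it is not taken from the thread. Since there is no 8-bit multiply intrinsic, the even and odd bytes can be multiplied separately with the 16-bit multiply and merged. The names `mullo_epi8_sketch` and `mullo_u8_16` are hypothetical.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical helper: per-byte product, keeping the low 8 bits. */
static __m128i mullo_epi8_sketch(__m128i a, __m128i b)
{
    /* Even bytes: the low byte of a 16-bit product depends only on the
       low bytes of the operands, so multiply the words directly. */
    __m128i even = _mm_mullo_epi16(a, b);
    /* Odd bytes: shift them down into the low byte of each word first. */
    __m128i odd = _mm_mullo_epi16(_mm_srli_epi16(a, 8),
                                  _mm_srli_epi16(b, 8));
    /* Merge: odd results go back to the high byte, even stay low. */
    return _mm_or_si128(_mm_slli_epi16(odd, 8),
                        _mm_and_si128(even, _mm_set1_epi16(0x00FF)));
}

/* Hypothetical array wrapper: out[i] = (a[i] * b[i]) mod 256. */
static void mullo_u8_16(const uint8_t a[16], const uint8_t b[16],
                        uint8_t out[16])
{
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    _mm_storeu_si128((__m128i *)out, mullo_epi8_sketch(va, vb));
}
```

If full 16-bit products are needed instead, the inputs would have to be widened first, which this sketch does not cover.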

1

Solved

Assuming I have SSE to SSE4.1, but not AVX(2), what is the fastest way to load a packed memory layout like this (all 32-bit integers): a0 b0 c0 d0 a1 b1 c1 d1 a2 b2 c2 d2 a3 b3 c3 d3 Into four v...
Forcier asked 24/10, 2013 at 5:34
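
A hedged sketch, assuming the goal is a plain 4x4 transpose of 32-bit integers (the excerpt is cut off, so this is a guess at the intent, and it is not taken from the thread). Two rounds of SSE2 unpack instructions are one well-known way to do it. The name `transpose4x4_epi32` is hypothetical.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical helper: in holds rows (a0 b0 c0 d0), (a1 b1 c1 d1), ...;
   out receives (a0 a1 a2 a3), (b0 b1 b2 b3), (c0..c3), (d0..d3). */
static void transpose4x4_epi32(const int32_t in[16], int32_t out[16])
{
    __m128i r0 = _mm_loadu_si128((const __m128i *)(in + 0));
    __m128i r1 = _mm_loadu_si128((const __m128i *)(in + 4));
    __m128i r2 = _mm_loadu_si128((const __m128i *)(in + 8));
    __m128i r3 = _mm_loadu_si128((const __m128i *)(in + 12));

    /* Round 1: interleave 32-bit elements of row pairs. */
    __m128i t0 = _mm_unpacklo_epi32(r0, r1);  /* a0 a1 b0 b1 */
    __m128i t1 = _mm_unpackhi_epi32(r0, r1);  /* c0 c1 d0 d1 */
    __m128i t2 = _mm_unpacklo_epi32(r2, r3);  /* a2 a3 b2 b3 */
    __m128i t3 = _mm_unpackhi_epi32(r2, r3);  /* c2 c3 d2 d3 */

    /* Round 2: interleave 64-bit halves to finish each column. */
    _mm_storeu_si128((__m128i *)(out + 0),  _mm_unpacklo_epi64(t0, t2));
    _mm_storeu_si128((__m128i *)(out + 4),  _mm_unpackhi_epi64(t0, t2));
    _mm_storeu_si128((__m128i *)(out + 8),  _mm_unpacklo_epi64(t1, t3));
    _mm_storeu_si128((__m128i *)(out + 12), _mm_unpackhi_epi64(t1, t3));
}
```

Whether this is the fastest option on a given CPU is not something the sketch establishes; it only shows the data movement.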

3

Solved

I need to quickly compare two strings on a machine with SSE4 support. How can I do it without writing inline assembly? Some wrappers like long long bitmask = strcmp(char* a, char* b) would be p...
Transpacific asked 13/5, 2012 at 20:19

2

Solved

Why in the world was _mm_crc32_u64(...) defined like this? unsigned __int64 _mm_crc32_u64( unsigned __int64 crc, unsigned __int64 v ); The "crc32" instruction always accumulates a 32-bit CRC, neve...
Daynadays asked 1/4, 2013 at 22:7
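
To make the point about the 32-bit accumulator concrete, here is a portable software model of the CRC-32C (Castagnoli) update that the SSE4.2 `crc32` instruction is understood to implement. It is a sketch for illustration, not taken from the thread, and the names `crc32c_update_u8` and `crc32c` are hypothetical. The initial value and final inversion in `crc32c` follow the usual CRC-32C convention and are applied by the caller, not by the instruction.

```c
#include <stddef.h>
#include <stdint.h>

/* One byte of CRC-32C, bit by bit, using the reflected polynomial
   0x82F63B78. The running CRC never needs more than 32 bits. */
static uint32_t crc32c_update_u8(uint32_t crc, uint8_t byte)
{
    crc ^= byte;
    for (int i = 0; i < 8; i++) {
        /* If the low bit is set, shift and fold in the polynomial. */
        uint32_t mask = (uint32_t)0 - (crc & 1u);
        crc = (crc >> 1) ^ (0x82F63B78u & mask);
    }
    return crc;
}

/* Whole-buffer CRC-32C with the conventional init and final XOR. */
static uint32_t crc32c(const void *data, size_t len)
{
    const uint8_t *p = (const uint8_t *)data;
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++)
        crc = crc32c_update_u8(crc, p[i]);
    return ~crc;
}
```

With this convention, `crc32c("123456789", 9)` gives the standard CRC-32C check value `0xE3069283`.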

1

Solved

Is it possible to compare more than a pair of numbers in one instruction using SSE4? Intel Reference says the following about PCMPGTQ PCMPGTQ — Compare Packed Data for Greater Than Performs ...
Lowman asked 24/9, 2012 at 4:7

1

Solved

MS Visual C++ supports 2 flavors of the popcnt instruction on CPUs with SSE4.2: __popcnt() _mm_popcnt_u32() The only difference I found was that the docs for __popcnt() are marked as "Microsoft...
Jasperjaspers asked 20/6, 2012 at 6:32

1

Solved

I'm implementing a fast x888 -> 565 pixel conversion function in pixman according to the algorithm described by Intel [pdf]. Their code converts x888 -> 555 while I want to convert to 565. Un...
Traipse asked 13/6, 2012 at 23:14

2

Solved

This is my very first time working with SSE intrinsics. I am trying to convert a simple piece of code into a faster version using Intel SSE intrinsics (up to SSE4.2). I seem to encounter a number of...
Eleanoraeleanore asked 8/6, 2012 at 16:50