intrinsics Questions

2

I have some integer value representing a bitmask, for example 154 = 0b10011010, and I want to construct a corresponding signal Vector<T> instance <0, -1, 0, -1, -1, 0, 0, -1> (note the ...
Encyclical asked 21/12, 2021 at 13:16
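
For reference, the mapping described in the excerpt above can be written as a scalar loop. This is only a sketch in plain C, not the `Vector<T>` solution the question asks for; the helper name `expand_bitmask` and the assumption that bit 0 maps to lane 0 (which matches the 154 example) are illustrative.

```c
#include <stdint.h>

/* Hypothetical helper: writes 0 or -1 to out[i] depending on bit i of mask.
 * Assumes lanes <= 32 and that the least-significant bit maps to lane 0. */
static void expand_bitmask(uint32_t mask, int32_t *out, int lanes) {
    for (int i = 0; i < lanes; ++i) {
        /* (mask >> i) & 1 is 0 or 1; negating it yields 0 or -1 (all bits set). */
        out[i] = -(int32_t)((mask >> i) & 1u);
    }
}
```

A vectorized version would typically broadcast the mask, AND it with per-lane bit constants, and compare for equality, but that depends on the target API.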

2

Solved

How do I vectorize this C function with AVX2? static void propogate_neuron(const short a, const int8_t *b, int *c) { for (int i = 0; i < 32; ++i){ c[i] += a * b[i]; } }
Arhat asked 4/11, 2021 at 23:5

5

Solved

I'm wondering if there really is no 128-bit division intrinsic function in Visual C++? There is a 64x64=128 bit multiplication intrinsic function called _umul128(), which nicely matches the MUL x...
Camenae asked 9/12, 2011 at 23:50

2

Solved

From Agner Fog's "Optimizing software in C++": There is a problem when mixing code compiled with and without AVX support on some Intel processors. There is a performance penalty when goi...
Employment asked 11/8, 2021 at 5:40

3

Can someone recommend a fast way to do a saturating add of 32-bit signed integers using Intel intrinsics (AVX, SSE4, ...)? I looked at the intrinsics guide and found _mm256_adds_epi16 but this seems to onl...
Ostiole asked 7/4, 2015 at 18:41
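
As a baseline for the excerpt above, here is a scalar sketch of what a saturating 32-bit signed add is expected to compute. The name `adds_i32` is hypothetical; this is a reference for checking a SIMD version against, not the intrinsic-based answer.

```c
#include <stdint.h>

/* Hypothetical scalar reference: add in 64 bits, then clamp to the int32_t range. */
static inline int32_t adds_i32(int32_t a, int32_t b) {
    int64_t s = (int64_t)a + (int64_t)b;   /* cannot overflow in 64 bits */
    if (s > INT32_MAX) return INT32_MAX;   /* saturate on positive overflow */
    if (s < INT32_MIN) return INT32_MIN;   /* saturate on negative overflow */
    return (int32_t)s;
}
```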

2

Most C++ compilers support SIMD (SSE/AVX) instructions with intrinsics like _mm_cmpeq_epi32. My problem with this is that this function is not marked as constexpr, although "semantically" there is...
Favourable asked 16/8, 2018 at 14:59

1

Solved

I am working on a C++ intrinsic wrapper for x64 and neon. I want my functions to be constexpr. My motivation is similar to Constexpr and SSE intrinsics, but #pragma omp simd and intrinsics may not ...
Incarnate asked 27/5, 2021 at 17:3

6

The following code calls the builtin functions for clz/ctz in GCC and, on other systems, has C versions. Obviously, the C versions are a bit suboptimal if the system has a builtin clz/ctz instructi...
Brade asked 10/12, 2008 at 13:0
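
The pattern the excerpt above describes (builtins on GCC, plain C elsewhere) might look roughly like the following. This is a sketch, not the asker's code: the names `ctz32` and `clz32` are hypothetical, `unsigned int` is assumed to be 32 bits wide, and returning 32 for a zero input is a choice made here because the GCC builtins are undefined for 0.

```c
#include <stdint.h>

/* Count trailing zero bits; returns 32 when x == 0. */
static inline unsigned ctz32(uint32_t x) {
    if (x == 0) return 32u;
#if defined(__GNUC__) || defined(__clang__)
    return (unsigned)__builtin_ctz(x);      /* compiler builtin path */
#else
    unsigned n = 0;                         /* portable fallback */
    while ((x & 1u) == 0) { x >>= 1; ++n; }
    return n;
#endif
}

/* Count leading zero bits; returns 32 when x == 0. */
static inline unsigned clz32(uint32_t x) {
    if (x == 0) return 32u;
#if defined(__GNUC__) || defined(__clang__)
    return (unsigned)__builtin_clz(x);      /* compiler builtin path */
#else
    unsigned n = 0;                         /* portable fallback */
    while ((x & 0x80000000u) == 0) { x <<= 1; ++n; }
    return n;
#endif
}
```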

7

Solved

I'm involved in one of those challenges where you try to produce the smallest possible binary, so I'm building my program without the C or C++ run-time libraries (RTL). I don't link to the DLL vers...
Chimkent asked 30/5, 2010 at 14:16

3

Solved

Is there any way to convert the following code: int mask16 = 0b1010101010101010; // int or short, signed or unsigned, it does not matter to __uint128_t mask128 = ((__uint128_t)0x0100010001000100 &...
Wittol asked 21/4, 2021 at 18:19

2

I have this function to compute a piece of an array of doubles: void avx2_mul_64_block(double& sum, double* lhs_arr, double* rhs_arr) noexcept { __m256i accumulator = _mm256_setzero_pd(); for ...
Irresoluble asked 20/1, 2021 at 21:53

1

Solved

There is a relatively well-known trick for unsetting a single right-most bit: y = x & (x - 1) // 0b001011100 & 0b001011011 = 0b001011000 :) I'm finding myself with a tight loop to clear n ...
Bassorilievo asked 20/1, 2021 at 20:50
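
The trick quoted in the excerpt above, and the straightforward loop that repeats it, can be sketched as follows. The excerpt is truncated, so the reading "clear the n lowest set bits" is an assumption, and both helper names are hypothetical.

```c
#include <stdint.h>

/* Clears the right-most set bit of x; returns 0 when x is already 0. */
static inline uint64_t clear_lowest_set_bit(uint64_t x) {
    return x & (x - 1);
}

/* Hypothetical helper: repeats the trick up to n times, stopping early
 * once no set bits remain. */
static inline uint64_t clear_n_lowest_set_bits(uint64_t x, unsigned n) {
    while (n-- > 0 && x != 0) {
        x &= x - 1;
    }
    return x;
}
```

With the value from the excerpt, `clear_lowest_set_bit(0x5C)` (0b001011100) gives `0x58` (0b001011000).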

5

Solved

I've got some code, originally given to me by someone working with MSVC, and I'm trying to get it to work on Clang. Here's the function that I'm having trouble with: float vectorGetByIndex( __m128...
Dorsal asked 27/9, 2012 at 15:6

3

Solved

There are many methods in the Java API that are intrinsics, but still have code associated with them when looking at the source code. For an example, Integer.bitCount() is an intrinsic, but if you...
Drewdrewett asked 13/4, 2014 at 9:28

1

Solved

I am trying to get started with AVX512 intrinsics by reading the Intel Intrinsics Guide but so far I have found that it does not define the named datatypes or the pseudocode syntax used for explana...
Fusiform asked 12/6, 2020 at 17:22

0

I have been using the following "trick" in C code with SSE2 for single precision floats for a while now: static inline __m128 SSEI_m128shift(__m128 data) { return (__m128)_mm_srli_si128(_mm_castp...
Unclassical asked 23/5, 2020 at 11:52

1

I'm currently working on updating a large codebase from VS2013 to VS2019. One of the compiler errors I've run into is as follows: intrinsics.h(348): error C3861: '_mm_cvtpd_pi32': identifier not...
Epochmaking asked 30/3, 2020 at 15:4

1

Solved

Is there any difference between these functions? If not, why? __m128 _mm_set1_ps(float a) __m128 _mm_set_ps1(float a) Both descriptions are the same on the Intel Intrinsics Guide website. Than...
Gesticulation asked 29/3, 2020 at 23:58

3

Solved

How can I create a library that will dynamically switch between SSE, AVX, and AVX2 code paths depending on the host processor/OS? I am using Agner Fog's VCL (Vector Class Library) and compiling wit...
Cantonment asked 7/6, 2016 at 13:34

1

Solved

I am looking for an optimal method to calculate the sum of all packed 32-bit integers in a __m256i or __m512i. To calculate the sum of n elements, I often use log2(n) vpaddd and vpermd instructions, then extra...
Botha asked 7/2, 2020 at 7:8

1

Solved

I'm rewriting some code from AVX2 to AVX512. What's the equivalent I can use to broadcast a single float number to a __m512 vector? In AVX2 it is _mm256_broadcast_ss() but I can't find something like...
Aggappe asked 17/1, 2020 at 14:22

1

Solved

Intel's intrinsics guide lists the intrinsic _mm256_loadu_epi32: __m256i _mm256_loadu_epi32 (void const* mem_addr); /* Instruction: vmovdqu32 ymm, m256 CPUID Flags: AVX512VL + AVX512F Description...
Rusel asked 8/1, 2020 at 15:43

2

Using C#'s Vector<T>, how can we most efficiently vectorize the operation of finding the index of a particular element in a set? As constraints, the set will always be a Span<T> of an ...
Keverne asked 9/7, 2019 at 14:59

1

Solved

The instruction exists (vbroadcastss zmm/m32) but there seems to be no intrinsic to generate it. I can code it as static inline __m512 mybroadcast(float *x) { __m512 v; asm inline ( "vbroadcas...
Windage asked 1/12, 2019 at 18:49

1

Solved

I am trying to do a left rotation of a 128-bit number in AVX2. Since there is no direct method of doing this, I have tried using left shift and right shift to accomplish my task. Here is a snippet o...
Brewer asked 1/12, 2019 at 6:36
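
The shift-and-combine approach mentioned in the excerpt above can be illustrated without SIMD. The following is a scalar sketch only, using a hypothetical `u128` struct of two 64-bit halves and a hypothetical `rotl128` helper; it is not the AVX2 code from the question.

```c
#include <stdint.h>

/* Hypothetical 128-bit value stored as two 64-bit halves. */
typedef struct { uint64_t lo, hi; } u128;

/* Rotate left by n bits (n is taken modulo 128). */
static u128 rotl128(u128 x, unsigned n) {
    n &= 127u;
    if (n >= 64u) {                 /* rotating by 64 swaps the halves */
        uint64_t t = x.lo;
        x.lo = x.hi;
        x.hi = t;
        n -= 64u;
    }
    if (n == 0u) return x;          /* avoid an undefined shift by 64 below */
    u128 r;
    r.hi = (x.hi << n) | (x.lo >> (64u - n));  /* bits leaving lo enter hi */
    r.lo = (x.lo << n) | (x.hi >> (64u - n));  /* bits leaving hi wrap into lo */
    return r;
}
```

The same idea (two shifts and an OR, with the carried-out bits moved across lanes) is what a vector version would have to express with shift and permute instructions.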
