intrinsics - 2

2

C# - Construct a signal Vector<T> from an integer bitmask

I have some integer value representing a bitmask, for example 154 = 0b10011010, and I want to construct a corresponding signal Vector<T> instance <0, -1, 0, -1, -1, 0, 0, -1> (note the ...

c# vector simd intrinsics bitmask

Encyclical asked 21/12, 2021 at 13:16
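For reference, the mapping described in the excerpt can be sketched in scalar C (not the C# Vector<T> solution the question asks for). The helper name `expand_bitmask` and its parameters are hypothetical, chosen only for illustration; the sketch assumes lane i corresponds to bit i of the mask, which matches the 154 example above.

```c
#include <stdint.h>

/* Hypothetical helper (not from the question): scalar reference for
 * expanding an integer bitmask into per-lane masks.
 * Lane i becomes -1 (all bits set) if bit i of mask is set, else 0.
 * Assumes count <= 32. */
static void expand_bitmask(uint32_t mask, int32_t *lanes, int count)
{
    for (int i = 0; i < count; ++i) {
        /* (mask >> i) & 1 is 0 or 1; negating it gives 0 or -1. */
        lanes[i] = -(int32_t)((mask >> i) & 1u);
    }
}
```

Calling `expand_bitmask(154, lanes, 8)` fills `lanes` with the eight values listed in the question. A vectorized version would replace the loop with a broadcast, a per-lane bit test and a compare.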

2

Solved

How to vectorise int8 multiplication in C (AVX2)

How do I vectorize this C function with AVX2? static void propogate_neuron(const short a, const int8_t *b, int *c) { for (int i = 0; i < 32; ++i){ c[i] += a * b[i]; } }

c x86 simd intrinsics avx2

Arhat asked 4/11, 2021 at 23:5

5

Solved

128-bit division intrinsic in Visual C++

I'm wondering if there really is no 128-bit division intrinsic function in Visual C++? There is a 64x64=128 bit multiplication intrinsic function called _umul128(), which nicely matches the MUL x...

visual-c++ intrinsics integer-division 128-bit

Camenae asked 9/12, 2011 at 23:50

2

Solved

Do I need to use _mm256_zeroupper in 2021?

From Agner Fog's "Optimizing software in C++": There is a problem when mixing code compiled with and without AVX support on some Intel processors. There is a performance penalty when goi...

c++ sse simd intrinsics avx

Employment asked 11/8, 2021 at 5:40

3

Add saturate 32-bit signed ints intrinsics?

Can someone recommend a fast way to add saturate 32-bit signed integers using Intel intrinsics (AVX, SSE4 ...) ? I looked at the intrinsics guide and found _mm256_adds_epi16 but this seems to onl...

x86 sse intrinsics avx saturation-arithmetic

Ostiole asked 7/4, 2015 at 18:41

2

Constexpr and SSE intrinsics

Most C++ compilers support SIMD (SSE/AVX) instructions with intrinsics like _mm_cmpeq_epi32. My problem with this is that this function is not marked as constexpr, although "semantically" there is...

c++ sse simd constexpr intrinsics

Favourable asked 16/8, 2018 at 14:59

1

Solved

How to combine constexpr and vectorized code?

I am working on a C++ intrinsic wrapper for x64 and neon. I want my functions to be constexpr. My motivation is similar to Constexpr and SSE intrinsics, but #pragma omp simd and intrinsics may not ...

c++ openmp constexpr intrinsics

Incarnate asked 27/5, 2021 at 17:3

6

How to use MSVC intrinsics to get the equivalent of this GCC code?

The following code calls the builtin functions for clz/ctz in GCC and, on other systems, has C versions. Obviously, the C versions are a bit suboptimal if the system has a builtin clz/ctz instructi...

c visual-c++ intrinsics

Brade asked 10/12, 2008 at 13:0

7

Solved

How to use VC++ intrinsic functions w/o run-time library

I'm involved in one of those challenges where you try to produce the smallest possible binary, so I'm building my program without the C or C++ run-time libraries (RTL). I don't link to the DLL vers...

c++ visual-c++ intrinsics memset demoscene

Chimkent asked 30/5, 2010 at 14:16

3

Solved

Convert 16 bits mask to 16 bytes mask

Is there any way to convert the following code: int mask16 = 0b1010101010101010; // int or short, signed or unsigned, it does not matter to __uint128_t mask128 = ((__uint128_t)0x0100010001000100 &...

c++ c bit-manipulation sse intrinsics

Wittol asked 21/4, 2021 at 18:19

2

Improving performance of floating-point dot-product of an array with SIMD

I have this function to compute a piece of an array of doubles: void avx2_mul_64_block(double& sum, double* lhs_arr, double* rhs_arr) noexcept { __m256i accumulator = _mm256_setzero_pd(); for ...

c++ x86 simd intrinsics avx

Irresoluble asked 20/1, 2021 at 21:53

1

Solved

How to unset N right-most set bits

There is a relatively well-known trick for unsetting a single right-most bit: y = x & (x - 1) // 0b001011100 & 0b001011011 = 0b001011000 :) I'm finding myself with a tight loop to clear n ...

bit-manipulation intrinsics integer-arithmetic

Bassorilievo asked 20/1, 2021 at 20:50
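As a baseline for the excerpt above, the single-bit trick can simply be repeated n times. This is only a minimal sketch of the straightforward loop, not the faster approach the question is looking for; the function name `clear_lowest_set_bits` is hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper (not from the question): clears the n right-most
 * set bits of x by applying x & (x - 1) once per bit.
 * Stops early if x runs out of set bits. */
static uint64_t clear_lowest_set_bits(uint64_t x, unsigned n)
{
    while (n-- > 0 && x != 0) {
        x &= x - 1; /* drops the lowest set bit */
    }
    return x;
}
```

With the value from the excerpt, `clear_lowest_set_bits(0x5C, 1)` gives `0x58` (0b001011100 becomes 0b001011000). The loop costs one iteration per cleared bit, which is the overhead a tight loop would want to avoid.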

5

Solved

Get member of __m128 by index?

I've got some code, originally given to me by someone working with MSVC, and I'm trying to get it to work on Clang. Here's the function that I'm having trouble with: float vectorGetByIndex( __m128...

c++ clang sse simd intrinsics

Dorsal asked 27/9, 2012 at 15:6

3

Solved

Why do java intrinsic functions still have code?

There are many methods in the Java API that are intrinsics, but still have code associated with them when looking at the source code. For an example, Integer.bitCount() is an intrinsic, but if you...

java intrinsics

Drewdrewett asked 13/4, 2014 at 9:28

1

Solved

How to read the "Intel Intrinsics Guide"?

I am trying to get started with AVX512 intrinsics by reading the Intel Intrinsics Guide but so far I have found that it does not define the named datatypes or the pseudocode syntax used for explana...

intel simd intrinsics

Fusiform asked 12/6, 2020 at 17:22

0

Left-shift (of float32 array) with AVX2 and filling up with a zero

I have been using the following "trick" in C code with SSE2 for single precision floats for a while now: static inline __m128 SSEI_m128shift(__m128 data) { return (__m128)_mm_srli_si128(_mm_castp...

c gcc intrinsics avx2 sse2

Unclassical asked 23/5, 2020 at 11:52

1

MMX intrinsics like _mm_cvtpd_pi32 not found with MSVC 2019 for 64bit targets; change from 2013?

I'm currently working on updating a large codebase from VS2013 to VS2019. One of the compiler errors I've run into is as follows: intrinsics.h(348): error C3861: '_mm_cvtpd_pi32': identifier not...

visual-c++ x86-64 visual-studio-2019 intrinsics mmx

Epochmaking asked 30/3, 2020 at 15:4

1

Solved

What is the difference between _mm_set1_ps and _mm_set_ps1?

Is there any difference between these functions? If not, why? __m128 _mm_set1_ps(float a) __m128 _mm_set_ps1(float a) Both descriptions are the same on the Intel Intrinsics Guide website. Than...

c sse intrinsics

Gesticulation asked 29/3, 2020 at 23:58

3

Solved

Compile multi-architecture code using Agner's Vector Class Library

How can I create a library that will dynamically switch between SSE, AVX, and AVX2 code paths depending on the host processor/OS? I am using Agner Fog's VCL (Vector Class Library) and compiling wit...

c++ vectorization intrinsics avx vector-class-library

Cantonment asked 7/6, 2016 at 13:34

1

Solved

Fastest method to calculate sum of all packed 32-bit integers using AVX512 or AVX2

I am looking for an optimal method to calculate the sum of all packed 32-bit integers in a __m256i or __m512i. To calculate the sum of n elements, I often use log2(n) vpaddd and vpermd function, then extra...

c intrinsics avx avx2 avx512

Botha asked 7/2, 2020 at 7:8

1

Solved

c++ AVX512 intrinsic equivalent of _mm256_broadcast_ss()?

I'm rewriting some code from AVX2 to AVX512. What's the equivalent I can use to broadcast a single float number to a __m512 vector? In AVX2 it is _mm256_broadcast_ss() but I can't find something like...

c++ intel intrinsics avx2 avx512

Aggappe asked 17/1, 2020 at 14:22

1

Solved

How to emulate _mm256_loadu_epi32 with gcc or clang?

Intel's intrinsics guide lists the intrinsic _mm256_loadu_epi32: __m256i _mm256_loadu_epi32 (void const* mem_addr); /* Instruction: vmovdqu32 ymm, m256 CPUID Flags: AVX512VL + AVX512F Description...

c++ c intrinsics avx512

Rusel asked 8/1, 2020 at 15:43

2

Use C# Vector<T> SIMD to find index of matching element

Using C#'s Vector<T>, how can we most efficiently vectorize the operation of finding the index of a particular element in a set? As constraints, the set will always be a Span<T> of an ...

c# vectorization simd intrinsics dot-product

Keverne asked 9/7, 2019 at 14:59

1

Solved

Is there an x86 intrinsic that generates the AVX512 broadcast operation from a 32 bit floating point value in memory to a 512 bit register?

The instruction exists (vbroadcastss zmm/m32) but there seems to be no intrinsic to generate it. I can code it as static inline __m512 mybroadcast(float *x) { __m512 v; asm inline ( "vbroadcas...

c intrinsics avx512

Windage asked 1/12, 2019 at 18:49

1

Solved

left shift of 128 bit number using AVX2 instruction

I am trying to do left rotation of a 128 bit number in AVX2. Since there is no direct method of doing this, I have tried using left shift and right shift to accomplish my task. Here is a snippet o...

c++ simd intrinsics avx avx2

Brewer asked 1/12, 2019 at 6:36
