intrinsics Questions

2

Solved

Do there exist intrinsics for ARM C compilers to do add-with-carry operations, or is it necessary to use assembly language? On x86, there is _addcarry_u64 for add-with-carry. (There's also the new...
Nobel asked 9/5, 2016 at 23:21
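
For readers who want something that works without a target-specific intrinsic, here is a minimal sketch of add-with-carry in portable C. The names `add_carry_u64`, `u128_limbs` and `add_u128` are hypothetical and are not compiler intrinsics. Whether a given compiler turns this pattern into a hardware add-with-carry instruction is not guaranteed and should be checked in the generated assembly.

```c
#include <stdint.h>

/* Hypothetical helper (illustrative name, not an intrinsic):
   computes a + b + carry_in, stores the low 64 bits in *sum,
   and returns the carry out (0 or 1). */
static inline unsigned add_carry_u64(unsigned carry_in, uint64_t a,
                                     uint64_t b, uint64_t *sum)
{
    uint64_t partial = a + b;                 /* wraps modulo 2^64 */
    unsigned carry1 = partial < a;            /* wrapped, so a carry occurred */
    uint64_t total = partial + (carry_in != 0);
    unsigned carry2 = total < partial;        /* carry from adding carry_in */
    *sum = total;
    return carry1 | carry2;                   /* at most one of them is set */
}

/* A 128-bit value stored as two 64-bit limbs (hypothetical type). */
typedef struct { uint64_t lo, hi; } u128_limbs;

/* 128-bit addition built from two chained add-with-carry steps.
   The final carry out is discarded, so the result wraps modulo 2^128. */
static inline u128_limbs add_u128(u128_limbs x, u128_limbs y)
{
    u128_limbs r;
    unsigned c = add_carry_u64(0, x.lo, y.lo, &r.lo);
    (void)add_carry_u64(c, x.hi, y.hi, &r.hi);
    return r;
}
```

The argument order mirrors `_addcarry_u64` only loosely (carry in, two operands, output pointer), so the helper can be swapped for a real intrinsic where one is available.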

2

Solved

How can I divide 16 8-bit integers by 4 (or shift them 2 to the right) using SSE intrinsics?
Betimes asked 9/1, 2017 at 19:32
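
One common approach, sketched here under the assumption that the 16 values are unsigned bytes: SSE2 has no 8-bit shift, so shift the 16-bit lanes and mask off the bits that leak in from the neighbouring byte. The helper name `div4_epu8` is hypothetical.

```c
#include <emmintrin.h>  /* SSE2 */

/* Hypothetical helper: divides each of the 16 unsigned bytes in v by 4. */
static inline __m128i div4_epu8(__m128i v)
{
    /* Shift every 16-bit lane right by 2. The low 2 bits of each
       high byte end up in the top 2 bits of the low byte next to it. */
    __m128i shifted = _mm_srli_epi16(v, 2);

    /* Clear the top 2 bits of every byte to remove the leaked bits. */
    return _mm_and_si128(shifted, _mm_set1_epi8(0x3F));
}
```

For signed bytes this sketch does not apply as written, since the mask discards the sign bits.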

1

Solved

On CPUs with AVX-512 and BF16 support, you can use the 512-bit vector registers to store 32 16-bit floats. I have found intrinsics to convert FP32 values to BF16 values (for example: _mm512_cvtne2...
Elementary asked 2/5 at 13:42

1

Solved

I adapted a function I found on SO for SSE2 and included it in my program. The function uses SSE2 intrinsics to calculate the leading zero count of each of the 8 x 16bit integers in the vector. Whe...
Boyar asked 31/12, 2023 at 12:40

1

Solved

My CPU is AMD Ryzen 7 7840H which supports AVX-512 instruction set. When I run the .NET8 program, the value of Vector512.IsHardwareAccelerated is true. But System.Numerics.Vector<T> is still ...
Complacence asked 19/11, 2023 at 4:40

4

On Windows x86, the CPU brand can be queried with the cpuid intrinsic function. Here is a sample of the code: #include <stdio.h> #include <intrin.h> int main(void) { int cpubrand[4 * 3];...
Havens asked 8/3, 2020 at 14:59

4

Solved

I have some code that uses gcc intrinsics. I would like to include code in case the intrinsic is missing. How can I do this? #ifdef __builtin_ctzll does not work.
Sweltering asked 1/12, 2010 at 8:12
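
The `#ifdef` test fails because builtins are not preprocessor macros. A minimal sketch of one way around it, assuming a compiler that offers `__has_builtin` (Clang, and GCC from version 10) or that identifies itself as GCC. The names `ctz64` and `ctz64_portable` are hypothetical.

```c
#include <stdint.h>

/* Compilers without __has_builtin get a stub that always answers "no". */
#ifndef __has_builtin
#  define __has_builtin(x) 0
#endif

/* Portable fallback: counts trailing zero bits one at a time.
   Precondition: x != 0 (the loop would not terminate for 0). */
static inline int ctz64_portable(uint64_t x)
{
    int n = 0;
    while ((x & 1u) == 0) {
        x >>= 1;
        ++n;
    }
    return n;
}

/* Uses the builtin when the compiler reports it, or when the compiler
   is a GCC too old to have __has_builtin; otherwise uses the fallback.
   Precondition: x != 0 (the builtin is undefined for 0 as well). */
static inline int ctz64(uint64_t x)
{
#if __has_builtin(__builtin_ctzll) || (defined(__GNUC__) && !defined(__clang__))
    return __builtin_ctzll(x);
#else
    return ctz64_portable(x);
#endif
}
```

Callers only ever use `ctz64`, so the choice between builtin and fallback stays in one place.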

1

Solved

I have always assumed that num * 0.5f and num / 2.0f were equivalent, since I thought the compiler was smart enough to optimize the division out. So today I decided to test that theory, and w...
Badderlocks asked 20/1, 2023 at 18:48

1

Solved

Is there a relatively cheap way to extract the four edges (rows 0 and 15, and columns 0 and 15) of a 16x16 bitmatrix stored in a __m256i into four 16b lanes of a __m256i? I don't care which lanes t...
Urethrectomy asked 31/12, 2022 at 3:42

4

Solved

Related: bitpack ascii string into 7-bit binary blob using ARM-v8 Neon SIMD - same question specialized for AArch64 intrinsics. This question covers portable C and x86-64 intrinsics. I would like ...
Fronton asked 17/12, 2022 at 4:41

3

Following my x86 question, I would like to know how to efficiently vectorize the following code on Arm-v8: static inline uint64_t Compress8x7bit(uint64_t x) { x = ((x & 0x7F00...
Shake asked 19/12, 2022 at 5:14

2

Solved

The GCC compiler provides a set of builtins to test some processor features, like availability of certain instruction sets. But, according to this thread we also may know certain CPU features may be no...
Cyte asked 8/2, 2018 at 4:31

1

Solved

The _mm_load_ps() SSE intrinsic is defined as aligned, throwing an exception if the address is not aligned. However, it seems Visual Studio generates an unaligned read instead. Since not all compilers a...

1

Solved

I am currently learning how to work with SIMD intrinsics. I know that an AVX 256-bit vector can contain four doubles, eight floats, or eight 32-bit integers. How do we use AVX to process arrays tha...
Meingoldas asked 16/9, 2022 at 3:18

1

Solved

Is there any way to left-shift (v{0} -> v{1}) a __m128i by n bytes, where n is only known at runtime? I'm currently restricted to AVX1 but if AVX2/512 makes this much easier I'm very interested....
Placida asked 27/8, 2022 at 5:49

3

It seems that _BitScanReverse, despite being an intrinsic and not a real function, can't be called in a constexpr function in Visual C++. I'm aware that I can implement this operation myself in a m...
Sunk asked 9/10, 2017 at 20:55

1

Solved

Say I have a wrapper struct, serving as a phantom type. struct Wrapper { float value; } Is it legal to load an array of this struct directly into an SIMD intrinsic type such as __m256? For exampl...
Muriate asked 27/6, 2022 at 21:14

3

Solved

Do the Intel C++ compiler and/or GCC support the following Intel intrinsics, as MSVC has since 2012 / 2013? #include <immintrin.h> // for the following intrinsics int _rdrand16_step(uint16_t...
Wetzell asked 31/3, 2015 at 15:49

2

Solved

As the title reads, if a 256-bit SIMD register is: 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | How can I efficiently get the index of the first non-zero element (i.e. the index 2 of the first 1)? The most st...
Uraemia asked 14/10, 2016 at 0:1

4

Solved

I'm trying to build an application as tiny as possible, and in doing so I'm trying to avoid use of the CRT by using Win API calls instead of standard C/C++ calls. Unfortunately, I'm still getting a...
Brotherly asked 27/1, 2014 at 3:26

1

Solved

I'm using inline assembly in ARM for a scientific application. In my assembly code, I have to (see note in the end) nominally indicate which vector registers I want to use. For example, in my code,...
Mansuetude asked 4/3, 2022 at 16:18

2

Solved

The arithmetic mean of two unsigned integers is defined as: mean = (a+b)/2 Directly implementing this in C/C++ may overflow and produce a wrong result. A correct implementation would avoid this. O...
Madriene asked 7/2, 2022 at 13:6
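
A minimal sketch of the overflow-free mean in plain C, using the identity a + b = 2(a & b) + (a ^ b): the bits both operands share count fully, and the bits that differ count half. The function names are hypothetical, and the sketch assumes 32-bit unsigned operands.

```c
#include <stdint.h>

/* Mean of a and b rounded down, with no intermediate overflow.
   (a & b) holds the bits set in both operands;
   (a ^ b) >> 1 is half of the bits set in exactly one operand. */
static inline uint32_t mean_floor_u32(uint32_t a, uint32_t b)
{
    return (a & b) + ((a ^ b) >> 1);
}

/* Mean of a and b rounded up, with no intermediate overflow.
   Follows from a + b = 2(a | b) - (a ^ b). */
static inline uint32_t mean_ceil_u32(uint32_t a, uint32_t b)
{
    return (a | b) - ((a ^ b) >> 1);
}
```

For example, `mean_floor_u32(3, 4)` yields 3 and `mean_ceil_u32(3, 4)` yields 4, while the naive `(a + b) / 2` would wrap for operands near `UINT32_MAX`.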

1

Solved

The Intel intrinsic functions have the subtype of the vector built into their names. For example, _mm_set1_ps is a ps, which is a packed single-precision, a.k.a. a float. Although the meaning of most ...
Infix asked 30/1, 2022 at 4:35

1

Solved

What was done: This follows as a result of experimenting on Compiler Explorer as to ascertain the compiler's (rustc's) behaviour when it comes to the log2()/leading_zeros() and similar functions. I...

4

Solved

I would like to extract the index of the highest set bit in a 256 bit AVX register with 8 bit elements. I could neither find a bsr nor a clz implementation for this. For clz with 32 bit elements, t...
Mummify asked 30/8, 2021 at 13:32
