intrinsics - McMap

2

Solved

Are there ARM intrinsics for add-with-carry in C?

Do there exist intrinsics for ARM C compilers to do add-with-carry operations, or is it necessary to use assembly language? On x86, there is _addcarry_u64 for add-with-carry. (There's also the new...

c arm intrinsics carryflag

Nobel asked 9/5, 2016 at 23:21
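For comparison with the add-with-carry question above, here is a minimal portable C sketch. It assumes no ARM intrinsic at all, and `add_with_carry` and `add_u128` are hypothetical names. It detects carries through unsigned wraparound comparisons. Compilers often recognize this pattern, but there is no guarantee they will emit ADDS/ADCS, so check the generated assembly.

```c
#include <assert.h>
#include <stdint.h>

/* Returns a + b + carry_in (mod 2^64) and stores the carry out (0 or 1) in
 * *carry_out. Plain C: emitting ADDS/ADCS (ARM) or ADD/ADC (x86) is up to
 * the compiler and is not guaranteed. */
static inline uint64_t add_with_carry(uint64_t a, uint64_t b,
                                      unsigned carry_in, unsigned *carry_out)
{
    assert(carry_in <= 1 && carry_out);
    uint64_t sum = a + b;              /* may wrap around */
    unsigned c1 = sum < a;             /* wrapped => carry out of a + b */
    uint64_t result = sum + carry_in;
    unsigned c2 = result < sum;        /* wrapped => carry out of + carry_in */
    *carry_out = c1 | c2;              /* at most one of c1, c2 can be set */
    return result;
}

/* Hypothetical usage: 128-bit addition from two 64-bit limbs (low limb first).
 * The carry out of the top limb is discarded. */
static inline void add_u128(uint64_t r[2], const uint64_t x[2], const uint64_t y[2])
{
    unsigned carry;
    r[0] = add_with_carry(x[0], y[0], 0, &carry);
    r[1] = add_with_carry(x[1], y[1], carry, &carry);
}
```

If GCC or Clang is available, `__builtin_add_overflow` is another way to get the carry bit without hand-written comparisons.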

2

Solved

Divide 8-bit integers by 4 (or shift) using SSE

How can I divide 16 8-bit integers by 4 (or shift them 2 to the right) using SSE intrinsics?

c++ x86 sse simd intrinsics

Betimes asked 9/1, 2017 at 19:32
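SSE has no 8-bit shift. A common workaround for unsigned bytes is to shift 16-bit lanes with `_mm_srli_epi16(v, 2)` and then use `_mm_and_si128` with a mask of 0x3F bytes to clear the two bits that leak in from the neighboring byte. The portable SWAR sketch below applies the same shift-then-mask idea to a 64-bit word. `div4_u8x8` and `div4_u8x16` are hypothetical names, and the sketch assumes unsigned bytes (floor division by 4).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Divide each of the 8 unsigned bytes packed in x by 4, rounding down.
 * Shifting the whole word right by 2 pushes the low two bits of each byte into
 * the top of the byte below it; masking every byte with 0x3F clears them. */
static inline uint64_t div4_u8x8(uint64_t x)
{
    return (x >> 2) & UINT64_C(0x3F3F3F3F3F3F3F3F);
}

/* Apply it to 16 bytes, the width of one 128-bit SSE register. */
static inline void div4_u8x16(uint8_t dst[16], const uint8_t src[16])
{
    assert(dst && src);
    for (int half = 0; half < 2; half++) {
        uint64_t w;
        memcpy(&w, src + 8 * half, sizeof w);   /* unaligned-safe load */
        w = div4_u8x8(w);
        memcpy(dst + 8 * half, &w, sizeof w);
    }
}
```

The per-byte result does not depend on endianness, because the mask clears exactly the bits that cross a byte boundary.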

1

Solved

AVX-512 BF16: load bf16 values directly instead of converting from fp32

On CPUs with AVX-512 and BF16 support, you can use the 512-bit vector registers to store 32 16-bit floats. I have found intrinsics to convert FP32 values to BF16 values (for example: _mm512_cvtne2...

c intrinsics avx512 half-precision-float

Elementary asked 2/5 at 13:42
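Some background for the BF16 question: bfloat16 is the upper 16 bits of an IEEE 754 binary32 value. Widening BF16 data stored in memory to FP32 is therefore a 16-bit left shift, and narrowing with round-to-nearest-even takes only a few integer operations. The scalar sketch below only illustrates that bit layout; it is not an AVX-512 implementation. `bf16_to_f32` and `f32_to_bf16_rne` are hypothetical names, and the sketch may differ from the hardware conversions in edge cases such as denormal inputs.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits");

/* Widen one bfloat16 (raw bits) to float: bf16 is the top half of a binary32. */
static inline float bf16_to_f32(uint16_t h)
{
    uint32_t bits = (uint32_t)h << 16;
    float f;
    memcpy(&f, &bits, sizeof f);       /* type-pun without aliasing violations */
    return f;
}

/* Narrow a float to bfloat16 with round-to-nearest, ties-to-even. */
static inline uint16_t f32_to_bf16_rne(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    if ((bits & 0x7FFFFFFFu) > 0x7F800000u)          /* NaN: keep it a quiet NaN */
        return (uint16_t)((bits >> 16) | 0x0040u);
    uint32_t round = 0x7FFFu + ((bits >> 16) & 1u);  /* just under half an ulp, +1 if kept bit is odd */
    return (uint16_t)((bits + round) >> 16);
}
```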

1

Solved

C program compiled with gcc -msse2 contains AVX1 instructions

I adapted a function I found on SO for SSE2 and included it in my program. The function uses SSE2 intrinsics to calculate the leading zero count of each of the eight 16-bit integers in the vector. Whe...

c assembly gcc header-files intrinsics

Boyar asked 31/12, 2023 at 12:40

1

Solved

.NET8 supports Vector512, but why doesn't Vector reach 512 bits?

My CPU is AMD Ryzen 7 7840H which supports AVX-512 instruction set. When I run the .NET8 program, the value of Vector512.IsHardwareAccelerated is true. But System.Numerics.Vector<T> is still ...

c# simd intrinsics avx512 .net-8.0

Complacence asked 19/11, 2023 at 4:40

4

How to get CPU brand information in ARM64?

In Windows X86, the CPU brand can be queried with cpuid intrinsic function. Here is a sample of the code: #include <stdio.h> #include <intrin.h> int main(void) { int cpubrand[4 * 3];...

c windows intrinsics arm64

Havens asked 8/3, 2020 at 14:59

4

Solved

Testing for builtins/intrinsics

I have some code that uses gcc intrinsics. I would like to include code in case the intrinsic is missing. How can I do this? #ifdef __builtin_ctzll does not work.

c gcc intrinsics

Sweltering asked 1/12, 2010 at 8:12
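`#ifdef __builtin_ctzll` fails because builtins are not macros. GCC 10 and later, as well as Clang, provide the `__has_builtin` preprocessor operator. A minimal sketch guarded so it also compiles where that operator is missing might look like the code below. Note that on such compilers (including older GCC versions that do have the builtin), it falls back to portable code. `HAVE_BUILTIN_CTZLL` and `ctz64` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

#if defined __has_builtin
#  if __has_builtin(__builtin_ctzll)
#    define HAVE_BUILTIN_CTZLL 1
#  endif
#endif

/* Count trailing zero bits of a nonzero 64-bit value. */
static inline int ctz64(uint64_t x)
{
    assert(x != 0);                 /* __builtin_ctzll(0) is undefined */
#ifdef HAVE_BUILTIN_CTZLL
    return __builtin_ctzll(x);
#else
    int n = 0;
    while ((x & 1u) == 0) {         /* portable fallback: shift until bit 0 is set */
        x >>= 1;
        n++;
    }
    return n;
#endif
}
```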

1

Solved

Why do compilers not coerce "n / 2.0" into "n * 0.5" if it's faster? [closed]

I have always assumed that num * 0.5f and num / 2.0f were equivalent, since I thought the compiler was smart enough to optimize the division out. So today I decided to test that theory, and w...

c++ c compiler-optimization intrinsics

Badderlocks asked 20/1, 2023 at 18:48
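Some background for this question, offered as a sketch rather than an answer. Because 0.5 is exactly 1/2.0, `x / 2.0` and `x * 0.5` round the same exact real value and always give identical results, so a compiler may make that substitution even under strict IEEE semantics. For a divisor such as 3.0, the reciprocal is inexact and the two forms can differ, so that rewrite generally needs relaxed options such as GCC's `-ffast-math`. The hypothetical helper `reciprocal_agrees` checks this on small integers, assuming IEEE 754 doubles evaluated in double precision (e.g. x86-64 with SSE2).

```c
#include <assert.h>

/* Return 1 if x / d == x * (1.0 / d) for every integer x in [1, n], else 0. */
static int reciprocal_agrees(double d, int n)
{
    assert(d != 0.0 && n >= 1);
    const double r = 1.0 / d;          /* rounded reciprocal */
    for (int i = 1; i <= n; i++) {
        const double x = (double)i;
        if (x / d != x * r)
            return 0;
    }
    return 1;
}
```

With d = 3.0 the check already fails at x = 5: `5.0 / 3.0` rounds up, while `5.0 * (1.0 / 3.0)` rounds down.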

1

Solved

Extracting edges of AVX2 16x16 bitmatrix

Is there a relatively cheap way to extract the four edges (rows 0 and 15, and columns 0 and 15) of a 16x16 bitmatrix stored in a __m256i into four 16b lanes of a __m256i? I don't care which lanes t...

c bit-manipulation intrinsics avx2

Urethrectomy asked 31/12, 2022 at 3:42

4

Solved

bitpack ascii string into 7-bit binary blob using SIMD

Related: bitpack ascii string into 7-bit binary blob using ARM-v8 Neon SIMD - same question specialized for AArch64 intrinsics. This question covers portable C and x86-64 intrinsics. I would like ...

c ascii simd sse intrinsics

Fronton asked 17/12, 2022 at 4:41
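The portable-C side of this question can be illustrated with one common SWAR formulation. This is not the asker's code, which is truncated in the next entry. It packs eight 7-bit ASCII characters held in a 64-bit word into the low 56 bits by merging neighbors in three doubling steps: bytes into 14-bit pairs, pairs into 28-bit quads, and quads into 56 bits. `pack8x7` is a hypothetical name. The sketch assumes every byte is below 0x80 and that character i sits in byte i counted from the least significant end, which is what an 8-byte `memcpy` load gives on a little-endian machine.

```c
#include <assert.h>
#include <stdint.h>

/* Pack 8 ASCII bytes (each < 0x80) into 56 bits: character i ends up at bits 7i..7i+6. */
static inline uint64_t pack8x7(uint64_t x)
{
    assert((x & UINT64_C(0x8080808080808080)) == 0);   /* pure 7-bit input */
    /* Step 1: in each 16-bit lane, slide the high byte down 1 bit -> 14 packed bits. */
    x = ((x & UINT64_C(0x7F007F007F007F00)) >> 1) | (x & UINT64_C(0x007F007F007F007F));
    /* Step 2: in each 32-bit lane, slide the high 14 bits down 2 bits -> 28 packed bits. */
    x = ((x & UINT64_C(0x3FFF00003FFF0000)) >> 2) | (x & UINT64_C(0x00003FFF00003FFF));
    /* Step 3: slide the high 28 bits down 4 bits -> 56 packed bits. */
    x = ((x & UINT64_C(0x0FFFFFFF00000000)) >> 4) | (x & UINT64_C(0x000000000FFFFFFF));
    return x;
}
```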

3

bitpack ascii string into 7-bit binary blob using ARM-v8 Neon SIMD

Following my x86 question, I would like to know how to efficiently vectorize the following code on Arm-v8: static inline uint64_t Compress8x7bit(uint64_t x) { x = ((x & 0x7F00...

simd arm64 intrinsics neon

Shake asked 19/12, 2022 at 5:14

2

Solved

does gcc's __builtin_cpu_supports check for OS support?

GCC provides a set of builtins to test some processor features, like the availability of certain instruction sets. But, according to this thread, we may also know that certain CPU features may be no...

c gcc simd intrinsics instruction-set

Cyte asked 8/2, 2018 at 4:31

1

Solved

Is there a way to force Visual Studio to generate aligned instructions from SSE intrinsics?

The _mm_load_ps() SSE intrinsic is defined as aligned, throwing an exception if the address is not aligned. However, it seems Visual Studio generates an unaligned read instead. Since not all compilers a...

visual-studio visual-c++ sse intrinsics memory-alignment

Sipper asked 15/5, 2020 at 9:32

1

Solved

How do you handle indivisible vector lengths with SIMD intrinsics, array not a multiple of vector width?

I am currently learning how to work with SIMD intrinsics. I know that an AVX 256-bit vector can contain four doubles, eight floats, or eight 32-bit integers. How do we use AVX to process arrays tha...

c++ vectorization simd intrinsics avx

Meingoldas asked 16/9, 2022 at 3:18
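The usual structure for this question is a main loop over full vectors followed by a scalar loop for the leftover 0 to 3 elements. Alternatives include AVX masked loads such as `_mm256_maskload_pd`, or padding the arrays. The sketch below shows the main-loop-plus-tail structure using GCC/Clang vector extensions, a compiler feature rather than Intel intrinsics, so it stays portable. `add_arrays` is a hypothetical name. When compiled with `-mavx`, the 4-double vector type typically maps onto one 256-bit register.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef double v4d __attribute__((vector_size(32)));   /* 4 doubles (GCC/Clang extension) */

/* dst[i] = a[i] + b[i] for i in [0, n), for any n, not just multiples of 4. */
void add_arrays(double *dst, const double *a, const double *b, size_t n)
{
    assert(n == 0 || (dst && a && b));
    size_t i = 0;
    /* Main loop: full 4-element chunks. "i + 4 <= n" avoids underflow of n - 4. */
    for (; i + 4 <= n; i += 4) {
        v4d va, vb;
        memcpy(&va, a + i, sizeof va);     /* unaligned-safe vector load */
        memcpy(&vb, b + i, sizeof vb);
        v4d vs = va + vb;
        memcpy(dst + i, &vs, sizeof vs);
    }
    /* Scalar tail: the remaining 0..3 elements. */
    for (; i < n; i++)
        dst[i] = a[i] + b[i];
}
```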

1

Solved

Shuffling a vector by number of bytes

Is there any way to left-shift (v{0} -> v{1}) a __m128i by n bytes, where n is only known at runtime? I'm currently restricted to AVX1 but if AVX2/512 makes this much easier I'm very interested....

c++ x86 sse intrinsics avx

Placida asked 27/8, 2022 at 5:49
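One widely used workaround when the byte count is only known at runtime avoids a shuffle entirely. Store the vector into the upper half of a zero-filled 32-byte buffer, then reload 16 bytes starting n bytes earlier. On x86 the reload would be an unaligned `_mm_loadu_si128`, which typically costs a store-forwarding stall. The portable sketch below performs the same data movement on a plain 16-byte array: element i moves to element i + n and the vacated low elements become zero, as `_mm_slli_si128` does for a compile-time count. `byte_shift_left` is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* out[i] = in[i - n] for i >= n, else 0, with n in 0..16. */
static void byte_shift_left(uint8_t out[16], const uint8_t in[16], unsigned n)
{
    assert(n <= 16);
    uint8_t buf[32] = {0};           /* low half stays zero: it supplies the fill bytes */
    memcpy(buf + 16, in, 16);        /* the "vector" lives in the high half */
    memcpy(out, buf + 16 - n, 16);   /* reading n bytes earlier shifts everything up by n */
}
```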

3

Way to effectively call _BitScanReverse or __builtin_clz in constexpr functions?

It seems that _BitScanReverse, despite being an intrinsic and not a real function, can't be called in a constexpr function in Visual C++. I'm aware that I can implement this operation myself in a m...

c++ visual-c++ g++ constexpr intrinsics

Sunk asked 9/10, 2017 at 20:55

1

Solved

Cast array of wrapper structs to SIMD vector

Say I have a wrapper struct, serving as a phantom type. struct Wrapper { float value; } Is it legal to load an array of this struct directly into an SIMD intrinsic type such as __m256? For exampl...

c++ language-lawyer undefined-behavior simd intrinsics

Muriate asked 27/6, 2022 at 21:14

3

Solved

RDRAND and RDSEED intrinsics on various compilers?

Do the Intel C++ compiler and/or GCC support the following Intel intrinsics, as MSVC has since 2012 / 2013? #include <immintrin.h> // for the following intrinsics int _rdrand16_step(uint16_t...

c++ gcc intrinsics icc rdrand

Wetzell asked 31/3, 2015 at 15:49

2

Solved

Is there an efficient way to get the first non-zero element in an SIMD register using SIMD intrinsics?

As the title reads, if a 256-bit SIMD register is: 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | How can I efficiently get the index of the first non-zero element (i.e. the index 2 of the first 1)? The most st...

x86 bit-manipulation simd intrinsics avx

Uraemia asked 14/10, 2016 at 0:01

4

Solved

Linking error when building without CRT, memcpy and memset intrinsic functions

I'm trying to build an application as tiny as possible, and in doing so I'm trying to avoid use of the CRT by using Win API calls instead of standard C/C++ calls. Unfortunately, I'm still getting a...

c++ memcpy intrinsics crt memset

Brotherly asked 27/1, 2014 at 3:26

1

Solved

Manipulate vector register as float32x4_t C variable in ARM

I'm using inline assembly in ARM for a scientific application. In my assembly code, I have to (see note in the end) nominally indicate which vector registers I want to use. For example, in my code,...

c assembly inline-assembly arm64 intrinsics

Mansuetude asked 4/3, 2022 at 16:18

2

Solved

Efficient overflow-immune arithmetic mean in C/C++

The arithmetic mean of two unsigned integers is defined as: mean = (a+b)/2 Directly implementing this in C/C++ may overflow and produce a wrong result. A correct implementation would avoid this. O...

c++ c optimization compiler-optimization intrinsics

Madriene asked 7/2, 2022 at 13:06
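A standard overflow-free formulation for the unsigned case relies on the identity a + b = 2(a & b) + (a ^ b). The bits the two inputs share contribute a & b to the mean, and the bits where they differ contribute (a ^ b) / 2. So floor((a + b) / 2) = (a & b) + ((a ^ b) >> 1), and no intermediate value exceeds the larger input. The result rounds down, matching (a + b) / 2 computed in wider arithmetic. The C sketch below uses the hypothetical name `mean_u32`.

```c
#include <assert.h>
#include <stdint.h>

/* floor((a + b) / 2) without overflow, using a + b == 2*(a & b) + (a ^ b). */
static inline uint32_t mean_u32(uint32_t a, uint32_t b)
{
    uint32_t m = (a & b) + ((a ^ b) >> 1);
    assert(m >= (a < b ? a : b) && m <= (a < b ? b : a));   /* mean lies between inputs */
    return m;
}
```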

1

Solved

What are the names and meanings of the intrinsic vector element types, like epi64x or pi32?

The Intel intrinsic functions have the subtype of the vector built into their names. For example, _mm_set1_ps is a ps, which is packed single-precision, a.k.a. float. Although the meaning of most ...

intel sse intrinsics sse2 mmx

Infix asked 30/1, 2022 at 4:35

1

Solved

Rust compiler not optimising lzcnt? (and similar functions)

What was done: This follows as a result of experimenting on Compiler Explorer so as to ascertain the compiler's (rustc's) behaviour when it comes to the log2()/leading_zeros() and similar functions. I...

rust x86 bit-manipulation compiler-optimization intrinsics

Hooge asked 25/12, 2021 at 16:24

4

Solved

AVX2: BitScanReverse or CountLeadingZeros on 8 bit elements in AVX register

I would like to extract the index of the highest set bit in a 256 bit AVX register with 8 bit elements. I could neither find a bsr nor a clz implementation for this. For clz with 32 bit elements, t...

c++ simd intrinsics avx avx2

Mummify asked 30/8, 2021 at 13:32
