intrinsics - 3

2

Solved

Is casting to simd-type undefined behaviour in C++? [duplicate]

In a SIMD tutorial I found the following code snippet: void simd(float* a, int N) { // We assume N % 4 == 0. int nb_iters = N / 4; __m128* ptr = reinterpret_cast<__m128*>(a); // (...

c++ sse undefined-behavior simd intrinsics

Affirmatory asked 18/11, 2019 at 8:59
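
One way to sidestep the pointer-cast question entirely is to move data in and out of vectors with load/store intrinsics. The sketch below is not the tutorial's code: `double_in_place` is a hypothetical helper, and it keeps the snippet's assumption that the element count is a multiple of 4.

```cpp
#include <xmmintrin.h>  // SSE: __m128, _mm_loadu_ps, _mm_storeu_ps

// Hypothetical helper: doubles every element of a[0..n), assuming n % 4 == 0.
// Data moves through load/store intrinsics instead of a __m128* cast.
void double_in_place(float* a, int n) {
    const __m128 two = _mm_set1_ps(2.0f);
    for (int i = 0; i < n; i += 4) {
        __m128 v = _mm_loadu_ps(a + i);  // no alignment requirement on a + i
        v = _mm_mul_ps(v, two);
        _mm_storeu_ps(a + i, v);
    }
}
```

If the buffer is known to be 16-byte aligned, `_mm_load_ps` and `_mm_store_ps` could be used in place of the unaligned variants.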

1

Solved

Why do GCC atomic builtins need an additional "generic" version?

According to https://gcc.gnu.org/onlinedocs/gcc/_005f_005fatomic-Builtins.html, there's: type __atomic_load_n (type *ptr, int memorder) and (the "generic"): void __atomic_load (type *ptr, type ...

c gcc intrinsics stdatomic

Aglitter asked 29/10, 2019 at 23:46
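
As a rough illustration of the two forms named in the question: the `_n` builtin returns the loaded value and is limited to integral and pointer types, while the generic builtin writes the result through a second pointer and so also accepts a struct. This is a sketch for GCC or Clang only; `Pair`, `load_counter`, and `load_pair` are hypothetical names, and `Pair` is given 8-byte alignment on the assumption that this lets the compiler inline the access.

```cpp
#include <cstdint>

// Hypothetical 8-byte payload. It is neither an integer nor a pointer,
// so it cannot be passed to the _n builtins.
struct alignas(8) Pair {
    std::int32_t first;
    std::int32_t second;
};

// _n form: the loaded value is the return value.
std::int64_t load_counter(std::int64_t* p) {
    return __atomic_load_n(p, __ATOMIC_SEQ_CST);
}

// Generic form: the loaded value is written to *ret (here, &out).
Pair load_pair(Pair* p) {
    Pair out;
    __atomic_load(p, &out, __ATOMIC_SEQ_CST);
    return out;
}
```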

4

Solved

How to instruct compiler to generate unaligned loads for __m128

I've got some code that works with __m128 values. I'm using x86-64 SSE intrinsics on these values and I find that if the values are unaligned in memory I get a crash. This is due to my compiler (cl...

c++ x86-64 sse simd intrinsics

Wealth asked 24/11, 2015 at 9:4

1

Solved

When is __m128 in an xmm register?

Calling _mm_load_ps returns an __m128. In the Intel intrinsics guide it says: Load 128-bits (composed of 4 packed single-precision (32-bit) floating-point elements) from memory into dst. mem_add...

c++ compilation sse cpu-registers intrinsics

Glassman asked 17/10, 2019 at 19:36

2

AVX intrinsics for tiled matrix multiplication [closed]

I was trying to use AVX512 intrinsics to vectorize my loop of matrix multiplication (tiled). I used __m256d as variables to store intermediate results and store them in my results. However, ...

c++ matrix optimization intrinsics avx

Demarco asked 30/9, 2019 at 2:40

3

Solved

Horizontal add with __m512 (AVX512)

How does one efficiently perform horizontal addition with floats in a 512-bit AVX register (i.e., add the items from a single vector together)? For 128 and 256 bit registers this can be done using _mm...

simd intrinsics avx512

Hardej asked 12/11, 2014 at 20:58

3

Solved

Fast interleave 2 double arrays into an array of structs with 2 float and 1 int (loop invariant) member, with SIMD double->float conversion?

I have a section of code which is a bottleneck in a C++ application running on x86 processors, where we take double values from two arrays, cast to float and store in an array of structs. The reaso...

c++ x86 simd intrinsics avx

Bindweed asked 12/7, 2019 at 20:22

3

Solved

What is the difference between Java intrinsic and native methods?

Java intrinsic functions are mentioned in various places (e.g. here). My understanding is that these are methods that are handled with special native code. This seems similar to a JNI method which is a...

java native intrinsics

Kliment asked 21/6, 2019 at 7:28

2

Solved

error: '_mm512_loadu_epi64' was not declared in this scope

I'm trying to create a minimal reproducer for this issue report. There seem to be some problems with AVX-512, which is shipping on the latest Apple machines with Skylake processors. According to ...

c++ gcc x86 intrinsics avx512

Tiflis asked 4/12, 2018 at 2:47

1

Solved

Different intrinsics behaviour depending on GCC version

I'm pretty new to intrinsics and I ran into different behavior of my code with GCC 7.4 and GCC 8.3. My code is pretty simple. b.cpp: #include <iostream> #include <xmmintrin.h> void ...

c++ gcc undefined-behavior intrinsics

Thymelaeaceous asked 10/6, 2019 at 8:47

2

Solved

How does JitIntrinsicAttribute affect code generation?

I was browsing .NET source code and saw this attribute. It says, An attribute that can be attached to JIT Intrinsic methods/properties and according to MSDN: Indicates that a modified metho...

c# .net mono clr intrinsics

Osteoma asked 13/11, 2014 at 7:58

3

Solved

How to store the contents of a __m128d simd vector as doubles without accessing it as a union?

The code I want to optimize is basically a simple but large arithmetic formula. It should be fairly simple to analyze the code automatically to compute the independent multiplications/additions in ...

c x86 simd intrinsics sse2

Acidify asked 19/9, 2012 at 13:13
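
For reference, a minimal sketch of getting the two lanes of a `__m128d` out as plain doubles with SSE2 intrinsics rather than a union. The helper names `store_lanes`, `low_lane`, and `high_lane` are hypothetical and not taken from the question.

```cpp
#include <emmintrin.h>  // SSE2: __m128d and the _pd / _sd intrinsics

// Hypothetical helper: copies both lanes of v to out[0] (low) and out[1] (high).
void store_lanes(__m128d v, double* out) {
    _mm_storeu_pd(out, v);  // out needs no particular alignment
}

// Hypothetical helpers: extract a single lane as a scalar double.
double low_lane(__m128d v) {
    return _mm_cvtsd_f64(v);
}

double high_lane(__m128d v) {
    // Move the high lane into the low position, then read the low lane.
    return _mm_cvtsd_f64(_mm_unpackhi_pd(v, v));
}
```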

2

Solved

Does Clang have something like #pragma GCC target?

I have some code written that uses AVX intrinsics when they are available on the current CPU. In GCC and Clang, unlike Visual C++, in order to use intrinsics, you must enable them on the command li...

clang intrinsics avx pragma

Manion asked 11/9, 2017 at 23:37
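
One approach that both GCC and Clang accept is a per-function target attribute combined with a runtime CPU check. The sketch below assumes an x86 build with one of those two compilers; `sum8`, `sum8_avx`, and `sum8_scalar` are hypothetical names, and this is only one possible arrangement.

```cpp
#include <immintrin.h>

// Hypothetical AVX kernel: the attribute enables AVX for this function only,
// so no -mavx flag is needed for the rest of the translation unit.
__attribute__((target("avx")))
static float sum8_avx(const float* p) {
    __m256 v = _mm256_loadu_ps(p);
    __m128 lo = _mm256_castps256_ps128(v);    // elements 0..3
    __m128 hi = _mm256_extractf128_ps(v, 1);  // elements 4..7
    __m128 s = _mm_add_ps(lo, hi);
    s = _mm_add_ps(s, _mm_movehl_ps(s, s));      // fold upper pair onto lower pair
    s = _mm_add_ss(s, _mm_shuffle_ps(s, s, 1));  // add lane 1 into lane 0
    return _mm_cvtss_f32(s);
}

// Portable fallback for CPUs without AVX.
static float sum8_scalar(const float* p) {
    float s = 0.0f;
    for (int i = 0; i < 8; ++i) s += p[i];
    return s;
}

// Dispatches at run time, so the AVX path only runs where it is supported.
float sum8(const float* p) {
    return __builtin_cpu_supports("avx") ? sum8_avx(p) : sum8_scalar(p);
}
```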

2

Solved

What does the [Intrinsic] attribute in C# do?

A quick Google search for "instrinsic attribute c#" only returns articles about other attributes, such as [Serializable]. Apparently these are called "intrinsic attributes". However, there is als...

c# .net .net-core intrinsics

Norvol asked 31/5, 2019 at 4:27

1

Solved

__m256i version of _mm_test_all_zeros

I know how to test if an __m128i register is all zero with the _mm_test_all_zeros intrinsic. What is the AVX2 / __m256i version of this intrinsic? If one isn't available, what is the fastest way to...

simd intrinsics avx avx2

Redo asked 28/5, 2019 at 16:24

1

Solved

Compare two __m128i values for total order

I need a way to compare values of type __m128i in C++ for a total order between any values of type __m128i. The type of order doesn't matter as long as it establishes a total order between all valu...

c++ x86 x86-64 simd intrinsics

Glossal asked 28/5, 2019 at 11:39
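
Since the question says any order will do, one simple option is to treat each `__m128i` as an unsigned 128-bit integer and compare the two 64-bit halves. The comparator below is a hypothetical sketch along those lines, using only SSE2; it is not claimed to be the fastest approach.

```cpp
#include <emmintrin.h>  // SSE2: __m128i, _mm_storeu_si128
#include <cstdint>

// Hypothetical comparator: orders a and b as unsigned 128-bit integers.
bool less_m128i(__m128i a, __m128i b) {
    std::uint64_t x[2];
    std::uint64_t y[2];
    _mm_storeu_si128(reinterpret_cast<__m128i*>(x), a);  // x[0] = low half, x[1] = high half
    _mm_storeu_si128(reinterpret_cast<__m128i*>(y), b);
    if (x[1] != y[1]) return x[1] < y[1];  // high halves decide first
    return x[0] < y[0];                    // tie broken by low halves
}
```

A function like this could serve as the comparison argument to `std::sort` or as the ordering for a `std::map` key.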

1

Solved

What is the difference between _mm_movehdup_ps and _mm_shuffle_ps in this case?

If my understanding is correct, _mm_movehdup_ps(a) gives the same result as _mm_shuffle_ps(a, a, _MM_SHUFFLE(1, 1, 3, 3))? Is there a performance difference between the two?

x86 sse intrinsics micro-optimization sse3

Awash asked 21/5, 2019 at 12:21
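
A quick way to check the lane order in the question is to run the shuffle with both argument orders. `_MM_SHUFFLE` lists the highest lane first, and `_mm_movehdup_ps` is documented as duplicating the odd-indexed elements (a1, a1, a3, a3 from lane 0 upward). The sketch below uses only SSE, and its helper names are hypothetical.

```cpp
#include <xmmintrin.h>  // SSE: _mm_shuffle_ps, _MM_SHUFFLE

// Hypothetical helper: writes the lanes of shuffle(a, a, _MM_SHUFFLE(3, 3, 1, 1)) to out[0..3].
void shuffle_3311(const float* in, float* out) {
    __m128 a = _mm_loadu_ps(in);
    _mm_storeu_ps(out, _mm_shuffle_ps(a, a, _MM_SHUFFLE(3, 3, 1, 1)));
}

// Hypothetical helper: same, with the argument order written in the question.
void shuffle_1133(const float* in, float* out) {
    __m128 a = _mm_loadu_ps(in);
    _mm_storeu_ps(out, _mm_shuffle_ps(a, a, _MM_SHUFFLE(1, 1, 3, 3)));
}
```

For the input {10, 11, 12, 13}, `shuffle_3311` yields {11, 11, 13, 13}, which matches the documented `_mm_movehdup_ps` pattern, while `shuffle_1133` yields {13, 13, 11, 11}.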

1

AVX2 SIMD XOR not yielding performance improvements in .NET

I'm playing around with .NET Core 3.0's new support for hardware intrinsics, in the System.Runtime.Intrinsics namespace. I have some code where I perform 4 XOR operations in a loop - below is a si...

c# .net-core simd intrinsics .net-core-3.0

Workmanship asked 8/5, 2019 at 8:52

2

Solved

SSE: shuffle (permutevar) 4x32 integers

I have some code using the AVX2 intrinsic _mm256_permutevar8x32_epi32 aka vpermd to select integers from an input vector by an index vector. Now I need the same thing but for 4x32 instead of 8x32. ...

sse simd intrinsics avx

Windom asked 8/5, 2019 at 3:58

1

Multiply 64-bit integers using .NET Core's hardware intrinsics

I'm writing some performance-sensitive code, where multiplication of unsigned 64-bit integers (ulong) is a bottleneck. .NET Core 3.0 brings access to hardware intrinsics with the System.Runtime.In...

c# math .net-core intrinsics .net-core-3.0

Perfection asked 7/5, 2019 at 9:23

2

Solved

The Effect of Architecture When Using SSE / AVX Intrinsics

I wonder how a compiler treats intrinsics. If one uses SSE2 intrinsics (using #include <emmintrin.h>) and compiles with the -mavx flag, what will the compiler generate? Will it generate AVX ...

gcc sse intrinsics avx icc

Dehumidifier asked 18/4, 2019 at 14:6

1

Solved

Summing 8-bit integers in __m512i with AVX intrinsics

AVX512 provides us with intrinsics to sum all cells in a __m512 vector. However, some of their counterparts are missing: there is no _mm512_reduce_add_epi8, yet. _mm512_reduce_add_ps //horizontal ...

c x86 simd intrinsics avx

Alidus asked 22/3, 2019 at 9:41

1

Solved

SIMD: Accumulate Adjacent Pairs

I'm learning how to use SIMD intrinsics and autovectorization. Luckily, I have a useful project I'm working on that seems extremely amenable to SIMD, but is still tricky for a newbie like me. I'm...

c++ sse simd intrinsics avx

Salina asked 8/3, 2019 at 6:36

1

AVX __m256i integer division for signed 32-bit elements

I am trying to do a SIMD division on an AVX machine and getting a compilation error. Here is my code: __m256i help; int arr[8]; int arr2[8]; help = _mm256_load_si256((__m256i*)arr); __m256i ...

c++ simd intrinsics avx

Jacqui asked 26/2, 2019 at 16:19

1

Solved

Why does _mm_mfence() produce counts for the ALL_LOADS perf event?

I am testing the behavior of some intrinsic operations. I was surprised to notice that _mm_mfence() issues a load instruction from user space, but it does not count in L1 data cache - miss, hit or...

c x86 intrinsics perf papi

Biestings asked 25/2, 2019 at 23:36
