simd - 5 - McMap

2

Optimizing horizontal boolean reduction in ARM NEON

I'm experimenting with a cross-platform SIMD library à la ecmascript_simd (aka SIMD.js), and part of this is providing a few "horizontal" SIMD operations. In particular, the API that library offers in...

arm simd neon

Konrad asked 3/7, 2015 at 1:40

2

Solved

SIMD search for trough after the last peak

I need to find the index of the value that is X% or more below the last rolling maximum peak. The peak is a rolling maximum of the elements in one array (highs), while the values are in another ar...

c vectorization simd avx2

Dahliadahlstrom asked 7/2, 2021 at 14:41

2

Improving performance of floating-point dot-product of an array with SIMD

I have this function to process a piece of an array of doubles: void avx2_mul_64_block(double& sum, double* lhs_arr, double* rhs_arr) noexcept { __m256i accumulator = _mm256_setzero_pd(); for ...

c++ x86 simd intrinsics avx

Irresoluble asked 20/1, 2021 at 21:53

1

Solved

What is packed and unpacked and extended packed data

I have been going through the Intel intrinsics, and every function works on integers, floats, or doubles that are packed, unpacked, or extended packed. It seems like this question should be answer...

cpu-architecture sse simd avx avx2

Incarnadine asked 29/10, 2020 at 23:21

5

Solved

Get member of __m128 by index?

I've got some code, originally given to me by someone working with MSVC, and I'm trying to get it to work on Clang. Here's the function that I'm having trouble with: float vectorGetByIndex( __m128...

c++ clang sse simd intrinsics

Dorsal asked 27/9, 2012 at 15:6

1

Do gcc vector extensions support variable length vectors?

This doesn't appear to work/compile: void vec(size_t n) { typedef char v4si __attribute__((vector_size(n))); v4si t={1}; } Is there a proper way to declare this, or is it unsupported?

c gcc vector simd c99

Bevin asked 7/1, 2021 at 1:41

2

Solved

Convert "__m256 with random-bits" into float values of [0, 1] range

I have a __m256 value that holds random bits. I would like to "interpret" it to obtain another __m256 that holds float values in a uniform [0.0f, 1.0f] range. Planning to do it using:...

c++ random floating-point simd avx

Graz asked 31/12, 2020 at 8:20

1

Solved

IntStream leads to array elements being wrongly set to 0 (JVM Bug, Java 11)

In the class P below, the method test seems to return identically false: import java.util.function.IntPredicate; import java.util.stream.IntStream; public class P implements IntPredicate { privat...

java arrays java-stream simd java-11

Sleeve asked 21/12, 2020 at 14:51

5

Solved

How to combine two __m128 values to __m256?

I would like to combine two __m128 values to one __m256. Something like this: __m128 a = _mm_set_ps(1, 2, 3, 4); __m128 b = _mm_set_ps(5, 6, 7, 8); to something like: __m256 c = { 1, 2, 3, 4,...

c x86 sse simd avx

Godgiven asked 20/6, 2012 at 9:40

3

Converting from Source-based Indices to Destination-based Indices

I'm using AVX2 instructions in some C code. The VPERMD instruction takes two 8-integer vectors a and idx and generates a third one, dst, by permuting a based on idx. This seems equivalent to dst[i...

c math sse simd avx2

Pandiculation asked 31/8, 2016 at 23:35
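
For readers skimming this entry, here is a minimal scalar sketch of the idea, not the asker's code. It models VPERMD as a gather (each destination lane names its source lane, using only the low 3 bits of the index) and shows that switching between source-based and destination-based indices amounts to inverting a permutation. The names permute_gather and invert_indices are hypothetical, and the sketch assumes the index vector is a true permutation of 0..7.

```c
#include <assert.h>
#include <stdint.h>

#define LANES 8 /* a 256-bit register holds eight 32-bit lanes */

/* Scalar model of VPERMD: dst[i] = a[idx[i]], low 3 bits of idx only. */
void permute_gather(const uint32_t a[LANES], const uint32_t idx[LANES],
                    uint32_t dst[LANES])
{
    for (int i = 0; i < LANES; i++)
        dst[i] = a[idx[i] & 7u];
}

/* Hypothetical helper: given destination-based indices `to`, meaning
   "element i goes to lane to[i]", build the source-based indices that
   the gather above expects. Assumes `to` is a permutation of 0..7.
   Inverting a permutation is symmetric, so the same routine also
   converts in the other direction. */
void invert_indices(const uint32_t to[LANES], uint32_t idx[LANES])
{
    for (int i = 0; i < LANES; i++) {
        assert(to[i] < LANES);
        idx[to[i]] = (uint32_t)i;
    }
}
```

If the indices are not a permutation (duplicates or gaps), the inverse is not well defined and this sketch does not apply.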

2

Solved

What is the most efficient way to support CMGT with 64bit signed comparisons on ARMv7a with Neon?

This question was originally posed for SSE2 here. Since every single algorithm overlapped with ARMv7a+NEON's support for the same operations, the question was updated to include the ARMv7+NEON vers...

assembly arm simd webassembly neon

Yila asked 7/12, 2020 at 23:45

3

Solved

How to simulate pcmpgtq on sse2?

PCMPGTQ was introduced in SSE4.2, and it provides a signed greater-than comparison for 64-bit numbers that yields a mask. How does one support this functionality on instruction sets predating sse4...

assembly sse simd sse2 sse4

Smalltime asked 6/12, 2020 at 8:36
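
As a rough illustration of what such an emulation has to compute, the sketch below is a scalar model of a single 64-bit lane, built only from 32-bit comparisons. It is not an SSE2 instruction sequence and not taken from the linked question; the function name cmpgt_i64_mask is hypothetical. The idea is that a > b holds when the high halves compare greater as signed values, or when the high halves are equal and the low halves compare greater as unsigned values.

```c
#include <stdint.h>

/* Returns all-ones if a > b (signed 64-bit), else 0, mimicking the
   mask that one PCMPGTQ lane would produce. */
uint64_t cmpgt_i64_mask(int64_t a, int64_t b)
{
    uint64_t ua = (uint64_t)a;
    uint64_t ub = (uint64_t)b;

    /* Low halves are compared as plain unsigned values. */
    uint32_t a_lo = (uint32_t)ua;
    uint32_t b_lo = (uint32_t)ub;

    /* Flipping the sign bit lets an unsigned compare order the
       high halves the way a signed compare would. */
    uint32_t a_hi = (uint32_t)(ua >> 32) ^ 0x80000000u;
    uint32_t b_hi = (uint32_t)(ub >> 32) ^ 0x80000000u;

    int gt = (a_hi > b_hi) || (a_hi == b_hi && a_lo > b_lo);
    return gt ? UINT64_MAX : 0;
}
```

A vector version would need the same three ingredients per lane: a signed 32-bit greater-than, a 32-bit equality test, and an unsigned compare of the low halves, with the results combined and broadcast across both halves of each 64-bit lane.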

2

Solved

Why does GCC or Clang not optimise reciprocal to 1 instruction when using fast-math

Does anyone know why GCC/Clang will not optimise function test1 in the code sample below to use just the RCPPS instruction when the fast-math option is enabled? Is there another compiler flag tha...

c++ sse compiler-optimization simd fast-math

Ocarina asked 14/8, 2015 at 4:26

3

Can I use SIMD for speeding up string manipulation?

Are SIMD instructions built for vector numerical calculations only? Or do they lend themselves well to a class of string manipulation tasks, like writing rows of data to a text file where the order of ...

c++ c string optimization simd

Athematic asked 23/11, 2020 at 19:24

2

Solved

Fast transposition of an image and Sobel Filter optimization in C (SIMD)

I want to implement a really (really) fast Sobel operator for a ray-tracer a friend of mine and I wrote (sources can be found here). What follows is what I have figured out so far... First, let's assume the...

c optimization sse simd

Branham asked 13/8, 2013 at 19:13

3

Solved

Accumulate vector of integer with sse

I tried to change this code to handle std::vector<int>. float accumulate(const std::vector<float>& v) { // copy the length of v and a pointer to the data onto the local stack con...

c++ vector x86 sse simd

Disqualification asked 7/10, 2015 at 11:32

2

Solved

How to choose AVX compare predicate variants

In the Advanced Vector Extensions (AVX) compare instructions like _mm256_cmp_ps, the last argument is a compare predicate. The choices for the predicate overwhelm me. They seem to be a triple o...

simd avx

Aurangzeb asked 7/6, 2013 at 15:52

1

Solved

Is there any way to get Node.JS and V8 to automatically vectorize simple loops?

I'm currently working as an author and contributor to SSIM.js and jest-image-snapshot on behalf of Not a Typical Agency. A lot of the work I'm performing results in the creation of new algorithms t...

javascript node.js v8 simd webassembly

Angelus asked 12/9, 2020 at 20:10

1

Solved

how to convert uint32 to uint8 using simd but not avx512?

Say there are a lot of uint32s stored in aligned memory uint32 *p; how can I convert them to uint8s with SIMD? I see there is _mm256_cvtepi32_epi8/vpmovdb, but it belongs to AVX-512, and my CPU doesn't s...

sse simd avx avx2

Aryanize asked 7/9, 2020 at 9:14

2

Solved

inlining failed in call to always_inline ‘_mm_mullo_epi32’: target specific option mismatch

I am trying to compile a C program using cmake which uses SIMD intrinsics. When I try to compile it, I get two errors /usr/lib/gcc/x86_64-linux-gnu/5/include/smmintrin.h:326:1: error: inlining fai...

c cmake x86 sse simd

Sidonnie asked 30/3, 2017 at 21:29

1

Solved

AVX2 simd performs relatively worse to scalar at higher optimization level

I am learning and playing with SIMD functions and wrote a simple program that compares the number of vector addition instructions it can run in 1 second with normal scalar addition. I found th...

c++ performance gcc simd avx2

Brace asked 11/8, 2020 at 14:59

2

specify simd level of a function that compiler can use

I wrote some code and compiled it using gcc with the native architecture option. Typically I can take this code and run it on an older computer that doesn't have AVX2 (only AVX), and it works fine...

c gcc simd

Blab asked 6/5, 2019 at 15:13

1

Deinterleave vector of nibbles using SIMD

I have an input vector of 16384 signed four-bit integers. They are packed into 8192 bytes. I need to interleave the values and unpack into signed 8-bit integers in two separate arrays. a,b,c,d are ...

c++ sse simd avx2

Plumley asked 31/7, 2020 at 22:55

1

Solved

How to avoid floating point exceptions in unused SIMD lanes

I like to run my code with floating point exceptions enabled. I do this under Linux using: feenableexcept( FE_DIVBYZERO | FE_INVALID | FE_OVERFLOW ); So far so good. The issue I am having is that...

floating-point clang simd floating-point-exceptions sigfpe

Cowberry asked 28/7, 2020 at 1:51

1

SIMD vector memory load in LLVM

What is the "correct" (i.e., portable) way in LLVM to load data from memory into a SIMD vector? Looking at the typical IR generated by LLVM's auto-vectorizer for an x86 target, it seems l...

c++ llvm simd llvm-ir avx

Springclean asked 25/7, 2020 at 15:36
