SIMD Questions

2

I'm experimenting with a cross-platform SIMD library à la ecmascript_simd, aka SIMD.js, and part of this is providing a few "horizontal" SIMD operations. In particular, the API that library offers in...
Konrad asked 3/7, 2015 at 1:40

2

Solved

I need to find the index of the value that is X or more % below the last rolling maximum peak. The peak is a rolling maximum of the elements in one array (highs), while the values are in another ar...
Dahliadahlstrom asked 7/2, 2021 at 14:41

2

I have this function to compute part of an array of doubles: void avx2_mul_64_block(double& sum, double* lhs_arr, double* rhs_arr) noexcept { __m256i accumulator = _mm256_setzero_pd(); for ...
Irresoluble asked 20/1, 2021 at 21:53

1

Solved

I have been going through the Intel Intrinsics, and every function works on integers, floats, or doubles that are packed, unpacked, or extended packed. It seems like this question should be answer...
Incarnadine asked 29/10, 2020 at 23:21

5

Solved

I've got some code, originally given to me by someone working with MSVC, and I'm trying to get it to work on Clang. Here's the function that I'm having trouble with: float vectorGetByIndex( __m128...
Dorsal asked 27/9, 2012 at 15:6

1

This doesn't appear to work or compile: void vec(size_t n) { typedef char v4si __attribute__((vector_size(n))); v4si t={1}; } Is there a proper way to declare this, or is it unsupported?
Bevin asked 7/1, 2021 at 1:41

2

Solved

I have a __m256 value that holds random bits. I would like to "interpret" it, to obtain another __m256 that holds float values in a uniform [0.0f, 1.0f] range. Planning to do it using:...
Graz asked 31/12, 2020 at 8:20
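
One common way to turn random bits into a uniform float is sketched below in scalar C. It is not necessarily the asker's approach, since the excerpt above is cut off before the planned method. The idea: keep 23 random bits as the mantissa, force the exponent field to that of 1.0f so the value lies in [1.0f, 2.0f), then subtract 1.0f. The function name `bits_to_unit_float` is made up for illustration, and the result lies in [0.0f, 1.0f), so 1.0f itself is never produced.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: map 32 random bits to a float in [0.0f, 1.0f).
   Assumes IEEE 754 binary32 floats. */
float bits_to_unit_float(uint32_t bits)
{
    /* Keep the top 23 bits as the mantissa and set the exponent of 1.0f. */
    uint32_t u = (bits >> 9) | 0x3F800000u;
    float f;
    memcpy(&f, &u, sizeof f);   /* bit-cast without aliasing problems */
    return f - 1.0f;            /* [1, 2) becomes [0, 1) */
}
```

With AVX2 the same three steps would map onto a per-lane shift, OR, and subtract.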

1

Solved

In the class P below, the method test seems to return identically false: import java.util.function.IntPredicate; import java.util.stream.IntStream; public class P implements IntPredicate { privat...
Sleeve asked 21/12, 2020 at 14:51

5

Solved

I would like to combine two __m128 values to one __m256. Something like this: __m128 a = _mm_set_ps(1, 2, 3, 4); __m128 b = _mm_set_ps(5, 6, 7, 8); to something like: __m256 c = { 1, 2, 3, 4,...
Godgiven asked 20/6, 2012 at 9:40

3

I'm using AVX2 instructions in some C code. The VPERMD instruction takes two 8-integer vectors a and idx and generates a third one, dst, by permuting a based on idx. This seems equivalent to dst[i...
Pandiculation asked 31/8, 2016 at 23:35
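
As a scalar reference for what VPERMD computes, here is a minimal C sketch: each destination lane selects a source lane using only the low 3 bits of the matching index. `permute8_ref` is a hypothetical helper name used for illustration; the corresponding intrinsic is `_mm256_permutevar8x32_epi32`.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of a VPERMD-style permute: dst[i] = a[idx[i] & 7].
   Bits above the low 3 bits of each index are ignored. */
void permute8_ref(const int32_t a[8], const uint32_t idx[8], int32_t dst[8])
{
    assert(a && idx && dst);
    int32_t tmp[8];                 /* temporary so dst may alias a */
    for (int i = 0; i < 8; i++)
        tmp[i] = a[idx[i] & 7u];
    for (int i = 0; i < 8; i++)
        dst[i] = tmp[i];
}
```

A model like this can serve as a test oracle when checking a vectorized permute lane by lane.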

2

Solved

This question was originally posed for SSE2 here. Since every single algorithm overlapped with ARMv7a+NEON's support for the same operations, the question was updated to include the ARMv7+NEON vers...
Yila asked 7/12, 2020 at 23:45

3

Solved

PCMPGTQ was introduced in SSE4.2, and it provides a signed greater-than comparison for 64-bit numbers that yields a mask. How does one support this functionality on instruction sets predating SSE4...
Smalltime asked 6/12, 2020 at 8:36
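
A sketch of the usual decomposition idea, written as scalar C so it runs anywhere: a signed 64-bit greater-than can be built from 32-bit comparisons, using a signed compare on the high halves and an unsigned compare on the low halves when the high halves are equal. The helper name `cmpgt_i64_mask` is hypothetical. A real pre-SSE4.2 version would apply the same logic per lane with 32-bit compares and shuffles, and SSE2 has no unsigned 32-bit compare, so that part typically needs its own emulation (for example by flipping the sign bits first).

```c
#include <stdint.h>

/* Hypothetical helper: returns all-ones if a > b (signed), else zero,
   using only 32-bit comparisons. Assumes two's-complement integers. */
uint64_t cmpgt_i64_mask(int64_t a, int64_t b)
{
    uint64_t ua = (uint64_t)a, ub = (uint64_t)b;
    int32_t  a_hi = (int32_t)(uint32_t)(ua >> 32);  /* signed high halves */
    int32_t  b_hi = (int32_t)(uint32_t)(ub >> 32);
    uint32_t a_lo = (uint32_t)ua;                   /* unsigned low halves */
    uint32_t b_lo = (uint32_t)ub;

    int gt = (a_hi > b_hi) || (a_hi == b_hi && a_lo > b_lo);
    return gt ? UINT64_MAX : 0;
}
```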

2

Solved

Does anyone know why GCC/Clang will not optimize function test1 in the code sample below to use just the RCPPS instruction when using the fast-math option? Is there another compiler flag tha...
Ocarina asked 14/8, 2015 at 4:26

3

Are SIMD instructions built for vector numerical calculations only? Or do they lend themselves well to a class of string manipulation tasks, such as writing rows of data to a text file where the order of ...
Athematic asked 23/11, 2020 at 19:24

2

Solved

I want to implement a really (really) fast Sobel operator for a ray-tracer a friend of mine and I wrote (sources can be found here). What follows is what I have figured out so far... First, let's assume the...
Branham asked 13/8, 2013 at 19:13

3

Solved

I tried to change this code to handle std::vector<int>. float accumulate(const std::vector<float>& v) { // copy the length of v and a pointer to the data onto the local stack con...
Disqualification asked 7/10, 2015 at 11:32

2

Solved

In the Advanced Vector Extensions (AVX) compare instructions, such as _mm256_cmp_ps, the last argument is a compare predicate. The choices for the predicate overwhelm me. They seem to be a triple o...
Aurangzeb asked 7/6, 2013 at 15:52

1

Solved

I'm currently working as an author and contributor to SSIM.js and jest-image-snapshot on behalf of Not a Typical Agency. A lot of the work I'm performing results in the creation of new algorithms t...
Angelus asked 12/9, 2020 at 20:10

1

Solved

Say there are a lot of uint32s stored in aligned memory, uint32 *p. How can I convert them to uint8s with SIMD? I see there is _mm256_cvtepi32_epi8/vpmovdb, but it belongs to AVX-512, and my CPU doesn't s...
Aryanize asked 7/9, 2020 at 9:14

2

Solved

I am trying to compile a C program that uses SIMD intrinsics with cmake. When I try to compile it, I get two errors: /usr/lib/gcc/x86_64-linux-gnu/5/include/smmintrin.h:326:1: error: inlining fai...
Sidonnie asked 30/3, 2017 at 21:29

1

Solved

I am learning and playing with SIMD functions and wrote a simple program that compares the number of vector addition instructions it can run in 1 second with the number of normal scalar additions. I found th...
Brace asked 11/8, 2020 at 14:59

2

I wrote some code and compiled it using gcc with the native architecture option. Typically I can take this code and run it on an older computer that doesn't have AVX2 (only AVX), and it works fine...
Blab asked 6/5, 2019 at 15:13

1

I have an input vector of 16384 signed 4-bit integers. They are packed into 8192 bytes. I need to interleave the values and unpack them into signed 8-bit integers in two separate arrays. a,b,c,d are ...
Plumley asked 31/7, 2020 at 22:55
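
The core step in unpacking signed 4-bit values is sign extension, sketched here in scalar C. The excerpt above is cut off before it gives the exact layout, so this sketch simply assumes the low nibble of each byte goes to one array and the high nibble to the other. The names `sext4` and `unpack_nibbles` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sign-extend a 4-bit two's-complement value (0..15) to int8_t (-8..7). */
static int8_t sext4(uint8_t nibble)
{
    return (int8_t)((nibble ^ 8) - 8);
}

/* Assumed layout (not from the question): low nibble -> lo_out,
   high nibble -> hi_out, one output element per input byte. */
void unpack_nibbles(const uint8_t *packed, size_t n_bytes,
                    int8_t *lo_out, int8_t *hi_out)
{
    assert(packed && lo_out && hi_out);
    for (size_t i = 0; i < n_bytes; i++) {
        lo_out[i] = sext4(packed[i] & 0x0F);
        hi_out[i] = sext4(packed[i] >> 4);
    }
}
```

The `(x ^ 8) - 8` trick avoids relying on implementation-defined right shifts of negative values, and it carries over to byte-wise SIMD XOR and subtract operations after masking.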

1

Solved

I like to run my code with floating point exceptions enabled. I do this under Linux using: feenableexcept( FE_DIVBYZERO | FE_INVALID | FE_OVERFLOW ); So far so good. The issue I am having is that...
Cowberry asked 28/7, 2020 at 1:51

1

What is the "correct" (i.e., portable) way in LLVM to load data from memory into a SIMD vector? Looking at the typical IR generated by LLVM's auto-vectorizer for an x86 target, it seems l...
Springclean asked 25/7, 2020 at 15:36
