simd Questions
2
I'm experimenting with a cross-platform SIMD library à la ecmascript_simd (aka SIMD.js), and part of this is providing a few "horizontal" SIMD operations. In particular, the API that library offers in...
2
Solved
I need to find the index of the value that is X or more % below the last rolling maximum peak.
The peak is a rolling maximum of the elements in one array (highs), while the values are in another ar...
Dahliadahlstrom asked 7/2, 2021 at 14:41
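For readers skimming this excerpt, the search it describes can be sketched in scalar C before any vectorization is attempted. This is only an illustration under stated assumptions: "rolling maximum" is taken to mean the running maximum from the start of the array (not a fixed window), the peak includes the current element, and the function name and signature are hypothetical rather than taken from the question.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: returns the first index i where values[i] is at least
   pct percent below the running maximum of highs[0..i], or -1 if none is.
   Assumes positive inputs and a running (not windowed) maximum. */
long first_drawdown_index(const double *highs, const double *values,
                          size_t n, double pct)
{
    assert(pct >= 0.0 && pct <= 100.0);
    if (n == 0)
        return -1;

    double peak = highs[0];
    for (size_t i = 0; i < n; ++i) {
        if (highs[i] > peak)
            peak = highs[i];                      /* update the running peak */
        if (values[i] <= peak * (1.0 - pct / 100.0))
            return (long)i;                       /* first value far enough below */
    }
    return -1;
}
```

A scalar reference like this is mainly useful for checking a later SIMD version against known inputs.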
2
I have this function to process a block of an array of doubles:
void avx2_mul_64_block(double& sum, double* lhs_arr, double* rhs_arr) noexcept
{
__m256d accumulator = _mm256_setzero_pd();
for ...
Irresoluble asked 20/1, 2021 at 21:53
1
Solved
I have been going through the Intel intrinsics, and every function works on integers, floats, or doubles that are packed, unpacked, or extended packed.
It seems like this question should be answer...
Incarnadine asked 29/10, 2020 at 23:21
5
Solved
I've got some code, originally given to me by someone working with MSVC, and I'm trying to get it to work on Clang. Here's the function that I'm having trouble with:
float vectorGetByIndex( __m128...
Dorsal asked 27/9, 2012 at 15:06
2
Solved
I have a __m256 value that holds random bits.
I would like to "interpret" it, to obtain another __m256 that holds float
values in a uniform [0.0f, 1.0f] range.
Planning to do it using:...
Graz asked 31/12, 2020 at 8:20
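The poster's own plan is cut off in this excerpt, so the following is not their method. It is a scalar sketch of one widely used bit trick, shown for a single 32-bit lane: keep 23 random bits as the mantissa, force the exponent so the value lies in [1, 2), then subtract 1. Note that it yields [0.0f, 1.0f), with 1.0f excluded. The function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: maps 32 random bits to a float in [0.0f, 1.0f). */
float bits_to_unit_float(uint32_t bits)
{
    assert(sizeof(float) == sizeof(uint32_t));

    /* Top 23 random bits become the mantissa; 0x3F800000 sets the exponent
       to 127, so the bit pattern encodes a value in [1.0f, 2.0f). */
    uint32_t u = (bits >> 9) | 0x3F800000u;

    float f;
    memcpy(&f, &u, sizeof f);   /* reinterpret the bits without aliasing issues */
    return f - 1.0f;            /* shift [1, 2) down to [0, 1) */
}
```

On AVX2 hardware the same per-lane steps would presumably map to a shift, an OR, a cast to __m256, and a subtract, but that mapping is left as an assumption here.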
1
Solved
In the class P below, the method test seems to return identically false:
import java.util.function.IntPredicate;
import java.util.stream.IntStream;
public class P implements IntPredicate {
privat...
Sleeve asked 21/12, 2020 at 14:51
3
I'm using AVX2 instructions in some C code.
The VPERMD instruction takes two 8-integer vectors a and idx and generates a third one, dst, by permuting a based on idx. This seems equivalent to dst[i...
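As a reading aid, the permutation described above can be modeled in scalar C. This sketch assumes the documented behavior of VPERMD (the _mm256_permutevar8x32_epi32 intrinsic), where only the low 3 bits of each index are used. The function name is hypothetical and the model is not meant as a replacement for the instruction.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar model of VPERMD on eight 32-bit lanes:
   each output lane i takes the element of a selected by idx[i] & 7. */
void permute8x32_model(const uint32_t a[8], const uint32_t idx[8],
                       uint32_t dst[8])
{
    assert(dst != a && dst != idx);   /* this model does not support in-place use */
    for (int i = 0; i < 8; ++i)
        dst[i] = a[idx[i] & 7u];      /* upper index bits are ignored */
}
```

For example, with a = {10, ..., 17} and idx = {7, 6, 5, 4, 3, 2, 1, 8}, the model reverses the vector, because index 8 wraps to lane 0.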
2
Solved
This question was originally posed for SSE2 here. Since every single algorithm overlapped with ARMv7a+NEON's support for the same operations, the question was updated to include the ARMv7+NEON vers...
Yila asked 7/12, 2020 at 23:45
2
Solved
Does anyone know why GCC/Clang will not optimize the function test1 in the code sample below to use just the RCPPS instruction when using the fast-math option? Is there another compiler flag tha...
Ocarina asked 14/8, 2015 at 4:26
3
Are SIMD instructions built for vector numerical calculations only? Or do they lend themselves well to a class of string manipulation tasks, like writing rows of data to a text file where the order of ...
Athematic asked 23/11, 2020 at 19:24
2
Solved
I want to implement a really (really) fast Sobel operator for a ray-tracer a friend of mine and I wrote (sources can be found here). What follows is what I have figured out so far...
First, let's assume the...
Branham asked 13/8, 2013 at 19:13
2
Solved
In Advanced Vector Extensions (AVX) compare instructions such as _mm256_cmp_ps, the last argument is a compare predicate.
The choices for the predicate overwhelm me.
They seem to be a triple o...
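The predicate names combine a relation (EQ, LT, NLT, ...), an ordered/unordered choice (O or U), and a signaling/quiet choice (S or Q). The ordered/unordered part decides the result when an operand is NaN; the signaling/quiet part only decides whether a quiet NaN raises the invalid-operation exception. A scalar sketch of the NaN behavior, with hypothetical function names, might look like this:

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>

/* Ordered less-than, the comparison behind _CMP_LT_OS / _CMP_LT_OQ:
   false whenever either operand is NaN. */
bool lt_ordered(float a, float b)
{
    return !isnan(a) && !isnan(b) && a < b;
}

/* Unordered not-less-than, the comparison behind _CMP_NLT_US / _CMP_NLT_UQ:
   true whenever either operand is NaN. */
bool nlt_unordered(float a, float b)
{
    return isnan(a) || isnan(b) || !(a < b);
}

/* Hypothetical self-check showing where the two predicates differ. */
void predicate_self_check(void)
{
    assert(lt_ordered(1.0f, 2.0f) && !nlt_unordered(1.0f, 2.0f));
    assert(!lt_ordered(NAN, 2.0f) && nlt_unordered(NAN, 2.0f));
}
```

The two functions are logical complements of each other, which is why a NaN operand makes one false and the other true.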
1
Solved
I'm currently working as an author and contributor to SSIM.js and jest-image-snapshot on behalf of Not a Typical Agency. A lot of the work I'm performing results in the creation of new algorithms t...
Angelus asked 12/9, 2020 at 20:10
1
Solved
Say there are a lot of uint32s stored in aligned memory uint32 *p. How can I convert them to uint8s with SIMD?
I see there is _mm256_cvtepi32_epi8/vpmovdb but it belongs to avx512, and my cpu doesn't s...
2
Solved
I am trying to compile a C program using cmake which uses SIMD intrinsics. When I try to compile it, I get two errors
/usr/lib/gcc/x86_64-linux-gnu/5/include/smmintrin.h:326:1: error: inlining fai...
1
Solved
I am learning and playing with SIMD functions and wrote a simple program that compares the number of vector addition instructions it can run in 1 second with normal scalar addition. I found th...
Brace asked 11/8, 2020 at 14:59
2
I wrote some code and compiled it using gcc with the native architecture option.
Typically I can take this code and run it on an older computer that doesn't have AVX2 (only AVX), and it works fine...
1
I have an input vector of 16384 signed four-bit integers. They are packed into 8192 bytes. I need to interleave the values and unpack them into signed 8-bit integers in two separate arrays.
a,b,c,d are ...
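The exact interleaving is cut off in this excerpt, so the layout below is purely an assumption: the low nibble of each byte goes to one output array and the high nibble to the other. With that caveat, a scalar reference for sign-extending packed 4-bit values could be sketched as follows (function names are hypothetical):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sign-extends a 4-bit two's-complement value (0..15) to int8_t (-8..7). */
static int8_t sext4(uint8_t nibble)
{
    /* Flipping bit 3 and subtracting 8 maps 0..7 to 0..7 and 8..15 to -8..-1. */
    return (int8_t)((int)(nibble ^ 8u) - 8);
}

/* Hypothetical helper. ASSUMED layout: low nibble -> lo[i], high nibble -> hi[i]. */
void unpack_nibbles(const uint8_t *packed, size_t n_bytes,
                    int8_t *lo, int8_t *hi)
{
    assert(packed && lo && hi);
    for (size_t i = 0; i < n_bytes; ++i) {
        lo[i] = sext4(packed[i] & 0x0Fu);
        hi[i] = sext4(packed[i] >> 4);
    }
}
```

A reference like this is mostly useful for validating a vectorized version once the real layout is known.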
1
Solved
I like to run my code with floating point exceptions enabled.
I do this under Linux using:
feenableexcept( FE_DIVBYZERO | FE_INVALID | FE_OVERFLOW );
So far so good.
The issue I am having is that...
Cowberry asked 28/7, 2020 at 1:51
© 2022 - 2024 — McMap. All rights reserved.