SIMD Questions

3

Solved

Is this a bug in the VS 2017 watch, or am I doing something daft? It doesn't show half the contents of a Vector. (On my system, Vector.Count is 8). [Test] public void inspectVector() { var n...
Organzine asked 28/7, 2018 at 11:8

1

Solved

I have tried speeding up a toy GEMM implementation. I deal with blocks of 32x32 doubles for which I need an optimized MM kernel. I have access to AVX2 and FMA. I have two codes (in ASM, I apologize...

1

I lately came across the term "wavefront" in the context of pixel shader execution on the graphics card. From context I'd assume that a wavefront is a packing of multiple pixels or vertic...
Eshelman asked 6/12, 2021 at 11:7

3

Solved

I know that the 'Nearest' method of image resizing is the fastest method. Nevertheless, I am looking for a way to speed it up. An obvious step is to precalculate the indices: void CalcIndex(int sizeS, int sizeD, int colo...
Pelagian asked 6/12, 2021 at 11:6

1

Solved

I have to do a large number of operations (additions) on relatively small integers, and I started considering which datatype would give the best performance on a 64 bit machine. I was convinced tha...
Phototelegraph asked 27/11, 2021 at 10:39

3

Solved

It is known that GCC/Clang auto-vectorize loops well using SIMD instructions. It is also known that there exists the alignas() standard C++ specifier, which among other uses also allows aligning stack v...
Billups asked 20/11, 2021 at 12:9

5

I have a large in-memory array given as a pointer uint64_t * arr (plus a size), which represents plain bits. I need to very efficiently (as fast as possible) shift these bits to the right by some amount ...
Durno asked 20/11, 2021 at 7:1

2

Solved

How do I vectorize this C function with AVX2? static void propogate_neuron(const short a, const int8_t *b, int *c) { for (int i = 0; i < 32; ++i){ c[i] += a * b[i]; } }
Arhat asked 4/11, 2021 at 23:5

5

Solved

I need to implement a prefix sum algorithm and would need it to be as fast as possible. Ex: [3, 1, 7, 0, 4, 1, 6, 3] should give: [3, 4, 11, 11, 15, 16, 22, 25] Is there a way to do this usin...
Phototherapy asked 14/5, 2012 at 16:44
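
For reference, the example in the excerpt above can be reproduced with a minimal C sketch. This is not taken from the question or its answers: the function names `prefix_sum_scalar` and `prefix_sum_sse2` are hypothetical, the element type is assumed to be `int32_t`, and the vector version assumes an x86 target with SSE2. It uses the common in-register scan (two shift-and-add steps per 4-element block plus a carry broadcast), which is one possible approach and not necessarily the fastest.

```c
#include <emmintrin.h>  /* SSE2 intrinsics (assumed x86 target) */
#include <stddef.h>
#include <stdint.h>

/* Scalar reference: out[i] = in[0] + ... + in[i]. */
void prefix_sum_scalar(const int32_t *in, int32_t *out, size_t n)
{
    int32_t acc = 0;
    for (size_t i = 0; i < n; ++i) {
        acc += in[i];
        out[i] = acc;
    }
}

/* Hypothetical SSE2 variant: scan 4 lanes in-register, then add the
 * running total (carry) of all previous blocks. */
void prefix_sum_sse2(const int32_t *in, int32_t *out, size_t n)
{
    __m128i carry = _mm_setzero_si128();
    size_t i = 0;

    for (; i + 4 <= n; i += 4) {
        __m128i x = _mm_loadu_si128((const __m128i *)(in + i));
        /* [a, b, c, d] -> [a, a+b, b+c, c+d] */
        x = _mm_add_epi32(x, _mm_slli_si128(x, 4));
        /* -> [a, a+b, a+b+c, a+b+c+d] */
        x = _mm_add_epi32(x, _mm_slli_si128(x, 8));
        /* Add the total of everything before this block. */
        x = _mm_add_epi32(x, carry);
        _mm_storeu_si128((__m128i *)(out + i), x);
        /* Broadcast the last lane as the carry for the next block. */
        carry = _mm_shuffle_epi32(x, _MM_SHUFFLE(3, 3, 3, 3));
    }

    /* Scalar tail for the remaining 0..3 elements. */
    int32_t acc = (i > 0) ? out[i - 1] : 0;
    for (; i < n; ++i) {
        acc += in[i];
        out[i] = acc;
    }
}
```

Calling either function on the input [3, 1, 7, 0, 4, 1, 6, 3] fills the output with [3, 4, 11, 11, 15, 16, 22, 25], matching the example in the question. The carry dependency between blocks remains serial, so any speedup over the scalar loop should be measured rather than assumed.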

2

Solved

I am trying to enable support for different SIMD instruction sets using MSVC. There is a page that covers enabling some of them, such as SSE2, AVX, AVX2 https://learn.microsoft.com/en-us/cpp/build/reference/arch-x86?red...
Landslide asked 24/9, 2020 at 19:59

7

Solved

I found this post that explains how to transpose an 8x8 bytes matrix with 24 operations, and a few scrolls later there's the code that implements the transpose. However, this method does not exploi...
Jibber asked 10/2, 2017 at 14:51

5

Solved

I want to learn more about using SSE. What ways are there to learn, besides the obvious one of reading the Intel® 64 and IA-32 Architectures Software Developer's Manuals? Mainly I'm interested to wo...
Unsnap asked 7/9, 2009 at 14:42

2

Solved

From Agner Fog's "Optimizing software in C++": There is a problem when mixing code compiled with and without AVX support on some Intel processors. There is a performance penalty when goi...
Employment asked 11/8, 2021 at 5:40

1

Solved

SSE has been around since 1999, and it and its subsequent extensions are among the most powerful tools for improving the performance of your C++ program. Yet there are no standardized containers/algo...
Circuit asked 17/12, 2019 at 12:3

6

Solved

If you have an input array, and an output array, but you only want to write those elements which pass a certain condition, what would be the most efficient way to do this in AVX2? I've seen in SSE ...
Aphrodisiac asked 29/4, 2016 at 7:30

2

Solved

I searched the web and the Intel software manual, but I am unable to confirm whether all Intel 64 architectures support up to SSSE3, up to SSE4.1, up to SSE4.2, AVX, etc. So that I would be able to use ...
Archil asked 28/1, 2015 at 6:14

2

Most C++ compilers support SIMD (SSE/AVX) instructions with intrinsics like _mm_cmpeq_epi32. My problem with this is that this function is not marked as constexpr, although "semantically" there is...
Favourable asked 16/8, 2018 at 14:59

3

Solved

I am porting SSE SIMD code to use the 256-bit AVX extensions and cannot seem to find any instruction that will blend/shuffle/move the high 128 bits and the low 128 bits. The backstory: What...
Photodynamics asked 26/8, 2011 at 20:8

3

Solved

I wrote a function to add up all the elements of a double[] array using SIMD (System.Numerics.Vector) and the performance is worse than the naïve method. On my computer Vector<double>.Count i...
Duvall asked 19/5, 2021 at 14:57

1

I have this C: #include <stddef.h> size_t findChar(unsigned int length, char* __attribute__((aligned(16))) restrict string) { for (size_t i = 0; i < length; i += 2) { if (string[i] == '[...
Cuprite asked 5/4, 2021 at 20:15

1

Solved

Originally I was trying to reproduce the effect described in Agner Fog's microarchitecture guide section "Warm-up period for YMM and ZMM vector instructions" where it says that: The proc...
Triviality asked 30/3, 2021 at 15:43

3

Solved

Given a number in a register (a binary integer), how to convert it to a string of hexadecimal ASCII digits? (i.e. serialize it into a text format.) Digits can be stored in memory or printed on the...
Retentivity asked 17/12, 2018 at 22:14

3

I've implemented a method for parsing an unsigned integer string of length <= 8 using SIMD intrinsics available in .NET as follows: public unsafe static uint ParseUint(string text) { fixed (cha...
Stephen asked 25/2, 2021 at 15:35

3

Solved

So, this AVX thing - it's like a small machine for each core? Or it's just like one engine-unit for whole CPU? Like, can I use it on each core somehow? I'm playing with it, and I'm feeling like I m...
Matchmaker asked 20/2, 2021 at 18:25

1

I have an image processing algorithm to calculate a*b+c*d with AVX. The pseudo code is as follows: float *a=new float[N]; float *b=new float[N]; float *c=new float[N]; float *d=new float[N]; //ass...
Engage asked 18/2, 2021 at 13:10
