simd Questions
3
Solved
Is this a bug in the VS 2017 watch, or am I doing something daft? It doesn't show half the contents of a Vector. (On my system, Vector.Count is 8).
[Test]
public void inspectVector()
{
var n...
Organzine asked 28/7, 2018 at 11:8
1
Solved
I have tried speeding up a toy GEMM implementation. I deal with blocks of 32x32 doubles for which I need an optimized MM kernel. I have access to AVX2 and FMA.
I have two codes (in ASM, I apologies...
Oke asked 13/12, 2021 at 20:48
1
I lately came across the term "wavefront" in the context of pixel shader execution on the graphics card.
From context I'd assume that a wavefront is a packing of multiple pixels or vertic...
3
Solved
I know that the 'Nearest' method of image resizing is the fastest one.
Nevertheless, I am looking for a way to speed it up.
An obvious step is to precalculate the indices:
void CalcIndex(int sizeS, int sizeD, int colo...
Pelagian asked 6/12, 2021 at 11:6
1
Solved
I have to do a large number of operations (additions) on relatively small integers, and I started considering which data type would give the best performance on a 64-bit machine.
I was convinced tha...
Phototelegraph asked 27/11, 2021 at 10:39
3
Solved
GCC and Clang are known to auto-vectorize loops well using SIMD instructions.
There is also the standard C++ alignas() specifier, which among other uses also allows aligning stack v...
Billups asked 20/11, 2021 at 12:9
5
I have a large in-memory array, given as a pointer uint64_t * arr plus a size, that represents plain bits. I need to shift these bits, as efficiently as possible, to the right by some amount ...
Durno asked 20/11, 2021 at 7:1
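A portable scalar baseline can help pin down the semantics before any SIMD version is attempted. The sketch below is only an illustration under stated assumptions: arr[0] is taken to hold the lowest bits, "right" is taken to mean toward lower bit positions (as in a big-integer >>), and vacated high bits are filled with zeros. The function name shift_right_bits_sketch is hypothetical and does not come from the question.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: shift an n-word bit array right by `shift` bits, in place.
   Assumption: arr[0] holds the least significant 64 bits; zeros are shifted in. */
void shift_right_bits_sketch(uint64_t *arr, size_t n, size_t shift) {
    assert(arr != NULL || n == 0);
    const size_t words = shift / 64;              /* whole-word part of the shift */
    const unsigned bits = (unsigned)(shift % 64); /* remaining bit part */
    for (size_t i = 0; i < n; ++i) {
        const size_t src = i + words;
        /* Both source words sit at index >= i, so they have not been overwritten yet. */
        const uint64_t lo = (src < n) ? arr[src] : 0;
        const uint64_t hi = (src + 1 < n) ? arr[src + 1] : 0;
        /* Guard bits == 0: a shift by 64 would be undefined behaviour in C. */
        arr[i] = bits ? (lo >> bits) | (hi << (64 - bits)) : lo;
    }
}
```

Each output word depends on only two adjacent input words, which is the property a vectorized version would exploit.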
2
Solved
How do I vectorize this C function with AVX2?
static void propogate_neuron(const short a, const int8_t *b, int *c) {
    for (int i = 0; i < 32; ++i) {
        c[i] += a * b[i];
    }
}
Arhat asked 4/11, 2021 at 23:5
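One possible direction, offered as a sketch and not as the accepted answer: widen eight int8_t values at a time to 32-bit lanes, multiply by a broadcast copy of a, and accumulate into c. The name propagate_neuron_sketch is hypothetical. The intrinsics path is used only when the compiler defines __AVX2__ (for example with -mavx2 on GCC/Clang); otherwise the code falls back to the scalar loop from the question.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#ifdef __AVX2__
#include <immintrin.h>
#endif

/* Hypothetical sketch: c[i] += a * b[i] for exactly 32 elements. */
static void propagate_neuron_sketch(const short a, const int8_t *b, int *c) {
    assert(b != NULL && c != NULL);
#ifdef __AVX2__
    const __m256i va = _mm256_set1_epi32(a);  /* broadcast a into 8 int32 lanes */
    for (int i = 0; i < 32; i += 8) {
        /* Load 8 bytes and sign-extend them to 8 int32 lanes. */
        const __m128i b8 = _mm_loadl_epi64((const __m128i *)(b + i));
        const __m256i b32 = _mm256_cvtepi8_epi32(b8);
        /* Unaligned load/store, since no alignment of c is assumed. */
        __m256i acc = _mm256_loadu_si256((const __m256i *)(c + i));
        acc = _mm256_add_epi32(acc, _mm256_mullo_epi32(va, b32));
        _mm256_storeu_si256((__m256i *)(c + i), acc);
    }
#else
    /* Scalar fallback, identical to the loop in the question. */
    for (int i = 0; i < 32; ++i) {
        c[i] += a * b[i];
    }
#endif
}
```

Widening to 32 bits keeps every product exact (a short times an int8_t always fits in an int), at the cost of four iterations instead of two; a 16-bit multiply variant is possible but needs extra care to recombine the high and low halves.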
5
Solved
I need to implement a prefix sum algorithm and would need it to be as fast as possible.
Ex:
[3, 1, 7, 0, 4, 1, 6, 3]
should give:
[3, 4, 11, 11, 15, 16, 22, 25]
Is there a way to do this usin...
Phototherapy asked 14/5, 2012 at 16:44
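To make the example above concrete, here is a small sketch with two hypothetical helpers (the names are not from the question). prefix_sum_scalar is the plain sequential scan. prefix_sum_block8 performs the same inclusive scan on one block of 8 values in log2(8) = 3 shift-and-add passes, which is the access pattern a SIMD version typically maps onto one vector shift plus one vector add per pass. It is written in portable C here, so it illustrates the idea without being a tuned implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Sequential inclusive prefix sum, in place. */
void prefix_sum_scalar(int *v, size_t n) {
    assert(v != NULL || n == 0);
    for (size_t i = 1; i < n; ++i) {
        v[i] += v[i - 1];
    }
}

/* Log-step inclusive scan of one 8-element block, in place.
   Pass k adds the element `step` slots to the left, for step = 1, 2, 4. */
void prefix_sum_block8(int v[8]) {
    assert(v != NULL);
    for (int step = 1; step < 8; step *= 2) {
        /* Walk right to left so v[i - step] still holds the previous pass's value. */
        for (int i = 7; i >= step; --i) {
            v[i] += v[i - step];
        }
    }
}
```

For the input [3, 1, 7, 0, 4, 1, 6, 3], both functions produce [3, 4, 11, 11, 15, 16, 22, 25]. For longer arrays, a blocked version would also need to carry the running total from one block into the next.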
2
Solved
I am trying to enable different levels of SIMD support in MSVC.
There is a page about enabling some SIMD instruction sets, such as SSE2, AVX, and AVX2:
https://learn.microsoft.com/en-us/cpp/build/reference/arch-x86?red...
Landslide asked 24/9, 2020 at 19:59
7
Solved
I found this post that explains how to transpose an 8x8 byte matrix with 24 operations, and a few scrolls later there's the code that implements the transpose. However, this method does not exploi...
Jibber asked 10/2, 2017 at 14:51
2
Solved
From Agner Fog's "Optimizing software in C++":
There is a problem when mixing code compiled with and without AVX support on some Intel
processors. There is a performance penalty when goi...
Employment asked 11/8, 2021 at 5:40
1
Solved
SSE has been around since 1999, and it and its successor extensions are among the most powerful tools for improving the performance of a C++ program. Yet there are no standardized containers/algo...
6
Solved
If you have an input array, and an output array, but you only want to write those elements which pass a certain condition, what would be the most efficient way to do this in AVX2?
I've seen in SSE ...
Aphrodisiac asked 29/4, 2016 at 7:30
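A scalar reference is useful for checking any vectorized variant of this filtering (often called stream compaction or left-packing). The sketch below is hypothetical: the question does not say which condition is used, so "keep values >= 0" is assumed purely for illustration, and the name copy_if_nonneg_sketch is made up. It uses the branchless store-always, advance-conditionally pattern.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical baseline: copy the elements of `in` that satisfy the (assumed)
   condition x >= 0 to the front of `out`, and return how many were kept.
   `out` must have room for n elements because every element is stored
   speculatively before the write index is advanced. */
size_t copy_if_nonneg_sketch(const float *in, size_t n, float *out) {
    assert((in != NULL && out != NULL) || n == 0);
    size_t kept = 0;
    for (size_t i = 0; i < n; ++i) {
        out[kept] = in[i];           /* unconditional store */
        kept += (in[i] >= 0.0f);     /* advance only when the element passes */
    }
    return kept;
}
```

Only the first `kept` entries of out are meaningful afterwards; anything stored beyond that index is leftover speculative data.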
2
Solved
I searched the web and the Intel software manuals, but I am unable to confirm whether all Intel 64 architectures support up to SSSE3, SSE4.1, SSE4.2, AVX, etc., so that I would be able to use ...
Archil asked 28/1, 2015 at 6:14
2
Most C++ compilers support SIMD (SSE/AVX) instructions through intrinsics like
_mm_cmpeq_epi32
My problem with this is that this function is not marked as constexpr, although "semantically" there is...
Favourable asked 16/8, 2018 at 14:59
3
Solved
I am porting SSE SIMD code to use the 256-bit AVX extensions and cannot seem to find any instruction that will blend/shuffle/move the high 128 bits and the low 128 bits.
The backstory:
What...
3
Solved
I wrote a function to add up all the elements of a double[] array using SIMD (System.Numerics.Vector) and the performance is worse than the naïve method.
On my computer Vector<double>.Count i...
Duvall asked 19/5, 2021 at 14:57
1
I have this C:
#include <stddef.h>
size_t findChar(unsigned int length, char* __attribute__((aligned(16))) restrict string) {
    for (size_t i = 0; i < length; i += 2) {
        if (string[i] == '[...
Cuprite asked 5/4, 2021 at 20:15
1
Solved
Originally I was trying to reproduce the effect described in Agner Fog's microarchitecture guide section "Warm-up period for YMM and ZMM vector instructions" where it says that:
The proc...
3
Solved
Given a number in a register (a binary integer), how to convert it to a string of hexadecimal ASCII digits? (i.e. serialize it into a text format.)
Digits can be stored in memory or printed on the...
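As a language-neutral reference for the conversion itself, here is a minimal C sketch under stated assumptions: the value is 32 bits wide, the output is 8 lowercase digits with leading zeros, and the digits are stored to a caller-supplied buffer. The function name u32_to_hex_sketch is hypothetical. Each 4-bit nibble is extracted with a shift and mask, most significant first, and mapped through a lookup table; an assembly version would follow the same shift, mask, and lookup steps.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: write `value` as 8 lowercase hex digits plus a NUL
   terminator into `out`, which must hold at least 9 chars. */
void u32_to_hex_sketch(uint32_t value, char out[9]) {
    static const char digits[] = "0123456789abcdef";
    assert(out != NULL);
    for (int i = 0; i < 8; ++i) {
        /* Nibble i, counted from the most significant end. */
        const unsigned nibble = (value >> (28 - 4 * i)) & 0xFu;
        out[i] = digits[nibble];
    }
    out[8] = '\0';
}
```

For example, the value 0x1234ABCD yields the string "1234abcd".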
3
I've implemented a method for parsing an unsigned integer string of length <= 8 using SIMD intrinsics available in .NET as follows:
public unsafe static uint ParseUint(string text)
{
fixed (cha...
Stephen asked 25/2, 2021 at 15:35
3
Solved
So, this AVX thing - it's like a small machine for each core? Or it's just like one engine-unit for whole CPU?
Like, can I use it on each core somehow? I'm playing with it, and I'm feeling like I m...
Matchmaker asked 20/2, 2021 at 18:25
1
I have an image processing algorithm to calculate a*b+c*d with AVX. The pseudo code is as follows:
float *a=new float[N];
float *b=new float[N];
float *c=new float[N];
float *d=new float[N];
//ass...
Engage asked 18/2, 2021 at 13:10