auto-vectorization Questions

1

Solved

The following code produces assembly that conditionally executes SIMD in GCC 12.3 when compiled with -O3. For completeness, the code always executes SIMD in GCC 13.2 and never executes SIMD in clan...
Calyptra asked 16/2, 2024 at 22:17

1

Solved

I've tried to write a few functions to carry out matrix-vector multiplication using a single matrix together with an array of source vectors. I wrote those functions once in C++ and once in x8...
Ermeena asked 21/1, 2024 at 0:13

1

Solved

Let's say I have a struct Foo such that struct alignas(64) Foo { std::atomic<int> value; Foo* other; }; Then, if I have an array Foo array[2048]; of Foo's: I already have initialized the array...
Unalterable asked 24/11, 2023 at 12:14

1

Solved

Consider the following valarray-like class: #include <stdlib.h> struct va { void add1(const va& other); void add2(const va& other); size_t* data; size_t size; }; void va::add1(...

1

Solved

I have the following Java code (all arrays are initialized before we call "arrays" and all are of size "arraySize") int arraySize = 64; float[] a; float[] b; float[] result; p...
Ultann asked 8/7, 2023 at 17:19

2

Solved

I wanted to explore auto-vectorization by gcc (10.3). I have the following short program (see https://godbolt.org/z/5v9a53aj6) which computes the sum of all elements of a vector: #include <stdio...
Tobacco asked 21/10, 2022 at 10:12

0

I tried to vectorize the premultiplication of 64-bit colors of 16-bit integer ARGB channels. I quickly realized that due to lack of accelerated integer division support I need to convert my values ...
Cai asked 14/3, 2023 at 11:37

1

Solved

Example: https://www.godbolt.org/z/ahfcaj7W8 From https://gcc.gnu.org/onlinedocs/gcc-12.2.0/gcc/Optimize-Options.html It says -ftree-loop-vectorize      Perform loop vectorization on trees. This f...
Propitious asked 23/12, 2022 at 10:30

9

Solved

Several times now, I've encountered this term in MATLAB, Fortran ... some other ... but I've never found an explanation of what it means and what it does. So I'm asking here, what is vectorizatio...
Scenarist asked 14/9, 2009 at 15:7

1

Solved

I'm trying to understand JAX's auto-vectorization capabilities using vmap and implemented a minimal working example based on JAX's documentation. I don't understand how in_axes is used correctly. I...
Perinephrium asked 3/1, 2022 at 10:29

3

Solved

I'm using Code::Blocks for a C program on Windows 7. The program is using the OMP library. GCC version is 4.9.2. MinGW x86_64-w64-mingw32-gcc-4.9.2.exe. Flags used are: -fopenmp -O3 -mfpmath=sse -fu...
Mislay asked 17/11, 2015 at 14:7

1

Solved

I know that "why is my compiler doing this" aren't the best type of questions, but this one is really bizarre to me and I'm thoroughly confused. I had thought that std::min() was the same...

1

Solved

I am trying to vectorize this for loop. After using the Rpass flag, I am getting the following remark for it: int someOuterVariable = 0; for (unsigned int i = 7; i != -1; i--) { array[someOuterVa...
Pottage asked 12/1, 2021 at 8:41

2

Solved

If I take this code #include <cmath> void compute_sqrt(const double* x, double* y, int n) { int i; #pragma omp simd linear(i) for (i=0; i<n; ++i) { y[i] = std::sqrt(x[i]); } } and co...
Braswell asked 23/8, 2020 at 0:4

2

Solved

I have a filter m_f which acts on an input vector v via Real d2v = m_f[0]*v[i]; for (size_t j = 1; j < m_f.size(); ++j) { d2v += m_f[j] * (v[i + j] + v[i - j]); } perf tells us where this lo...
Vicinal asked 27/1, 2019 at 3:25

1

The "problem" I see with just using an autovectorizer to convert user-written loop-code to SIMD-instructions on every compilation as part of usual optimizations is that if you change the compiler, ...
Nicolis asked 22/1, 2019 at 15:26

1

Solved

I'm attempting to make a function SIMD-enabled and vectorize the loop with a function call. #include <cmath> #pragma omp declare simd double BlackBoxFunction(const double x) { return 1.0/s...
Endeavor asked 11/1, 2019 at 12:27

1

Solved

In the code below, why can the second loop be auto-vectorized but not the first? How can I modify the code so it does auto-vectorize? gcc says: note: not vectorized: control flow in lo...
Eulau asked 8/11, 2018 at 14:6

2

Solved

Consider the following toy example, where A is an n x 2 matrix stored in column-major order and I want to compute its column sum. sum_0 only computes sum of the 1st column, while sum_1 does the 2nd...
Myrtie asked 3/7, 2018 at 10:24

1

I see people using -msse -msse2 -mfpmath=sse flags by default hoping that this will improve performance. I know that SSE gets engaged when special vector types are used in the C code. But do these ...
Vocation asked 10/6, 2018 at 17:25

2

Solved

My understanding is that vectorization of code works something like this: For data in array below the first address in the array that is the multiple of 128 (or 256 or whatever SIMD instructions r...
Magnification asked 17/5, 2018 at 22:21

2

Solved

Consider this minimal implementation of a fixed vector<int>: constexpr std::size_t capacity = 1000; struct vec { int values[capacity]; std::size_t _size = 0; std::size_t size() const ...
Reikoreilly asked 13/2, 2018 at 13:17

1

Here are free functions that do the same thing, but in the first case the loop is not vectorized, while in the other cases it is. Why is that? #include <vector> typedef std::vector<double> Vec;...
Nikaniki asked 8/5, 2015 at 21:21

1

Solved

I have this piece of code which segfaults when run on Ubuntu 14.04 on an AMD64 compatible CPU: #include <inttypes.h> #include <stdlib.h> #include <sys/mman.h> int main() { uin...
Unclose asked 27/11, 2017 at 12:15

1

Solved

C++17 adds extensions for parallelism to the standard library (e.g. std::sort(std::execution::par_unseq, arr, arr + 1000), which will allow the sort to be done with multiple threads and with vector...
