What is "vectorization"?

Several times now, I've encountered this term in MATLAB, Fortran... some others... but I've never found an explanation of what it means and what it does. So I'm asking here: what is vectorization, and what does it mean, for example, that "a loop is vectorized"?

Scenarist answered 14/9, 2009 at 15:7 Comment(2)
@geoffspear The link seems to have been moved to en.wikipedia.org/wiki/Array_programming - Organize
What does vectorization mean? - Rosser

Many CPUs have "vector" or "SIMD" instruction sets which apply the same operation simultaneously to two, four, or more pieces of data. Modern x86 chips have the SSE instructions, many PPC chips have the "Altivec" instructions, and even some ARM chips have a vector instruction set, called NEON.

"Vectorization" (simplified) is the process of rewriting a loop so that instead of processing a single element of an array N times, it processes (say) 4 elements of the array simultaneously N/4 times.

I chose 4 because it's what modern hardware is most likely to directly support for 32-bit floats or ints.


The difference between vectorization and loop unrolling: Consider the following very simple loop that adds the elements of two arrays and stores the results to a third array.

for (int i=0; i<16; ++i)
    C[i] = A[i] + B[i];

Unrolling this loop would transform it into something like this:

for (int i=0; i<16; i+=4) {
    C[i]   = A[i]   + B[i];
    C[i+1] = A[i+1] + B[i+1];
    C[i+2] = A[i+2] + B[i+2];
    C[i+3] = A[i+3] + B[i+3];
}

Vectorizing it, on the other hand, produces something like this:

for (int i=0; i<16; i+=4)
    addFourThingsAtOnceAndStoreResult(&C[i], &A[i], &B[i]);

Where "addFourThingsAtOnceAndStoreResult" is a placeholder for whatever intrinsic(s) your compiler uses to specify vector instructions.


Terminology:

Note that most modern ahead-of-time compilers are able to auto-vectorize very simple loops like this; auto-vectorization can often be enabled via a compile option (it is on by default at full optimization in modern C and C++ compilers, like gcc -O3 -march=native). OpenMP's #pragma omp simd is sometimes helpful to hint the compiler, especially for "reduction" loops such as summing an FP array, where vectorization requires pretending that FP math is associative.
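
As a sketch of such a reduction loop with an OpenMP hint (assuming it is built with -fopenmp or -fopenmp-simd; the function name is illustrative):

float sum_array(const float *x, int n)
{
    float sum = 0.0f;
    // The pragma lets the compiler reassociate the FP additions,
    // which is what it needs in order to vectorize this reduction.
    #pragma omp simd reduction(+:sum)
    for (int i = 0; i < n; ++i)
        sum += x[i];
    return sum;
}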

More complex algorithms still require help from the programmer to generate good vector code; we call this manual vectorization, often with intrinsics like x86 _mm_add_ps that map to a single machine instruction, as in "SIMD prefix sum on Intel CPU" or "How to count character occurrences using SIMD". You can even use SIMD for short non-looping problems like "Most insanely fastest way to convert 9 char digits into an int or unsigned int" or "How to convert a binary integer number to a hex string?".

The term "vectorization" is also used to describe a higher level software transformation where you might just abstract away the loop altogether and just describe operating on arrays instead of the elements that comprise them. e.g. writing C = A + B in some language that allows that when those are arrays or matrices, unlike C or C++. In lower-level languages like that, you could describe calling BLAS or Eigen library functions instead of manually writing loops as a vectorized programming style. Some other answers on this question focus on that meaning of vectorization, and higher-level languages.

Shingle answered 14/9, 2009 at 15:12 Comment(11)
What's the difference between this and loop unwinding/unrolling? - Demodulator
Isn't it true that a compiler would have an easier job auto-vectorizing the unrolled loop? - Epistaxis
@NikosAthanasiou: It's plausible, but generally speaking a compiler should be able to autovectorize either loop, as they are both quite simple. - Shingle
@StephenCanon How can one check whether or not some lines have been vectorized? If one uses objdump, what should one look for in its output? - Cashman
Is it correct to say that vectorizing is part of OS or compiler design? I'm not sure I've ever seen this pattern in high-level programming languages (Java, Python). - Clearwing
@Shuklaswag: Vectorization is something that compilers can do for you, but it's also something that programmers explicitly do themselves. The OS is not involved. - Shingle
@Cashman SIMD instructions and registers should be present in the objdump output. Example of vectorized addition assembly. - Aldarcy
@Clearwing Because the computations described here are primitive compared to what would usually be done in the loop body of a high-level language. - Intercross
@JeremyPowell Correct me if I'm wrong, but I think they're two completely different things. Vectorization is a property of the CPU. Loop unrolling is just a name for unwrapping a loop. - Cale
@JeremyPowell: SIMD vectorization can be described as unrolling, but then doing the 4 independent operations as one, using hardware instructions that do, for example, four single-precision FP adds at once, like x86 SSE addps. The necessary loop-control calculations are the same as for regular unrolling, but then you roll the body back up into chunks of the SIMD vector width. If the compiler does that for you, that's "auto-vectorization". - Advisable
Expressing your code in terms of whole-array operations is also called "vectorization", but it's really a separate thing from SIMD auto-vectorization. Source-level vectorization (with whole-array operations) might make auto-vectorization easier, or might be essential to using SIMD at all in a language like CPython + numpy, where there's no way for a loop in the source code to get compiled into native machine code that uses SIMD instructions. But ahead-of-time compiled languages like C++ and Rust can afford to let the compiler spend time looking at each loop to see if it can auto-vectorize. - Advisable

Vectorization is the term for converting a scalar program to a vector program. Vectorized programs can run multiple operations from a single instruction, whereas a scalar program can only operate on a single pair of operands at a time.

From Wikipedia:

Scalar approach:

for (i = 0; i < 1024; i++)
{
   C[i] = A[i]*B[i];
}

Vectorized approach:

for (i = 0; i < 1024; i+=4)
{
   C[i:i+3] = A[i:i+3]*B[i:i+3];
}
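
In real manually-vectorized C, the vectorized version might look roughly like this sketch with SSE intrinsics (assuming float arrays; the slice notation above is only pseudo-code, and the function wrapper is mine):

#include <immintrin.h>

void multiply_arrays(float *C, const float *A, const float *B)
{
    for (int i = 0; i < 1024; i += 4) {
        __m128 va = _mm_loadu_ps(&A[i]);           // load 4 floats from A
        __m128 vb = _mm_loadu_ps(&B[i]);           // load 4 floats from B
        _mm_storeu_ps(&C[i], _mm_mul_ps(va, vb));  // 4 multiplies, then store
    }
}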
Oma answered 14/9, 2009 at 15:13 Comment(3)
Isn't that in essence the same as the scalar approach? The syntax and loop advancing are different, but underneath you are still doing the multiplication 4 times. Somehow, though, it will be faster, probably because the CPU has instructions that do some trick called vectorization. - Suggestible
Looks like I will answer my own question here: when the compiler sees the syntax in the vectorized approach, it translates it into optimized CPU instructions that multiply vectors, like SIMD. - Suggestible
@mskw: That's pseudo-code, not actual syntax for a C vector extension. In real manually-vectorized code it would look like __m128 va = _mm_loadu_ps( A+i ) and so on, then _mm_mul_ps( va, vb ); and a store intrinsic. For a longer example using AVX2 to do something more complicated that an ahead-of-time compiler wouldn't easily auto-vectorize, see How to count character occurrences using SIMD. - Advisable

Vectorization is used heavily in scientific computing, where huge chunks of data need to be processed efficiently.

In real programming applications, I know it's used in NumPy (not sure about others).

NumPy (a package for scientific computing in Python) uses vectorization for speedy manipulation of n-dimensional arrays, which would generally be slower if done with Python's built-in options for handling arrays.

Although tons of explanations are out there, here is how vectorization is defined in the NumPy documentation:

Vectorization describes the absence of any explicit looping, indexing, etc., in the code - these things are taking place, of course, just “behind the scenes” in optimized, pre-compiled C code. Vectorized code has many advantages, among which are:

  1. vectorized code is more concise and easier to read

  2. fewer lines of code generally means fewer bugs

  3. the code more closely resembles standard mathematical notation (making it easier, typically, to correctly code mathematical constructs)

  4. vectorization results in more “Pythonic” code. Without vectorization, our code would be littered with inefficient and difficult to read for loops.

Shamanism answered 2/5, 2017 at 4:34 Comment(0)

It refers to the ability to do a single mathematical operation on a list -- or "vector" -- of numbers in a single step. You see it often with Fortran because that's associated with scientific computing, which is associated with supercomputing, where vectorized arithmetic first appeared. Nowadays almost all desktop CPUs offer some form of vectorized arithmetic, through technologies like Intel's SSE. GPUs also offer a form of vectorized arithmetic.

Thornberry answered 14/9, 2009 at 15:9 Comment(0)

Vectorization, in simple words, means optimizing the algorithm so that it can utilize the SIMD instructions in the processor.

AVX, AVX2 and AVX-512 are Intel instruction sets that perform the same operation on multiple data items in one instruction. For example, AVX-512 means you can operate on 16 integer values (4 bytes each) at a time. Say you have a vector of 16 integers and you want to double each value and then add 10 to it. You can either load the values into a general-purpose register 16 times and perform the operation each time, or you can load all 16 values into a SIMD register (a 512-bit zmm register for AVX-512; xmm and ymm are the 128-bit and 256-bit equivalents) and perform the operation once. This speeds up the computation of vector data.

In vectorization we use this to our advantage, by remodelling our data so that we can perform SIMD operations on it and speed up the program.

The only problem with vectorization is handling conditions, because conditions branch the flow of execution. This can be handled by masking, i.e. by modelling the condition as an arithmetic operation. For example, if we want to add 10 to a value if it is greater than 100, we can either write

if (x[i] > 100) x[i] += 10; // this branches the execution flow

or we can model the condition as an arithmetic operation, creating a condition (mask) vector c:

c[i] = -(x[i] > 100);       // mask: all ones if the condition holds, else all zeros
x[i] = x[i] + (c[i] & 10);  // adds 10 only where the mask is set

This is a very trivial example, of course; c is our mask vector, which we use to perform a bitwise operation based on its value. (In C, a comparison yields 0 or 1, so the negation is needed to turn it into an all-ones bit pattern; SIMD compare instructions produce such all-ones masks directly.) This avoids branching the execution flow and enables vectorization.
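
As a sketch, the same masked add written directly with SSE2 intrinsics might look like this (the function name and the multiple-of-4 length are assumptions for illustration):

#include <emmintrin.h>

void add_ten_if_greater(int *x, int n)   // n assumed to be a multiple of 4
{
    const __m128i hundred = _mm_set1_epi32(100);
    const __m128i ten     = _mm_set1_epi32(10);
    for (int i = 0; i < n; i += 4) {
        __m128i v    = _mm_loadu_si128((const __m128i *)&x[i]);
        __m128i mask = _mm_cmpgt_epi32(v, hundred);  // all-ones where x[i] > 100, else zero
        __m128i add  = _mm_and_si128(mask, ten);     // 10 where the condition holds, else 0
        _mm_storeu_si128((__m128i *)&x[i], _mm_add_epi32(v, add));
    }
}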

Vectorization is as important as parallelization, so we should use it as much as possible. All modern processors have SIMD instructions for heavy compute workloads. We can optimize our code to use these SIMD instructions via vectorization, similar to parallelizing our code to run on the multiple cores available on modern processors.

I would like to close with a mention of OpenMP, which lets you vectorize code using pragmas. I consider it a good starting point. The same can be said for OpenACC.

Brouhaha answered 20/12, 2018 at 17:5 Comment(0)

I think the explanation by the Intel people is easy to grasp:

Vectorization is the process of converting an algorithm from operating on a single value at a time to operating on a set of values at one time. Modern CPUs provide direct support for vector operations where a single instruction is applied to multiple data (SIMD).

For example, a CPU with a 512-bit register could hold 16 32-bit single-precision floats and do a single calculation.

That is up to 16 times faster than operating on a single value at a time. Combining this with threading and multi-core CPUs leads to orders-of-magnitude performance gains.
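
A sketch of that idea with AVX-512 intrinsics (assuming float arrays whose length is a multiple of 16; the function name is illustrative):

#include <immintrin.h>

void add_arrays_avx512(float *C, const float *A, const float *B, int n)
{
    for (int i = 0; i < n; i += 16) {
        __m512 va = _mm512_loadu_ps(&A[i]);              // 16 floats from A
        __m512 vb = _mm512_loadu_ps(&B[i]);              // 16 floats from B
        _mm512_storeu_ps(&C[i], _mm512_add_ps(va, vb));  // 16 additions in one instruction
    }
}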

Link https://software.intel.com/en-us/articles/vectorization-a-key-tool-to-improve-performance-on-modern-cpus

In Java, there is an option for this to be included in JDK 15 in 2020, or later in JDK 16 in 2021. See the official issue.

Pepillo answered 11/4, 2020 at 1:20 Comment(0)

Hope you are well!

Vectorization refers to all the techniques that convert a scalar implementation, in which a single operation processes a single entity at a time, to a vector implementation, in which a single operation processes multiple entities at the same time.

It is a technique that helps us optimize code to work with huge chunks of data efficiently. Applications of vectorization are seen in scientific libraries like NumPy and pandas, and you can also use the technique while working with MATLAB, image processing, NLP, and much more. Overall, it optimizes the runtime and memory allocation of the program.

Hope this answers your question!

Thank you. 🙂

Celibate answered 2/2, 2022 at 9:45 Comment(4)
"while performing an operation on individual elements of an array which we call scaler coding..." - If you're doing a scalar loop over the elements in a high-level language like Python, your code isn't vectorized. Vectorized code is the alternative, where iterating over elements only happens inside the optimized functions, not visible in your source. I assume you know that, but throwing in a definition of "scalar" coding in the middle of that sentence makes it sound like you're talking about compilers turning scalar loops into vector code. - Advisable
(C/C++ compilers do auto-vectorize, but they don't invent calls to library functions, other than sometimes memcpy.) - Advisable
Thank you for adding your comment, but what I mean to say, as simply as I can, is: vectorization refers to all the techniques that convert a scalar implementation, in which a single operation processes a single entity at a time, to a vector implementation, in which a single operation processes multiple entities at the same time. - Celibate
Right, that's correct. I'd recommend you edit your answer to actually say that, instead of sounding like you're saying that "performing an operation on individual elements of an array" magically turns into optimized operations using SIMD, threads, and/or native code (for languages that don't already compile to native code). - Advisable

I would define vectorisation as a feature of a given language where the responsibility for how to iterate over the elements of a collection can be delegated from the programmer (e.g. an explicit loop over the elements) to some method provided by the language (e.g. an implicit loop).

Now, why would we ever want to do that?

  1. Code readability. In some (but not all!) cases, operating on the entire collection at once rather than on its elements is easier to read and quicker to code;
  2. Some interpreted languages (R, Python, MATLAB... but not Julia, for example) are really slow at processing explicit loops. In these cases vectorisation uses, under the hood, compiled instructions for this "element order processing" and can be several orders of magnitude faster than processing each programmer-specified loop operation;
  3. Most modern CPUs (and, nowadays, GPUs) have built-in parallelization that is exploitable when we use the vectorisation method provided by the language rather than our self-implemented ordering of operations over the elements;
  4. In a similar way, our programming language of choice will likely use, for some vectorised operations (e.g. matrix operations), software libraries (e.g. BLAS/LAPACK) that exploit the multi-threading capabilities of the CPU, another form of parallel computation.

Note that for points 3 and 4, some languages (notably Julia) also allow these hardware parallelizations to be exploited with programmer-defined order processing (e.g. for loops), whereas they happen automatically and under the hood when using the vectorisation method provided by the language.

Now, while vectorisation has many advantages, sometimes an algorithm is more intuitively expressed using an explicit loop than using vectorisation (where perhaps we would need to resort to complex linear algebra operations, identity and diagonal matrices... all to retain our "vectorised" approach); if using the explicit ordering form has no computational disadvantages, it should be preferred.

Ballot answered 17/5, 2022 at 9:52 Comment(0)

See the two answers above. I just wanted to add that the reason for wanting to do vectorization is that these operations can easily be performed in parallel by supercomputers and multi-processors, yielding a big performance gain. On single-processor computers there will be no performance gain.

Udall answered 14/9, 2009 at 15:13 Comment(2)
"On single processor computers there will be no performance gain": not true. Most modern processors have (limited) hardware support for vectorization (SSE, Altivec. etc. as named by stephentyrone), which can give significant speedup when used.Goldfish
thanks, I forgot that parallelization can be done at that level as well.Udall
