A and B are vectors of length N, where N is typically in the range 20 to 200. I want to calculate the square of the distance between these vectors, i.e. d^2 = ||A-B||^2.
So far I have:
float* a = ...;
float* b = ...;
float d2 = 0;
for(int k = 0; k < N; ++k)
{
    float d = a[k] - b[k];
    d2 += d * d;
}
That seems to work fine, except that I have profiled my code and this is the bottleneck (more than 50% of time is spent just doing this).
I am using Visual Studio 2012, on Win 7, with these optimization options: /O2 /Oi /Ot /Oy-.
My understanding is that VS2012 should auto-vectorize that loop (using SSE2).
However, if I insert #pragma loop(no_vector) in the code I don't get a noticeable slowdown, so I guess the loop is not being vectorized. The compiler confirms that with this message:
info C5002: loop not vectorized due to reason '1105'
My questions are:
- Is it possible to fix this code so that VS2012 can vectorize it?
- If not, would it make sense to try to vectorize the code myself?
- Can you recommend a web site for me to learn about SSE2 coding?
- Is there some value of N below which vectorization would be counterproductive?
- What is reason '1105'?
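
To make the second question concrete, this is roughly what I imagine a hand-vectorized version would look like. It is only a sketch, not something I have benchmarked: the function names dist2_scalar and dist2_sse are placeholders of mine, it assumes an x86 target with SSE available, and it uses unaligned loads because I cannot guarantee how a and b are aligned. Note that it adds the terms in a different order than the scalar loop, so the result may differ in the last few bits.

```cpp
#include <cstddef>
#include <xmmintrin.h>  // SSE intrinsics: __m128 and the _mm_*_ps functions

// Scalar reference: the loop from the question, wrapped in a function.
// (dist2_scalar is a placeholder name, not part of my real code.)
float dist2_scalar(const float* a, const float* b, std::size_t n)
{
    float d2 = 0.0f;
    for (std::size_t k = 0; k < n; ++k)
    {
        float d = a[k] - b[k];
        d2 += d * d;
    }
    return d2;
}

// Hand-vectorized sketch: processes four floats per iteration.
// Assumption: no alignment guarantee, hence _mm_loadu_ps (unaligned load).
float dist2_sse(const float* a, const float* b, std::size_t n)
{
    __m128 acc = _mm_setzero_ps();  // four partial sums, one per lane
    std::size_t k = 0;
    for (; k + 4 <= n; k += 4)
    {
        __m128 d = _mm_sub_ps(_mm_loadu_ps(a + k), _mm_loadu_ps(b + k));
        acc = _mm_add_ps(acc, _mm_mul_ps(d, d));
    }

    // Horizontal sum: spill the four lanes and add them up.
    float lanes[4];
    _mm_storeu_ps(lanes, acc);
    float d2 = (lanes[0] + lanes[1]) + (lanes[2] + lanes[3]);

    // Scalar tail for the last n % 4 elements.
    for (; k < n; ++k)
    {
        float d = a[k] - b[k];
        d2 += d * d;
    }
    return d2;
}
```

Calling dist2_sse(a, b, N) would replace the loop above. For N below 4 the vector loop never runs and only the scalar tail executes, which is part of why I am asking about small N.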