I take no credit for this: I only copied it from another source and formatted it for you. I hope it helps.
[source: ECE 1754, Survey of Loop Transformation Techniques, Eric LaForest, March 19, 2010]
It is all about the dependence distance between two iterations. In the first loop nest below there is a dependence of distance 1 along the outer loop and another of distance 1 along the inner loop, so each loop carries a dependency.
Loop skewing does exactly what it says: it skews the execution of an inner loop
relative to an outer one. This is useful if the inner loop has a dependence on the outer loop which prevents it from running in parallel. For example, the following code has the dependency vectors {(1, 0), (0, 1)}. Neither loop can be parallelized
since they each carry a dependency. Simply interchanging the loops would merely
interchange the indices holding the dependencies, accomplishing nothing.
    do i = 2, n-1
      do j = 2, m-1
        a[i,j] =
          (a[i-1,j] + a[i,j-1] + a[i+1,j] + a[i,j+1]) / 4
      end do
    end do
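To make the claim about the dependency vectors concrete, here is a minimal Python sketch (my own illustration, not from the survey). It assumes a dependency is carried by the loop at the first nonzero component of its distance vector; the helper names `carrying_loop` and `interchange` are made up for this example.

```python
def carrying_loop(vec):
    """Return the 0-based loop level that carries the dependency
    (position of the first nonzero distance), or None if all are zero."""
    for level, distance in enumerate(vec):
        if distance != 0:
            return level
    return None


def interchange(vec):
    """Swap the two components, as loop interchange does to a distance vector."""
    outer, inner = vec
    return (inner, outer)


# Dependency vectors of the loop nest above, in (i, j) order.
deps = [(1, 0), (0, 1)]

# Level 0 is the outer loop, level 1 the inner loop.
carried_before = {carrying_loop(v) for v in deps}
carried_after = {carrying_loop(interchange(v)) for v in deps}

print("before interchange:", sorted(carried_before))
print("after interchange: ", sorted(carried_after))
```

Both sets contain both loop levels, which is why plain interchange gains nothing here.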
Loop skewing is implemented by adding the index of the outer loop, times some
skewing factor f, to the bounds of the inner loop and subtracting the same value
from all the uses of the inner loop index. The subtraction keeps the indices within
the new loop bounds, preserving the correctness of the program. The effect on the
inner loop iterations is to shift their position in the array forwards by f relative
to the current outer loop, increasing the dependency distance to the outer loop
in the same manner. In other words, given a dependency vector (a, b), skewing
transforms it to (a, f*a + b). Since this transformation preserves the lexicographic
order of the dependencies, it is always legal. Applying a skew factor of one to
the above inner loop yields the following code:
    do i = 2, n-1
      do j = 2+i, m-1+i
        a[i,j-i] =
          (a[i-1,j-i] + a[i,j-1-i] + a[i+1,j-i] + a[i,j+1-i]) / 4
      end do
    end do
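As a sanity check that skewing only relabels the inner iterations, the following Python sketch (again my own, with made-up names `make_grid`, `original`, `skewed` and `skew_vector`) runs the original and the skewed loop nest on the same data and compares the results. It assumes a 1-based grid, stored with an unused row and column 0, and arbitrary test values.

```python
def make_grid(n, m):
    # 1-based n x m grid; row 0 and column 0 are unused padding.
    return [[float((7 * i + 3 * j) % 11) for j in range(m + 1)]
            for i in range(n + 1)]


def original(a, n, m):
    for i in range(2, n):              # do i = 2, n-1
        for j in range(2, m):          # do j = 2, m-1
            a[i][j] = (a[i-1][j] + a[i][j-1] + a[i+1][j] + a[i][j+1]) / 4
    return a


def skewed(a, n, m, f=1):
    for i in range(2, n):                        # do i = 2, n-1
        for j in range(2 + f * i, m + f * i):    # do j = 2+f*i, m-1+f*i
            jj = j - f * i                       # undo the skew in every use of j
            a[i][jj] = (a[i-1][jj] + a[i][jj-1] + a[i+1][jj] + a[i][jj+1]) / 4
    return a


def skew_vector(vec, f=1):
    # (a, b) -> (a, f*a + b), as described in the text above.
    a, b = vec
    return (a, f * a + b)


n, m = 6, 5
same_result = original(make_grid(n, m), n, m) == skewed(make_grid(n, m), n, m)
skewed_deps = [skew_vector(v) for v in [(1, 0), (0, 1)]]
print(same_result, skewed_deps)
```

The iterations run in exactly the same order as before, so the results should be identical; only the dependency distances seen through the new inner index change.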
This new code executes in the same manner, but with dependencies of {(1, 1), (0, 1)}. Both loops still carry a dependency. However, interchanging the loops at this point yields the dependency vectors {(1, 0), (1, 1)}, as shown in the following code:
    do j = 4, m+n-2
      do i = max(2, j-m+1), min(n-1, j-2)
        a[i,j-i] =
          (a[i-1,j-i] + a[i,j-1-i] + a[i+1,j-i] + a[i,j+1-i]) / 4
      end do
    end do
The inner loop can now be parallelized: both dependencies have a nonzero distance in the outer loop (over j), so they are carried by that loop, and the inner loop (over i) carries none. Note that
interchanging skewed loop bounds is no longer straightforward: each loop must
take into account the upper and lower bounds of the other loop.
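The skewed and interchanged nest can be checked the same way. This Python sketch (my own; `make_grid`, `original` and `wavefront` are made-up names, and the 1-based grid with padding is an assumption) runs the inner loop both forwards and backwards. If the inner iterations really are independent, the order should not matter and both runs should match the original loop nest.

```python
def make_grid(n, m):
    # 1-based n x m grid; row 0 and column 0 are unused padding.
    return [[float((7 * i + 3 * j) % 11) for j in range(m + 1)]
            for i in range(n + 1)]


def original(a, n, m):
    for i in range(2, n):              # do i = 2, n-1
        for j in range(2, m):          # do j = 2, m-1
            a[i][j] = (a[i-1][j] + a[i][j-1] + a[i+1][j] + a[i][j+1]) / 4
    return a


def wavefront(a, n, m, reverse_inner=False):
    for j in range(4, m + n - 1):                  # do j = 4, m+n-2
        lo = max(2, j - m + 1)                     # inner bounds depend on j
        hi = min(n - 1, j - 2)
        inner = range(lo, hi + 1)
        if reverse_inner:
            inner = reversed(inner)                # stand-in for "any order"
        for i in inner:
            jj = j - i
            a[i][jj] = (a[i-1][jj] + a[i][jj-1] + a[i+1][jj] + a[i][jj+1]) / 4
    return a


n, m = 6, 5
reference = original(make_grid(n, m), n, m)
forward_ok = wavefront(make_grid(n, m), n, m) == reference
reverse_ok = wavefront(make_grid(n, m), n, m, reverse_inner=True) == reference
print(forward_ok, reverse_ok)
```

Reversing the inner loop is only a cheap stand-in for running it in parallel, but it does show that, for a fixed j, no inner iteration reads a value written by another one.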