Bank conflicts have existed since the earliest vector processors of the 1960s.
They are caused by interleaved memory or multi-channel memory access (MCMA).
Interleaved memory and MCMA work around slow RAM by spreading consecutive words of memory
across different banks or channels, so that accesses to them can overlap. But there is a side effect: back-to-back accesses to the same bank take longer than accesses to adjacent banks, because a bank that is still busy with one request cannot start the next.
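As a rough illustration, assume the simplest interleaving scheme, where word address `a` lives in bank `a % num_banks` (real hardware may map addresses differently). The sketch below counts how many accesses land in each bank for a given stride. The function name `bank_histogram` and the bank count of 8 are made up for this example.

```python
from collections import Counter

def bank_histogram(stride, num_accesses=8, num_banks=8):
    """Count accesses per bank, ASSUMING word address a maps to bank a % num_banks."""
    return Counter((i * stride) % num_banks for i in range(num_accesses))

# Stride 1: consecutive words fall in different banks, so accesses can overlap.
unit = bank_histogram(stride=1)
# Stride 8 (equal to the assumed bank count): every access hits the same bank.
conflict = bank_histogram(stride=8)

print(max(unit.values()))      # most accesses queued on any one bank: 1
print(max(conflict.values()))  # most accesses queued on any one bank: 8
```

With a stride equal to the number of banks, all eight accesses queue up behind one another on a single bank, which is the conflict case.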
From Wikipedia on the 1980s Cray-2 (http://en.wikipedia.org/wiki/Cray-2):
"Main memory banks were arranged in quadrants to be accessed at the same time, allowing programmers to scatter their data across memory to gain higher parallelism. The downside to this approach is that the cost of setting up the scatter/gather unit in the foreground processor was fairly high. Stride conflicts corresponding to the number of memory banks suffered a performance penalty (latency) as occasionally happened in power-of-2 FFT-based algorithms. As the Cray 2 had a much larger memory than Cray 1's or X-MPs, this problem was easily rectified by adding an extra unused element to an array to spread the work out"
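The padding trick mentioned at the end of the quote can be sketched as follows. This is only a toy model, assuming 8 banks and a word address `a` mapping to bank `a % num_banks`; it is not how the Cray-2 actually addressed memory. The helper name `column_banks` is hypothetical. It walks down one column of a row-major array and lists the bank each element falls in.

```python
def column_banks(row_width, rows=8, num_banks=8, col=0):
    """Banks touched when walking down one column of a row-major array.

    ASSUMPTION: word address a maps to bank a % num_banks.
    """
    return [(r * row_width + col) % num_banks for r in range(rows)]

unpadded = column_banks(row_width=8)  # stride 8: every element in the same bank
padded = column_banks(row_width=9)    # one unused element per row: stride 9

print(len(set(unpadded)))  # distinct banks used without padding: 1
print(len(set(padded)))    # distinct banks used with padding: 8
```

The extra element per row costs a little memory, but it changes the stride so that it no longer coincides with the bank count, and the column accesses spread across all banks.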