Is a big loop within a small loop always faster than a small loop within a big one?
Asked Answered
G

7

10

I just read this post, and wonder if we can draw the conclusion that a big loop within a small loop must always run faster than a small loop within a big one, no matter what the code does inside the nested loop? Take an example.

int m, n; 
m = 1000000;
n = 10;

Snippet A

for (int i = 0; i < n; i++)         
    for (int j=0; j < m; j++)               
       {       
           DoSomething();        
       }

   

Snippet B

for (int j = 0; j < m; j++)               
    for (int i=0; i < n; i++)           
       {       
          DoSomething();          
       }

   

Can we say that, no matter what DoSomething() actually does, snippet A always runs faster thant snippet B?


As pointed out by @stackmate, I want to expand this question into two

  1. When the code inside nested loop is DoSomething() which means DoSomething() has nothing to do with variable i and j. What is the performance difference?

  2. When the code inside nested loop is DoSomething(i, j) which means DoSomething(i, j) has relateship with variable i and j. What is the performance difference?

Gravois answered 28/5, 2014 at 14:24 Comment(2)
Snippet B could possibly have more branch mis-predictions, so the processor pipeline would have to be flushed more often. I would imagine that a c++ compiler would be able to optimize this, so you probably wouldn't have to worry about it.Elseelset
Unless DoSomething does AlmostNothing there is so little difference that there are almost certainly bigger issues hiding elsewhere in the code, grinning, while you spend your brain on this.Lidda
V
8

There cannot be a specific answer to your question. The parameter deciding whether it will be fast or not is what you are doing inside the loops. For example say you are adding 2 arrays and storing them in a third array:

Code 1:
for(int i = 0; i < 1000; i++)
{
    for(int j = 0; j < 1000000; j++)
         C[i][j] = A[i][j] + B[i][j];
}

Code 2:
for(int i = 0; i < 1000000; i++)
{
    for(int j = 0; j < 1000; j++)
         C[j][i] = A[j][i] + B[j][i];
}

Code 1 will be much faster than code 2. The reason is cache. Take a look at this question for more details. The answers are superbly informative and there is no point in me explaining the concept of cache again over here.

Vulcanize answered 28/5, 2014 at 14:34 Comment(4)
what you say is true, but in this specific question there are no parameters passed to DoSomething(), therefore your answer, while instructive is not really relevant here.Landsturm
Well its the way two-dimensional arrays are represented in memory generally. This is called column-row major ordering. But yes it definitely depends on what you do in the loop.Until
@stackmate: I think it's implied that the loop indices are used in DoSomething(). Otherwise there would be no need to use a nested loop in the first place.Exfoliate
@RetoKoradi yes I agree with you. stackmate there is no point in using 2 for loops if you are simply going to call a function without passing any args. Might as well run it in 1 loop.Vulcanize
E
2

@Cool_Coder already covered one main reason (memory access patterns resulting in better cache hit rates) why having the smaller loop as the inside loop can be beneficial.

Another scenario is that the inner loop could be unrolled. Particularly if the size of the smaller loop is really small and fixed, the compiler will unroll the inner loop if it's beneficial. The resulting code will then have only one loop instead of two nested loops, with a reduction in branches.

If you have a situation like this in highly performance critical code, you need to try both, and benchmark carefully. If the code is not very performance critical, it's probably not worth worrying about.

Exfoliate answered 28/5, 2014 at 14:46 Comment(0)
C
1

An observation. If you have read operations inside the nested loop like the following

for (int i = 0; i < n; i++)
    a = aList.get(i);         
    for (int j=0; j < m; j++)               
       {   
           b = bList.get(j)
           DoSomething(a, b);        
       }

then having n < m results in less someList.get-operations than n > m. With n=1 and m=2 there will be three read operations while with n=2 and m=1 there will be four read operations. In the case n > m there will be more repititive read operations. Or put another way the runtime is n + n*m. The value n*m stays the same whether n > m or n < m but the first addend changes.

(I assumed no compiler optimizations and ignored caching behavior)

Crysta answered 24/6, 2015 at 8:26 Comment(0)
H
0

It depends.

If you do anything other between the for loops then the one iterating m first will take longer to execute, simply because it does more work because of that extra added work before entering the next loop.

If not, then

n * m = m * n

Andy T's comment is also correct. When you have the outer loop be the longest one, then the initialization of the inner loop happens more often. If this is c++ though, I'd expect the compiler to optimize this away so that benchmarking your code would probably yield the same result for both loops.

Hallie answered 28/5, 2014 at 14:28 Comment(0)
R
0

In Snippet A: n=10 times switch to outer For loop But in Snippet B: m=1000000 times switch to outer For loop. And this reason (More switches between two For loop) makes to Snippet A is faster than Snippet B.

Routine answered 28/5, 2014 at 14:53 Comment(0)
I
0
for(int i = 0; i < 1000; i++)
{
        for(int j = 0;

in above code if you see we are initializing j 1000 times.

for(int i = 0; i < 1000000; i++)
{
    for(int j = 0;

while in this second code we are initializing j 1000000 times which clearly shows that second code has extra overhead.

Inharmonious answered 28/9, 2016 at 5:22 Comment(0)
I
0

I think there's a good deal of optimization going on here. Check out this jsben.ch: https://jsben.ch/FjXuL

var big = 10000;
var little = 10;

var x = 0;
// Little loop in Big loop
for (var i = 0; i < big; i++) for (var j=0; j < little; j++) { x++; }
// Big loop in Little loop
for (var i = 0; i < little; i++) for (var j=0; j < big; j++) { x++; }
// Little loop in Big loop (single initializations)
var j;

for (var i = 0; i < big; i++) for (j=0; j < little; j++) { x++; }
// Big loop in Little loop (single initializations)
var j;

for (var i = 0; i < little; i++) for (j=0; j < big; j++) { x++; }

When tested individually, Little loop in Big loop always comes out ahead in Chrome. Declaring j outside of the first loop in the "single initialization" tests doesn't seem to help - and actually slows down the Little loop in Big loop (single initialization) test significantly.

That being said, results could vary greatly by browser implementation / compiler.

I think it makes logical sense that nesting the bigger loop would be advantageous because you would be initializing the nested j variable less times, but apparently in Chrome this isn't a big factor in performance because nesting the smaller loop is faster.

Intelligent answered 22/2, 2022 at 3:53 Comment(0)

© 2022 - 2025 — McMap. All rights reserved.