Why is insertion sort Θ(n^2) in the average case?
Asked Answered
L

4

29

Insertion sort has a runtime that is Ω(n) (when the input is sorted) and O(n2) (when the input is reverse sorted). On average, it runs in Θ(n2) time.

Why is this? Why isn't the average case closer to O(n log n), for example?

Lafountain answered 11/6, 2013 at 23:12 Comment(0)
L
60

To answer this question, let's first determine how we can evaluate the runtime of insertion sort. If we can find a nice mathematical expression for the runtime, we can then manipulate that expression to determine the average runtime.

The key observation we need to have is that the runtime of insertion sort is closely related to the number of inversions in the input array. An inversion in an array is a pair of elements A[i] and A[j] that are in the wrong relative order - that is, i < j, but A[j] < A[i]. For example, in this array:

0 1 3 2 4 5

There is one inversion: the 3 and 2 should be switched. In this array:

4 1 0 3 2

There are 6 inversions:

  • 4 and 1
  • 4 and 0
  • 4 and 3
  • 4 and 2
  • 1 and 0
  • 3 and 2

One important property of inversions is that a sorted array has no inversions in it, since every element should be smaller than everything coming after it and larger than everything coming before it.

The reason this is significant is that there is a direct link between the amount of work done in insertion sort and the number of inversions in the original array. To see this, let's review some quick pseudocode for insertion sort:

  • For i = 2 .. n: (Assuming 1-indexing)
    • Set j = i - 1.
    • While A[j] > A[j + 1]:
      • Swap A[j] and A[j + 1].
      • Set j = j - 1.

Normally, when determining the total amount of work done by a function like this, we could determine the maximum amount of work done by the inner loop, then multiply it by the number of iterations of the outer loop. This will give an upper bound, but not necessarily a tight bound. A better way to account for the total work done is to recognize that there are two different sources of work:

  • The outer loop, which counts 2, 3, ..., n, and
  • The inner loop, which performs swaps.

That outer loop always does Θ(n) work. The inner loop, however, does an amount of work that's proportional to the total number of swaps made across the entire runtime of the algorithm. To see how much work that loop will do, we will need to determine how many total swaps are made across all iterations of the algorithm.

This is where inversions come in. Notice that when insertion sort runs, it always swaps adjacent elements in the array, and it only swaps the two elements if they form an inversion. So what happens to the total number of inversions in the array after we perform a swap? Well, graphically, we have this:

 [---- X ----] A[j] A[j+1] [---- Y ----]

Here, X is the part of the array coming before the swapped pair and Y is the part of the array coming after the swapped pair.

Let's suppose that we swap A[j] and A[j+1]. What happens to the number of inversions? Well, let's consider some arbitrary inversion between two elements. There are 6 possibilities:

  • Both elements are in X, or both elements are in Y, or one element is in X and one element is in Y. Then the inversion is still there, since we didn't move any of those elements.
  • One element is in X or Y and the other is either A[j] or A[j+1]. Then the inversion is still there, since the relative orderings of the elements haven't changed, even though their absolute positions might have.
  • One element is A[j] and the other A[j+1]. Then the inversion is removed after the swap.

This means that after performing a swap, we decrease the number of inversions by exactly one, because only the inversion of the adjacent pair has disappeared. This is hugely important for the following reason: If we start off with I inversions, each swap will decrease the number by exactly one. Once no inversions are left, no more swaps are performed. Therefore, the number of swaps equals the number of inversions!

Given this, we can accurately express the runtime of insertion sort as Θ(n + I), where I is the number of inversions of the original array. This matches our original runtime bounds - in a sorted array, there are 0 inversions, and the runtime is Θ(n + 0) = Θ(n), and in a reverse-sorted array, there are n(n - 1)/2 inversions, and the runtime is Θ(n + n(n-1)/2) = Θ(n2). Nifty!

So now we have a super precise way of analyzing the runtime of insertion sort given a particular array. Let's see how we can analyze its average runtime. To do this, we'll need to make an assumption about the distribution of the inputs. Since insertion sort is a comparison-based sorting algorithm, the actual values of the input array don't actually matter; only their relative ordering actually matters. In what follows, I'm going to assume that all the array elements are distinct, though if this isn't the case the analysis doesn't change all that much. I'll point out where things go off-script when we get there.

To solve this problem, we're going to introduce a bunch of indicator variables of the form Xij, where Xij is a random variable that is 1 if A[i] and A[j] form an inversion and 0 otherwise. There will be n(n - 1)/2 of these variables, one for each distinct pair of elements. Note that these variables account for each possible inversion in the array.

Given these X's, we can define a new random variable I that is equal to the total number of inversions in the array. This will be given by the sum of the X's:

I = Σ Xij

We're interested in E[I], the expected number of inversions in the array. Using linearity of expectation, this is

E[I] = E[Σ Xij] = Σ E[Xij]

So now if we can get the value of E[Xij], we can determine the expected number of inversions and, therefore, the expected runtime!

Fortunately, since all the Xij's are binary indicator variables, we have that

E[Xij] = Pr[Xij = 1] = Pr[A[i] and A[j] are an inversion]

So what's the probability, given a random input array with no duplicates, that A[i] and A[j] are an inversion? Well, half the time, A[i] will be less than A[j], and the other half of the time A[i] will be greater than A[j]. (If duplicates are allowed, there's a sneaky extra term to handle duplicates, but we'll ignore that for now). Consequently, the probability that there's an inversion between A[i] and A[j] is 1 / 2. Therefore:

E[I] = ΣE[Xij] = Σ (1 / 2)

Since there are n(n - 1)/2 terms in the sum, this works out to

E[I] = n(n - 1) / 4 = Θ(n2)

And so, on expectation, there will be Θ(n2) inversions, so on expectation the runtime will be Θ(n2 + n) = Θ(n2). This explains why the average-case behavior of insertion sort is Θ(n2).

Hope this helps!

Lafountain answered 11/6, 2013 at 23:12 Comment(10)
I may be wrong (and hopefully I am), but that sounds to me a bit more like bubble sort than insertion sort... I was of the opinion that insertion sort found the proper location for the current item and shifted the rest of the list down, instead of swapping elements around like that... Still, the analysis is fairly relevant, since bubble sort and insertion sort are pretty similar performance-wise.Engadine
@twalberg- Are you thinking of selection sort? I've always seen insertion sort defined this way.Lafountain
Possibly... I may have just read the pseudo-code wrong... Or possibly I'm thinking of an out-of-place insertion sort instead of in-place... I guess an in-place insertion sort would reasonably use swaps to move things around.Engadine
@twalberg: you can find the correct position for a value during which you exchange values or you can save the exchanges until right before the actual insertion and shift a bunch of values to make available the final position. Though the second form is faster, it still follows the n^2 number of comparisons - at least for the worst case.Stoner
The detail of the first part of this answer confuses me. Is it really true to say that the insertion sort can be seen as Θ(n + I) or that the best case is Θ(n + 0) , or indeed that the worst case is Θ(n + n(n-1)/2)? That doesn't seem to make sense - I can't see where the plus is coming from. The best case is Θ(n) * Θ(1) = Θ(n), no?. And the worst case is simply Θ(n(n-1)/2) surely? Saying Θ(n + n(n-1)/2) seems to be double counting the outer loop - n(n-1)/2 is the total iterations required for the whole array, not just one pass of the inner loop, so it already includes the outer n - right?Recondite
You mean movement instead of SWAP.Olia
Interesting solution, I think it would be better to say # of inversions provides a lower bound to the # of swaps rather than saying they are equal. Also, would it be correct to say E[X_{ij}]=1/2? I doubt the probability of an inversion of a pair would be independent wrt the probability of other pairs...Prognostication
@Prognostication Since we're looking for a precise value for the expected runtime, just giving a lower bound wouldn't necessarily be a good idea here. Also, yes, the expected value is definitely 1/2. The technique of using indicator random variables is so powerful precisely because linearity of expectation allows us to get values for each E[X_{ij}] independently of one another. It's a common technique in the analysis of algorithms. Note that the fact that E[X_{ij}] = 1/2 does not mean that all the X_{ij}s are independent - they aren't.Lafountain
@Lafountain How did you derive that there are n(n - 1)/2 inversions in a reverse-sorted array??Imidazole
@Imidazole It’s (n-1) + (n-2) + … + 3 + 2 + 1. The first array element pairs with all others to form an inversion. The second array element pairs with everything after it to form an inversion, etc.Lafountain
S
2

For fun I wrote a program which ran through all data combinations for a vector of size n counting comparisons and found that the best case is n-1 (all sorted) and the worst is (n*(n-1))/2.

Some results for different n:

  n min     ave     max ave/(min+max) ave/max

  2   1     1         1        0.5000
  3   2     2.667     3        0.5334
  4   3     4.917     6        0.5463
  5   4     7.717    10        0.5512
  6   5    11.050    15        0.5525
  7   6    14.907    21        0.5521
  8   7    19.282    28        0.5509
  9   8    24.171    36        0.5493
 10   9    29.571    45        0.5476
 11  10    35.480    55        0.5458
 12  11    41.897    66        0.5441

It seems the average value follows min closer than it does max.

EDIT: some additional values

 13  12    48.820    78        0.5424        
 14  13    56.248    91        0.5408

EDIT: value for 15

 15  14    64.182   105        0.5393

EDIT: selected higher values

 16  15    72.619   120        -       0.6052
 32  31   275.942   496        -       0.5563
 64  63  1034.772  1953        -       0.5294
128 127  4186.567  8128        -       0.5151
256 255 16569.876 32640        -       0.5077

I recently wrote a program to compute the average number of comparisons for insertion sort for higher values of n. From these I have drawn the conclusion that as n approaches infinity the average case approaches the worst case divided by two.

Stoner answered 13/12, 2013 at 10:45 Comment(6)
Look at the growth rate of the average runtime. Notice that it goes up by about a factor of four when the input size doubles. This means that it's quadratic and therefore tracks the max more closely than the min. I bet that if you got values for larger n, the gap between the min and the average would be much, much bigger.Lafountain
@Lafountain my mistake. Min and max stabilize at constant increases of 2x and 4x as n doubles. Looking at the ave data I conclude that it will stabilize in the vicinity of 3.7x.Stoner
Which comes out to n^1.888.Stoner
Have you computed this for larger n? The proof I have predicts quadratic growth and for small n the lower-order terms still contribute a lot.Lafountain
I'm planning when to run for n=15 which requires 15! (1.31 * 10 ^ 12) different combinations multiplied by an average of 64-65 comparisons each. My cpu runs at 2.67 GHz or 2.67 * 10 ^ 9 which means 31646 * some average number of clock cycles per comparison. At the very least, more than 24 hours execution time.Stoner
I say you're totally right. There is no way to get n^2 operations on an insertion sort. O(n*(n-1)/2) is the worst case.Mahayana
B
0

Most algorithms have average-case the same as worst-case. To see why this is, let's call O the worst-case and Ω the best-case. Presumably, O >= Ω as n goes to infinity. For most distributions, the average case is going to be close to the average of the best- and worst-case - that is, (O + Ω)/2 = O/2 + Ω/2. Since we don't care about coefficients, and O >= Ω, this is the same as O.

Obviously, this is an oversimplification. There are running time distributions that are skewed such that the assumption of the average-case being the average of the worst-case and the best-case is not valid*. But this should give you a decent intuition as to why this is.

*As mentioned by templatetypedef in the comments, some examples are quicksort/quickselect, BST lookup (unless you balance the tree), hash table lookup, and the simplex method.

Bagwig answered 12/6, 2013 at 2:12 Comment(2)
Many important algorithms have worst-case runtimes that are different from their average-case runtime: quicksort, quickselect, BST lookup, hash table lookup, and the simplex method come to mind.Lafountain
@Lafountain I was pretty sure there were some, but I was coming up with nothing. Thanks for the suggestions!Bagwig
L
0

Lets take a look into program : General Representation

         
Loop : j = 1 to n 
{
   temp = array[j]
   k = j - 1
   
   Loop until : ( k > 0 ) and ( array[k] > temp ) {
       array[k+1] = array[k]     // shifting one element at a time
       k = k - 1
   }

   array[k+1] = temp

}

Outer Loop : 1 - 2 - 3 - 4 - 5 - .... n

Inner Loop : for every single element there will be a inner loop of its own

Lets take an example of array : [ 3, 2, 9, 1, 2, 6, 5 ] ( average-case )

                 3  -  2  -  9  -  1  - ..... n
                       |     |     |          |
    no. of loop        1     0     3       (n + 1) / 2  

 (n+1)/2 -> (multiple cases. so, using median of all probabilities)

so, for every element that is ( n ) inside loop is running ( (n+1)/2 ) number of times

 -> n(n+1)/2 
 -> (n2 + n )/2
 -> n2 + n            // drop constants
 -> n2                // drop lower order terms

so that's why even for average case Time complexity is : O(n*n)

Note : All loops that grow proportionally to the input size have a linear time complexity O(n). If you loop through only half of the array, that’s still O(n). Remember that we drop the constants so 1/2 n => O(n).

Leprous answered 6/2, 2023 at 8:16 Comment(1)
To make this argument sufficiently rigorous you need to explain why each item would be, on average, shifted down by (n+1)/2 positions. The actual amount is lower than this because early items are likely to be swapped many fewer times than this.Lafountain

© 2022 - 2025 — McMap. All rights reserved.