Exactly how many comparisons does merge sort make?
Asked Answered
M

6

12

I have read that quicksort is much faster than mergesort in practice, and the reason for this is the hidden constant.

Well, the solution for the randomized quick sort complexity is 2nlnn=1.39nlogn which means that the constant in quicksort is 1.39.

But what about mergesort? What is the constant in mergesort?

Mortmain answered 16/12, 2011 at 14:25 Comment(2)
This question doesn't have an answer without any more details. The answer is depndent on (1) your definition of complexity: number of ops? number of comparisons? (2) the answer may differ between different machines, depending on the instruction set of each machine.Illinois
The number of comparisons of course.Mortmain
W
23

Let's see if we can work this out!

In merge sort, at each level of the recursion, we do the following:

  1. Split the array in half.
  2. Recursively sort each half.
  3. Use the merge algorithm to combine the two halves together.

So how many comparisons are done at each step? Well, the divide step doesn't make any comparisons; it just splits the array in half. Step 2 doesn't (directly) make any comparisons; all comparisons are done by recursive calls. In step 3, we have two arrays of size n/2 and need to merge them. This requires at most n comparisons, since each step of the merge algorithm does a comparison and then consumes some array element, so we can't do more than n comparisons.

Combining this together, we get the following recurrence:

C(1) = 0
C(n) = 2C(n / 2) + n

(As mentioned in the comments, the linear term is more precisely (n - 1), though this doesn’t change the overall conclusion. We’ll use the above recurrence as an upper bound.)

To simplify this, let's define n = 2k and rewrite this recurrence in terms of k:

C'(0) = 0
C'(k) = 2C'(k - 1) + 2^k

The first few terms here are 0, 2, 8, 24, ... . This looks something like k 2k, and we can prove this by induction. As our base case, when k = 0, the first term is 0, and the value of k 2k is also 0. For the inductive step, assume the claim holds for some k and consider k + 1. Then the value is 2(k 2k) + 2k + 1 = k 2 k + 1 + 2k + 1 = (k + 1)2k + 1, so the claim holds for k + 1, completing the induction. Thus the value of C'(k) is k 2k. Since n = 2 k, this means that, assuming that n is a perfect power of two, we have that the number of comparisons made is

C(n) = n lg n

Impressively, this is better than quicksort! So why on earth is quicksort faster than merge sort? This has to do with other factors that have nothing to do with the number of comparisons made. Primarily, since quicksort works in place while merge sort works out of place, the locality of reference is not nearly as good in merge sort as it is in quicksort. This is such a huge factor that quicksort ends up being much, much better than merge sort in practice, since the cost of a cache miss is pretty huge. Additionally, the time required to sort an array doesn't just take the number of comparisons into account. Other factors like the number of times each array element is moved can also be important. For example, in merge sort we need to allocate space for the buffered elements, move the elements so that they can be merged, then merge back into the array. These moves aren't counted in our analysis, but they definitely add up. Compare this to quicksort's partitioning step, which moves each array element exactly once and stays within the original array. These extra factors, not the number of comparisons made, dominate the algorithm's runtime.

This analysis is a bit less precise than the optimal one, but Wikipedia confirms that the analysis is roughly n lg n and that this is indeed fewer comparisons than quicksort's average case.

Hope this helps!

Walkway answered 16/12, 2011 at 19:37 Comment(4)
Thank you very much! Is there any analysis that takes the space allocation into acount?Mortmain
Shouldn't the formula be C(1) = 0 C(n) = 2C(n / 2) + n-1. Since if we have 2 arrays of size n/2 we need at most n-1 compares to merge them into an array of size n?Abrahamabrahams
@Johnson Yes! That’s a great point. That will end up making the overall analysis off by 2n - 1 (one per recursive call), which I believe doesn’t change the conclusion. Thanks for sporting that!Walkway
shouldn't the number of comparison in merging be (n-1)?Albanese
M
5

In the worst case and assuming a straight-forward implementation, the number of comparisons to sort n elements is

n ⌈lg n⌉ − 2⌈lg n + 1

where lg n indicates the base-2 logarithm of n.

This result can be found in the corresponding Wikipedia article or recent editions of The Art of Computer Programming by Donald Knuth, and I just wrote down a proof for this answer.

Magnificence answered 10/9, 2012 at 13:44 Comment(1)
Any Idea for quicksort?Gryphon
A
2

Merging two sorted arrays (or lists) of size k resp. m takes k+m-1 comparisons at most, min{k,m} at best. (After each comparison, we can write one value to the target, when one of the two is exhausted, no more comparisons are necessary.)

Let C(n) be the worst case number of comparisons for a mergesort of an array (a list) of n elements.

Then we have C(1) = 0, C(2) = 1, pretty obviously. Further, we have the recurrence

C(n) = C(floor(n/2)) + C(ceiling(n/2)) + (n-1)

An easy induction shows

C(n) <= n*log_2 n

On the other hand, it's easy to see that we can come arbitrarily close to the bound (for every ε > 0, we can construct cases needing more than (1-ε)*n*log_2 n comparisons), so the constant for mergesort is 1.

Aileneaileron answered 16/12, 2011 at 19:36 Comment(2)
Then it means that my 1.39 constant for quicksort is not correct.Mortmain
@geniaz1- Your constant for quicksort is indeed correct, but quicksort is faster for other reasons. See my post for details.Walkway
L
0

Merge sort is O(n log n) and at each step, in the "worst" case (for number of comparisons), performs a comparison.

Quicksort, on the other hand, is O(n^2) in the worst case.

Lalalalage answered 16/12, 2011 at 22:12 Comment(0)
P
0

C++ program to count the number of comparisons in merge sort. First the program will sort the given array, then it will show the number of comparisons.

 #include<iostream>
 using namespace std;
 int  count=0; /* to count the number of comparisions */

 int merge( int arr [ ], int l, int m, int r)
{
 int i=l; /* left subarray*/
 int j=m+1; /* right  subarray*/
 int k=l; /* temporary array*/
 int temp[r+1];
 while( i<=m && j<=r)
 {
   if ( arr[i]<= arr[j])
  {
    temp[k]=arr[i];
    i++;
  }
   else
  {
    temp[k]=arr[j];
    j++;
  }
    k++;
    count++;

  }
   while( i<=m)
  {
    temp[k]=arr[i];
    i++;
    k++;
  }
    while( j<=r)
  {
    temp[k]=arr[j];
    j++;
    k++;
  }
  for( int p=l; p<=r; p++)
  {
    arr[p]=temp[p];
  }
   return count;
  }


  int  mergesort( int arr[ ], int l, int r)
  {
    int comparisons;
    if(l<r)
  {
   int m= ( l+r)/2;
   mergesort(arr,l,m);
   mergesort(arr,m+1,r);
   comparisions = merge(arr,l,m,r);
  }
   return comparisons;
  }

 int main ()
 {
   int size;
   cout<<" Enter the size of an array "<< endl;
   cin>>size;
   int myarr[size];
   cout<<"  Enter the elements of array "<<endl;
   for ( int i=0; i< size; i++)
 {
   cin>>myarr[i];
 }
 cout<<"  Elements of array before sorting are  "<<endl;
 for ( int i=0; i< size; i++)
 {
   cout<<myarr[i]<<"  " ;
 }
  cout<<endl;
  int c=mergesort(myarr, 0, size-1);
  cout<<"  Elements of array after sorting are  "<<endl;
  for ( int i=0; i< size; i++)
 {
   cout<<myarr[i]<<"  " ;
 }
   cout<<endl;
   cout<<"  Number of comaprisions while sorting the given array"<< c <<endl;
   return 0;
 }
Plagal answered 15/1, 2021 at 15:2 Comment(0)
H
0

I am assuming reader knows Merge sort. Comparisons happens only when two sorted arrays is getting merged. For simplicity, assume n as power of 2. To merge two (n/2) size arrays in worst case, we need (n - 1) comparisons. -1 appears here, as last element left on merging does not require any comparison. First found number of total comparison assuming it as n for some time, we can correct it by (-1) part. Number of levels for merging is log2(n) (Imagine as tree structure). In each layer there will be n comparison (need to minus some number, due to -1 part),so total comparison is nlog2(n) - (Yet to be found). "Yet to be found" part does not give nlog2(n) constant, it is actually (1 + 2 + 4 + 8 + ... + (n/2) = n - 1). Number of total comparison in merge sort = n*log2(n) - (n - 1). So, your constant is 1.

Hallucinate answered 5/7, 2021 at 18:13 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.