I'm merge sorting int arrays of length 10,000 to 75,000. I am getting strange sort times. Why might this be?
I am doing an assignment for my algorithms class in which I have to demonstrate that internal merge sort has a time complexity of O(n log n). To do this I create arrays ranging in length from 10,000 to 75,000 elements, load each with random numbers below 10,000, and print the array length to the console along with how long it took to sort.

The strange result is that some runs take ~15 milliseconds while others take 0 milliseconds, even when the array is tens of thousands of integers longer. Any idea why this may be? I can upload a screenshot of my output, but someone needs to approve it because I don't have enough "reputation". I have checked the arrays: they do appear to be sorted after calling the mergeSort() method.

import java.util.Arrays;
import java.util.Random;

private static final Random random = new Random();

public static void main(String[] args) {
    int[] dataLength = { 10_000, 15_000, 20_000, 25_000, 30_000,
                         35_000, 40_000, 45_000, 50_000, 55_000,
                         60_000, 65_000, 70_000, 75_000 };
    // internal merge sort
    for (int i = 0; i < dataLength.length; i++) {
        int[] input = new int[dataLength[i]];
        for (int j = 0; j < input.length; j++) {
            input[j] = random.nextInt(10000);
        }

        long startTime = System.currentTimeMillis();
        mergeSort(input, 0, input.length);
        long endTime = System.currentTimeMillis();

        System.out.println("Length of array is: " + input.length + ". The sorted array: " +
                           Arrays.toString(input));
        System.out.println("Internal sort with " + dataLength[i] + " items took: " +
                           (endTime - startTime) + " milliseconds");
    }
}

public static void mergeSort(int[] input, int start, int end) {
    if (end - start < 2) {
        return;
    }

    int mid = (start + end) / 2;
    mergeSort(input, start, mid);
    mergeSort(input, mid, end);
    merge(input, start, mid, end);
}

public static void merge(int[] input, int start, int mid, int end) {
    // If the two halves are already in order across the boundary, skip the merge.
    if (input[mid - 1] <= input[mid]) {
        return;
    }
    int i = start;                      // index into the "left" half
    int j = mid;                        // index into the "right" half
    int tempIndex = 0;
    int[] temp = new int[end - start];

    while (i < mid && j < end) {
        temp[tempIndex++] = input[i] <= input[j] ? input[i++] : input[j++];
    }
    // Optimizes the merge: any elements remaining in the left half are already
    // ordered relative to the merged prefix, so copy them straight to their
    // final positions instead of routing them through the temp array first.
    System.arraycopy(input, i, input, start + tempIndex, mid - i);
    System.arraycopy(temp, 0, input, start, tempIndex);
}
Matthewmatthews answered 14/12, 2019 at 17:2 Comment(9)
Probably just HotSpot warmup time. I suggest using JMH for performance testing. – Gradin
It would be immensely useful if you showed the actual results you get, for instance in table form, where you also show how ordered the arrays are before you run the sort. That requires writing a small function that tests how many elements are out of order to begin with (a sketch follows this comment). A 75,000-item array that is already mostly sorted will sort much more quickly than a 10,000-item array that is entirely out of order, but until you verify that, you're asking people to guess. And guesses aren't answers. – Belinda
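A small helper along those lines might look like this (a minimal sketch; the name countUnsortedPairs is illustrative, not from the thread):

static int countUnsortedPairs(int[] a) {
    // Counts adjacent pairs that are out of order: 0 means fully sorted,
    // larger values mean the array is further from sorted.
    int count = 0;
    for (int i = 1; i < a.length; i++) {
        if (a[i - 1] > a[i]) {
            count++;
        }
    }
    return count;
}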
Yeah, I mentioned I did that. I can write the code and upload the screenshot of the output if you approve it. – Matthewmatthews
And Tom Hawtin, I'm just a student, so I don't know what HotSpot is exactly. Could you explain a little bit more? – Matthewmatthews
I think @TomHawtin-tackline is referring to the startup cost of the HotSpot JVM. You don't want that time being a variable when doing your testing. – Dealing
@Matthewmatthews The VM swaps between interpreting code and compiling it. Not only is there a significant difference between interpreted and compiled performance, but the compilation itself takes time too. – Gradin
The other thing is that with such small times the clock may not have a sufficient level of precision. System.nanoTime should be better. – Gradin
You should really use JMH for microbenchmarking. Otherwise, to reduce "noise", rerun after disabling the JIT ( -Djava.compiler=NONE ) and disabling the GC ( -XX:+UseEpsilonGC ). – Midkiff
In the case of Windows, the ticker (which the basic clock uses) runs at 64 Hz, or 15.625 ms per tick, which would explain the results you are getting. You need to use a higher-frequency clock and/or a much larger array. On my system (Win 7 Pro 64-bit, Intel 3770K CPU, 3.5 GHz), merge sorting an array of 16 million 64-bit integers takes 1.5 seconds. – Teodoor
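A quick way to observe this tick granularity yourself is to poll System.currentTimeMillis() in a tight loop and print how far it jumps each time it changes. This standalone sketch is not from the thread; on Windows the jumps would be roughly 15-16 ms, while on most other platforms they are about 1 ms:

public static void main(String[] args) {
    long last = System.currentTimeMillis();
    int jumps = 0;
    while (jumps < 10) {
        long now = System.currentTimeMillis();
        if (now != last) {
            // Each printed value is one observed tick of the underlying clock.
            System.out.println("clock advanced by " + (now - last) + " ms");
            last = now;
            jumps++;
        }
    }
}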
A couple of considerations from the comments:

  • On Windows, System.currentTimeMillis() is backed by a clock that ticks at 64 Hz (every 15.625 ms), so the smallest nonzero difference you can measure is about 15 ms.
  • Given your random initialization of the arrays, they will be partially sorted, and sorting a partially sorted array is (slightly) faster than sorting a fully unsorted one.

When you add in reproducibility concerns, it is clear that to get sensible results that demonstrate the O(n log n) complexity, you should:

  • Use System.nanoTime to get a more precise clock measurement (a sketch combining this with a warm-up pass follows this list).
  • Use much larger arrays to widen the time differences.
  • Use a benchmarking harness such as JMH (a sketch follows the comment below).
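Putting the first two points together, here is a minimal sketch of a more careful timing loop. It assumes the mergeSort method from the question is in scope; the array sizes and warm-up count are illustrative:

private static final Random random = new Random();  // requires java.util.Random

public static void main(String[] args) {
    int[] sizes = { 100_000, 200_000, 400_000, 800_000, 1_600_000 };

    // Warm-up: give HotSpot a chance to JIT-compile mergeSort before measuring.
    for (int w = 0; w < 20; w++) {
        mergeSort(randomArray(100_000), 0, 100_000);
    }

    long[] elapsedNanos = new long[sizes.length];
    for (int i = 0; i < sizes.length; i++) {
        int[] input = randomArray(sizes[i]);
        long start = System.nanoTime();   // monotonic and much finer-grained
        mergeSort(input, 0, input.length);
        elapsedNanos[i] = System.nanoTime() - start;
    }

    // Report only after all measurements, so printing doesn't perturb the timings.
    for (int i = 0; i < sizes.length; i++) {
        System.out.printf("n = %,d: %.3f ms%n", sizes[i], elapsedNanos[i] / 1e6);
    }
}

private static int[] randomArray(int n) {
    int[] a = new int[n];
    for (int i = 0; i < n; i++) {
        a[i] = random.nextInt(10_000);
    }
    return a;
}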
Mc answered 26/2, 2022 at 17:53 Comment(1)
Performing all the measurements first, without reporting, and printing all results afterwards may produce more consistent results, as the output tends to occur asynchronously and cause cache pollution. – Cathcart
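And for the JMH route, a minimal sketch of what such a benchmark could look like. The class and parameter values are illustrative, and it assumes the question's mergeSort plus the JMH dependencies are on the classpath:

import java.util.Random;
import java.util.concurrent.TimeUnit;
import org.openjdk.jmh.annotations.*;
import org.openjdk.jmh.infra.Blackhole;

@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
@State(Scope.Thread)
@Fork(1)
@Warmup(iterations = 5)
@Measurement(iterations = 5)
public class MergeSortBenchmark {

    @Param({ "10000", "75000", "1000000" })
    int size;

    int[] input;
    final Random random = new Random();

    // Rebuild the input before every invocation so each measurement sorts
    // genuinely unsorted data (acceptable here because a single sort is long
    // enough to dominate the per-invocation setup overhead).
    @Setup(Level.Invocation)
    public void setUp() {
        input = new int[size];
        for (int i = 0; i < size; i++) {
            input[i] = random.nextInt(10_000);
        }
    }

    @Benchmark
    public void sort(Blackhole bh) {
        mergeSort(input, 0, input.length);  // the mergeSort from the question
        bh.consume(input);                  // keep the result from being optimized away
    }
}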
