Fastest strategy to form and sort an array of positive integers
Asked Answered
S

1

5

In Java, what is faster: to create, fill in and then sort an array of ints like below

int[] a = int[1000];
for (int i = 0; i < a.length; i++) {
    // not sure about the syntax
    a[i] = Maths.rand(1, 500) // generate some random positive number less than 500
}
a.sort(); // (which algorithm is best?)

or insert-sort on the fly

int[] a = int[1000];
for (int i = 0; i < a.length; i++) {
    // not sure about the syntax
    int v = Maths.rand(1, 500) // generate some random positive number less than 500
    int p = findPosition(a, v); // where to insert
    if (a[p] == 0) a[p] = v;
    else {
        shift a by 1 to the right
        a[p] = v;
    }
}
Schoolbag answered 21/8, 2012 at 22:15 Comment(4)
why not do some timings and let us know?Desilva
Hint: Analyze the complexities.Liederkranz
Have you at least tried it out?Erda
if you are creating an array with a specific distribution (say only positive and small range) you could use counting, bucket sort. Otherwise you will still have to compare which leaves you with nlognSantosantonica
A
10

There are many ways that you could do this:

  1. Build the array and sort as you go. This is likely to be very slow, since the time required to move array elements over to make space for the new element will almost certainly dominate the sorting time. You should expect this to take at best Ω(n2) time, where n is the number of elements that you want to put into the array, regardless of the algorithm used. Doing insertion sort on-the-fly will take expected O(n2) time here.

  2. Build the array unsorted, then sort it. This is probably going to be extremely fast if you use a good sorting algorithm, such as quicksort or radix sort. You should expect this to take O(n log n) time (for quicksort) or O(n lg U) time (for radix sort), where n is the number of values and U is the largest value.

  3. Add the numbers incrementally to a priority queue, then dequeue all elements from the priority queue. Depending on how you implement the priority queue, this could be very fast. For example, using a binary heap here would cause this process to take O(n log n) time, and using a van Emde Boas tree would take O(n lg lg U) time, where U is the largest number that you are storing. That said, the constant factors here would likely make this approach slower than just sorting the values.

Hope this helps!

Arnaldo answered 21/8, 2012 at 22:18 Comment(6)
Thank you for this interesting answer. The third option is nice for studying, but looks like possible overhead :)Schoolbag
Just curious, is the log in O(n log n) different from the lg in O(n lg U)?Fiddlewood
@BrianJ- No, it's not. Logarithms in big-O notation don't need to have the base specified, since they're all just constant multiples of one another. I used lg (binary logarithm) in the second case to make it clearer that the lg is derived from the number of bits in U, but for clarity I probably should have been more consistent.Arnaldo
@Arnaldo Is that really true? It's like saying the exponents in O(n^e) don't need to be specified. I'd prefer an O(log10(n)) algorithm over an O(log2(n)) algorithm. Wouldn't you?Weekly
@Weekly There's a difference between saying that the bases of logs don't matter in big-O notation and saying that the bases of exponents don't matter or that the exponents themselves don't. In the second two cases, if absolutely is significant. The reason that it works for logarithms is that log_b a = log_c a / log_c b for any bases b and c, so all logarithms are just a constant factor off of one another and big-O notation ignores constants. For that reason, I wouldn't know whether a O(log_10 n) or an O(log_2 n) algorithm would be better for the same reason that...Arnaldo
@Weekly ... I wouldn't know whether an O(n) or an O(10n) algorithm is better: any algorithm whose runtime is O(n) is also O(10n) and vice-versa, so the constant doesn't actually say anything.Arnaldo

© 2022 - 2024 — McMap. All rights reserved.