I am working my way through the book Introduction to Algorithms, 3rd edition. One of the first things explained is the insertion sort. On page 18 there is some pseudo code:
A = { 5, 2, 4, 6, 1, 3 };
INSERTION-SORT(A)
1 for j = 2 to A.length
2 key = A[j]
4 i = j - 1
5 while (i > 0 and A[i] > key)
6 A[i + 1] = A[i]
7 i = i - 1
8 A[i + 1] = key
It says that pseudo code is used so that it is easily translated to any kind of language (C, C++, Java, they don't mention, but I guess C# too). Since I program in C#, I translated it in LinqPad.
int[] a = { 5, 2, 4, 6, 1, 3 };
for (var j = 1; j < a.Length; j++)
{
var key = a[j];
var i = j - 1;
while(i > 0 && a[i] > key)
{
a[i + 1] = a[i];
i--;
}
a[i + 1] = key;
}
a.Dump();
You're probably going to ask, why does j start at 1, when it clearly says 2? In the book, the array has an index starting at 1. And yes, I now I probably should have updated all the [i - 1]
and [i + i]
as well.
Anyways, after I was done, I run the code and notice that it doesn't actually sort correctly. The output is { 5, 1, 2, 3, 4, 6 }
. It was late and should have stopped, but I struggled on to make the code correct. I did everything, even taking the pseudo code as is from the book (starting at 2). Still not the correct output.
I contacted one of the professors of the book, and he send me the code for the insertion sort, in C:
void insertion_sort(int *A, int n) {
for (int j = 2; j <= n; j++) {
int key = A[j];
int i = j-1;
while (i > 0 && A[i] > key) {
A[i+1] = A[i];
i--;
}
A[i+1] = key;
}
}
Translated in C#:
int[] a = { 5, 2, 4, 6, 1, 3 };
for (var j = 2; j <= a.Length; j++)
{
var key = a[j];
var i = j - 1;
while(i > 0 && a[i] > key)
{
a[i + 1] = a[i];
i--;
}
a[i + 1] = key;
}
I get an array out of bounds. Okay then maybe:
int[] a = { 5, 2, 4, 6, 1, 3 };
for (var j = 2; j <= a.Length - 1; j++)
{
var key = a[j];
var i = j - 1;
while(i > 0 && a[i] > key)
{
a[i + 1] = a[i];
i--;
}
a[i + 1] = key;
}
Output: { 5, 1, 2, 3, 4, 6 }
I'm thinking, this can't be correct. The pseudo code says 2 to array.Length. Is that 2 < array.Length, or 2 <= array.Length? What is going on here?
I personally think it is because of the 0 > 0
predicate in the while loop. It actually falls short one time each time.
My explanation (from my email sent to the professor, to lazy to type it all over):
The reason why the loop still ends up with { 5, 1, 2, 3, 4, 6 }
is because of the i > 0
predicate. Every time in the while loop you subtract 1 of i (i--
). This will eventually lead to 0 > 0
which ends up false (only 0 == 0
will return true), but this is when the loop still needs to run one more time. It continuously falls one short. It should go do the while loop 1 more time to properly sort.
Another explanation:
When j starts with 2, key == 4, i == 1 and a[i] == 2. The while loop won't run in this case because 2 > 0 but 2 isn't greater than 4.
j == 3,
key == 6,
i == 2,
a[i] == 4
While loop won't run because 4 is not greater than 6
j == 4,
key == 1,
i == 3,
a[i] == 6
While loop runs this time:
a[i + 1] = a[i] -> a[4] = a[3] -> { 5, 2, 4, 6, 6, 3 }
i-- -> i == 2
Again while loop because 2 > 0 and 4 > 1
a[i + 1] = a[i] -> a[3] = a[2] -> { 5, 2, 4, 4, 6, 3 }
i-- -> i == 1
Again while loop because 1 > 0 and 2 > 1
a[i + 1] = a[i] -> a[2] = a[1] -> { 5, 2, 2, 4, 6, 3 }
i-- -> i == 0
And here is where it goes (in my opinion) wrong. i is now equal to zero, but the while loop should run one more time to get the 5 out of the zero-th position.
The professor assures me that he is correct, but I can't get the right output. Where is my thinking going wrong?
The array in the C code that got sent to me by the professor was actually starting with an index of 1. I did not know this and checking upon C arrays I saw that they all start with 0. Yes, then the C code doesn't produce the correct output. The professor explained this to me and the pieces now fall into its place.