Why is the time complexity of this example from "Cracking the Coding Interview" O(k c^k)?
Asked Answered
C

4

11

This question is from Cracking the Coding Interview 6th Edition, Question V1.11.

The following code prints all strings of length k where the characters are in sorted order. It does this by generating all strings of length k and then checking if each is sorted. What is the runtime?

package QVI_11_Print_Sorted_Strings;


public class Question {

    public static int numChars = 26;

    public static void printSortedStrings(int remaining) {
        printSortedStrings(remaining, "");
    }

    public static void printSortedStrings(int remaining, String prefix) {
        if (remaining == 0) {
            if (isInOrder(prefix)) {
                System.out.println(prefix);
            }
        } else {
            for (int i = 0; i < numChars; i++) {
                char c = ithLetter(i);
                printSortedStrings(remaining - 1, prefix + c);
            }
        }
    }

    public static boolean isInOrder(String s) {
        for (int i = 1; i < s.length(); i++) {
            int prev = ithLetter(s.charAt(i - 1));
            int curr = ithLetter(s.charAt(i));
            if (prev > curr) {
                return false;
            }
        }
        return true;
    }

    public static char ithLetter(int i) {
        return (char) (((int) 'a') + i);
    }

    public static void main(String[] args) {
        printSortedStrings(5);
    }

}

The answer is O(k c^k) where k is the length of the string and c is the number of characters in the alphabet. It takes O(c^k) time to generate each string. Then, we need to check that each of these is sorted, which takes O(k) time.

Now, I understand where O(k) comes from but I don't see how O(c^k) came about.

Conversion answered 23/5, 2017 at 22:20 Comment(0)
D
13

The above algorithm works by recursively generating all possible strings of length k using a set of c choices of characters. The number of possible strings of length k you can make from c choices of letters is equal to ck. For example, if I have two letters to pick from (a and b) and I have strings of length three, there are 23 = 8 possible strings I can make:

  • aaa
  • aab
  • aba
  • abb
  • baa
  • bab
  • bba
  • bbb

To better see where this comes from, notice that every time you add a new letter to the end of the string, you have c choices for what that letter can be, so the number of possible strings is

c · c · ... · c (k times) =

ck

This means that the above code, which works by generating each of these strings, must do at least Ω(ck) work, since that's the minimum number of strings to check.

So how much work does it do for each of these strings? Here's where things get tricky. These strings are built up one character at a time by constantly appending a new character from the list of possible characters. In Java, appending to a string makes a full copy of the string, so the cost of appending the first character is (roughly) 1, the second is (roughly) 2, then 3, then 4, etc. This means that the cost of building up a full string of length k will be

1 + 2 + 3 + ... + k

= Θ(k2)

So it actually seems like the runtime here would be O(ck k2) rather than O(k ck), since the cost of building all those strings adds up rather quickly.

However, that's not a tight bound. For example, some of the work done to form the string aaa is also used to form the string aab, since both strings are formed by starting with aa and concatenating another character in.

To get a better analysis, we can sum up the total work done performing concatenations at each level in the tree. The zeroth level of the tree has a single string of size 0, so no concatenations are done. The first level of the tree has c strings of size 1, requiring c work to do to the concatenations. The second level of the tree has c2 strings of size 2, requiring 2c2 work to form. The third level of the three has c3 strings of size 3, requiring 3c3 work to form. More generally, level i requires ici work to form. This means we want to determine

0c0 + 1c1 + 2c2 + ... + kck.

This summation works out to Θ(kck), with a lower exponent of the k term.

To summarize:

  • The ck term comes from the number of strings that needed to be checked.
  • The k term comes from from the work done per string.
  • A careful analysis shows that the time to generate all these strings does not affect the overall runtime of O(k ck).
Drizzle answered 27/7, 2017 at 18:57 Comment(2)
The strings share common prefixes so share some of the work done. Does that change the complexity?Late
Interesting point! I believe it might, but given this particular implementation which works by doing a lot of string concatenation in Java it might not apply here. Revisiting this idea without that constraint might lead to some improvements.Drizzle
L
6

The recursive calls to printSortedStrings form a recursion tree. Because the total number of nodes is about a constant factor of the number of nodes at the lowest level, and the upper levels don't do more work than the lowest level, then only the cost of the lowest level is significant.

Take for example, c being 2, and k being 3:

Looking at the first level of the tree this produces:

2 (or 2^1) strings, "a", "b".

The second level produces:

4 (or 2^2) strings, "aa", "ab", "ba", "bb".

The third level produces:

8 (or 2^3) strings, "aaa", "aab", "aba", "abb", "baa", "bab", "bba", "bbb".

The cost to produce the next string in sequence is linear. The characters from the old string plus the new character are copied into the new string.

This depends on the length of the string, so for the first level the cost is 1, the second level cost is 2, and the third level cost is 3. Multiply by the number of items at each level:

(2^1)*1 + (2^2)*2 + (2^3)*3 = 34

This pattern would continue if k was 4, then it would be:

(2^1)*1 + (2^2)*2 + (2^3)*3 + (2^4)*4 = 98

This thing about sums like this is that the last term is greater than all the previous ones combined. So only the last term is significant:

(2^1)*1 + (2^2)*2 + (2^3)*3 < (2^4)*4

Because:

(2^1)*1 + (2^2)*2 + (2^3)*3
< (2^1)*3 + (2^2)*3 + (2^3)*3
= (2^1 + 2^2 + 2^3) * 3
= (2^4 - 2) * 3
< (2^4 - 2) * 4
< (2^4) * 4

So the whole sum is less than 2*(2^4)*4, or 2*(c^k)*k, or O(k c^k).

At the end of the recursion, there is more linear time work. There are c^k nodes with k work giving another O(k c^k) cost. So the total cost is still O(k c^k).

One other thing is that the for-loop takes linear time per string as well, but this works out to a total of O(k c^k), for similar reasons as above.

Late answered 23/9, 2017 at 16:23 Comment(1)
Excellent analysis. I'm going to revise my answer.Drizzle
A
0

I don’t think they accounted for the k^2 in the book because the code isn’t written in Java in the actual book. So it is safe to assume that string concatenation is O(1). Also, the answer explanation in the book is not the best. It only takes O(k) time to generate each string but there are O(c^k) possible strings. And it takes O(k) to verify the result so it should be O(k^2 c^k) not O(kc^k) (with an O(1) string concatenation time). However, that you should discuss with your interviewer, just ask if prints, concatenation, etc. take O(1) time or longer.

Albi answered 23/9, 2017 at 13:37 Comment(1)
The cost of generating each string doesn’t get multiplied with the cost of processing each string, but rather added in.Drizzle
S
0

I would like to add something on top of @templatetypedef's answer. The first part of time taken to generate all things is already explained as

c^k

. For the second part, the author doesn't consider the time taken for concatenation in java and assumes it as constant time. As per the author's explaination, after each string is generated, the isInOrder() function is called for each string which takes k amount of time i.e. length of the string. As there are c^k strings, the total no of comparisons will be k + k + k .... c^k times i.e. k * c^k times. Now,

total time taken = time taken to generate all strings + total comparisions

i.e. c^k + (k * c^k) Since we ignore the smaller term, the time complexity will be simply kc^k

Suburbanite answered 7/10, 2023 at 11:17 Comment(0)

© 2022 - 2025 — McMap. All rights reserved.