What is the complexity of this simple piece of code?
Asked Answered
W

10

34

I'm pasting this text from an ebook I have. It says the complexity if O(n2) and also gives an explanation for it, but I fail to see how.

Question: What is the running time of this code?

public String makeSentence(String[] words) {
    StringBuffer sentence = new StringBuffer();
    for (String w : words) sentence.append(w);
    return sentence.toString();
}

The answer the book gave:

O(n2), where n is the number of letters in sentence. Here’s why: each time you append a string to sentence, you create a copy of sentence and run through all the letters in sentence to copy them over If you have to iterate through up to n characters each time in the loop, and you’re looping at least n times, that gives you an O(n2) run time. Ouch!

Can someone please explain this answer more clearly?

Wheeze answered 23/8, 2011 at 4:1 Comment(3)
@Kublai Khan: It's not his/her answer. It's the answer in the book that is being read, which is what is being called into question.Tantivy
StringBuffer has an internal buffer which gets doubled as soon as it overflows. That really means that copying of "sentence" does not happen each time when you append something to it. It only will happen if the buffer overflows.Serendipity
If you are concerned about performance, use StringBuilder "As of release JDK 5, this class has been supplemented with an equivalent class designed for use by a single thread, StringBuilder. The StringBuilder class should generally be used in preference to this one, as it supports all of the same operations but it is faster, as it performs no synchronization. "Coridon
S
26

This seems to be a question of mislead, because I happened to read that book just now. This part of text in the book is a typo! Here is the context:

===================================================================

Question: What is the running time of this code?

1 public String makeSentence(String[] words) {
2 StringBuffer sentence = new StringBuffer();
3 for (String w : words) sentence.append(w);
4 return sentence.toString();
5 }

Answer: O(n2), where n is the number of letters in sentence. Here’s why: each time you append a string to sentence, you create a copy of sentence and run through all the letters in sentence to copy them over. If you have to iterate through up to n characters each time in the loop, and you’re looping at least n times, that gives you an O(n2) run time. Ouch! With StringBuffer (or StringBuilder) can help you avoid this problem.

1 public String makeSentence(String[] words) {
2 StringBuffer sentence = new StringBuffer();
3 for (String w : words) sentence.append(w);
4 return sentence.toString();
5 }

=====================================================================

Have you noticed that the author messed it up? The O(n2) solution she mentioned (the first one) was exactly the same as the 'optimized' one (the latter). So, my conclusion is that the author was trying to render something else, such as always copying the old sentence to a new buffer when appending every next string, as the example of an O(n2) algorithm. StringBuffer should not be so silly, as the author also mentioned 'With StringBuffer (or StringBuilder) can help you avoid this problem'.

Shumaker answered 30/4, 2012 at 13:58 Comment(2)
Right, the mistake has been corrected in the next book's editionCaledonian
What is the complexity? Your answer did not say that.Straka
S
20

It's a bit difficult to answer a question about the complexity of this code when it is written at a high level which abstracts away the details of the implementation. The Java documentation doesn't seem to give any guarantees in terms of the complexity of the append function. As others have pointed out, the StringBuffer class can (and should) be written so that the complexity of appending strings does not depend on the current length of the string held in StringBuffer.

However, I suspect it is not that helpful to the person asking this question to simply say "your book is wrong!" - instead, let us see what assumptions are being made and make clear what the author was trying to say.

You can make the following assumptions:

  1. Creating a new StringBuffer is O(1)
  2. Getting the next string w in words is O(1)
  3. Returning sentence.toString is at most O(n).

The question is really what is the order of sentence.append(w), and that depends on how it happens inside the StringBuffer. The naive way is to do it like Shlemiel the Painter.

The silly way

Suppose you use a C-style null-terminated string for the contents of StringBuffer. The way you find the end of such a string is by reading each character, one by one, until you find the null character - then to append a new string S, you can start copying characters from S to the StringBuffer string (finishing with another null character). If you write append this way, it is O(a + b), where a is the number of characters currently in the StringBuffer, and b is the number of characters in the new word. If you loop over an array of words, and each time you have to read all the characters you just appended before appending the new word, then the complexity of the loop is O(n^2), where n is the total number of characters in all the words (also the number of characters in the final sentence).

A better way

On the other hand, suppose that the contents of StringBuffer is still an array of characters, but we also store an integer size which tells us how long the string is (number of characters). Now we no longer have to read every character in the StringBuffer in order to find the end of the string; we can just look up index size in the array, which is O(1) instead of O(a). Then the append function now only depends on the number of characters being appended, O(b). In this case the complexity of the loop is O(n), where n is the total number of characters in all the words.

...We're not done yet!

Finally, there's one more aspect of the implementation that hasn't been covered yet, and that is the one actually brought up by the answer in the textbook - memory allocation. Each time you want to write more characters to your StringBuffer, you're not guaranteed to have enough space in your character array to actually fit the new word in. If there isn't enough space, your computer needs to first allocate some more room in a clean section of memory, and then copy all the information in the old StringBuffer array across, and then it can continue as before. Copying data like this will take O(a) time (where a is the number of characters to be copied).

In the worst case, you have to allocate more memory every time you add a new word. This basically takes us back to square one where the loop has O(n^2) complexity, and is what the book seems to suggest. If you assume that nothing crazy is happening (the words aren't getting longer at an exponential rate!), then you can probably reduce the number of memory allocations to something more like O(log(n)) by having the allocated memory grow exponentially. If that's the number of memory allocations, and memory allocations in general are O(a), then the total complexity attributed just to memory management in the loop is O(n log(n)). Since the appending work is O(n) and less than the complexity of the memory management, the total complexity of the function is O(n log(n)).

Again, the Java documentation doesn't help us in terms of how the capacity of the StringBuffer grows, it just says "If the internal buffer overflows, it is automatically made larger". Depending on how it happens, you could end up with either O(n^2) or O(n log(n)) overall.

As an exercise left to the reader: Find an easy way to modify the function so that the overall complexity is O(n), by removing memory reallocation issues.

Shevat answered 23/8, 2011 at 5:37 Comment(2)
The complexity is O(n) even with the reallocations. Multiplying log(n) by n is misleading because not every reallocation has the same cost. The final reallocation has about the same cost as all the others put together.Omaromara
I think of the memory performance as: O(log(n) * log(n)) though I'm not sure if that's a helpful way of thinking of amortization. It does remind me it could be close to O(n) and bears more looking into. In this case, you're copying an avg of roughly log(n) characters roughly log(n) times.Carcinogen
A
19

The accepted answer is just wrong. StringBuffer has amortized O(1) append, so n appends will be O(n).

If it wasn't O(1) append, StringBuffer would have no reason to exist, since writing that loop with plain String concatenation would be O(n^2) as well!

Andriaandriana answered 23/8, 2011 at 4:8 Comment(4)
Note that javac actually optimizes string concatenation to use StringBuilders.Carnay
Cna you cite the source for the complexity?Straka
Ahh StringBuffer underneath basically copies everything to a single byte array, which "doubles" in size when its size is expanded, so hence the amortized O(1). Gotcha.Domenic
You are wrong. With the complexity would be n^3. Since, the complexity would be n+2n+...n^2 which is proportional to n^3.Santanasantayana
C
13

I tried to check it using this program

public class Test {

    private static String[] create(int n) {
        String[] res = new String[n];
        for (int i = 0; i < n; i++) {
            res[i] = "abcdefghijklmnopqrst";
        }
        return res;
    }
    private static String makeSentence(String[] words) {
        StringBuffer sentence = new StringBuffer();
        for (String w : words) sentence.append(w);
        return sentence.toString();
    }


    public static void main(String[] args) {
        String[] ar = create(Integer.parseInt(args[0]));
        long begin = System.currentTimeMillis();
        String res = makeSentence(ar);
        System.out.println(System.currentTimeMillis() - begin);
    }
}

And result was, as expected, O(n):

java Test 200000 - 128 ms

java Test 500000 - 370 ms

java Test 1000000 - 698 ms

Version 1.6.0.21

Copulation answered 23/8, 2011 at 4:15 Comment(0)
J
12

I think these text in the book must be a typo ,I think the right content is below,I fix it:

===================================================================

Question: What is the running time of this code?

public String makeSentence(String[] words) {
    String sentence = new String("");
    for (String w : words) sentence+=W;
    return sentence;
}

Answer: O(n2), where n is the number of letters in sentence. Here’s why: each time you append a string to sentence, you create a copy of sentence and run through all the letters in sentence to copy them over. If you have to iterate through up to n characters each time in the loop, and you’re looping at least n times, that gives you an O(n2) run time. Ouch! With StringBuffer (or StringBuilder) can help you avoid this problem.

public String makeSentence(String[] words) {
    StringBuffer sentence = new StringBuffer();
    for (String w : words) sentence.append(w);
    return sentence.toString();
}

=====================================================================

Am i right?

Jameljamerson answered 21/11, 2013 at 14:42 Comment(2)
If you think that the content of the original question is wrong in any way edit itBookout
I believe the book meant to say this as well.Heid
G
3

There's a typo in this book.


1st case :

public String makeSentence(String[] words) {
    String sentence = new String();
    for (String w : words) sentence += w;
    return sentence;
}

Complexity : O(n^2) -> (n words) x (n characters copied at each iteration, for copying the current sentence into a StringBuffer)


2nd case :

public String makeSentence(String[] words) {
    StringBuffer sentence = new StringBuffer();
    for (String w : words) sentence.append(w);
    return sentence.toString();
}

Complexity : O(n) -> (n words) x O(1) (amortized complexity for StringBuffer concatenation)

Gd answered 25/9, 2015 at 12:19 Comment(0)
D
2

That really depends on the implementation of StringBuffer. Supposing .append() was constant time, it's clear that you have an O(n) algorithm in time where n = length of the words array. If .append isn't constant time, you'll need to multiple your O(n) by the time complexity of the method. If indeed the current implementation of StringBuffer copies strings character-by-character, then the algorithm above is

Θ(n*m), or O(n*m), where n is the number of words and m is average word length, and your book is wrong. I assume you're looking for a strict bound.

Simple example that the book's answer is incorrect: String[] words = ['alphabet'] By the book's definition, n=8, so the algorithm will be bounded by 64 steps. Is this the case? Clearly not strictly. I see 1 assignment and 1 copy operation with n characters, so you get about 9 steps. This sort of behavior is predicted by the bounds of O(n*m), as I illustrated above.

I did some digging, and this clearly isn't a simple character copy. It looks like memory is being copied in bulk, which puts us back at O(n), your first guess at the solution.

/* StringBuffer is just a proxy */
public AbstractStringBuilder append(String str) 
{
        if (str == null) str = "null";
        int len = str.length();
        ensureCapacityInternal(count + len);
        str.getChars(0, len, value, count);
        count += len;
        return this;
}

/* java.lang.String */
void getChars(char dst[], int dstBegin) {
             System.arraycopy(value, offset, dst, dstBegin, count);
}

Your book is either old, terrible, or both. I'm not determined enough to dig through JDK versions to find a less optimal implementation of StringBuffer, but perhaps one exists.

Duodenary answered 23/8, 2011 at 4:8 Comment(6)
The Sun/Oracle JVM (Hotspot) implements the append operation using System.arraycopy which in turn relies on a O(n) operation (implemented in objArrayKlass.cpp of the OpenJDK distribution) to perform the actual copying of array object members in memory. So, O(M*n) is correct. I doubt any other JVM does copying in constant time, given that occupied memory blocks have to be realigned.Kamacite
Is System.arraycopy necessarily serial on all JVMs? I could imagine that somehow being optimized for super-concurrent hardware that made this nearly linear for most sizes of a given string.Duodenary
@Vineet: Ah. Well, how about an implementation which increases the length of the buffer while deferring the actual copy of memory? I guess the real answer here is that the implementation of StringBuffer is entirely the solution to this question, and that can and likely will vary over time and across JVM implementations.Duodenary
the problem with resizing/copying the array, is that the actual implementation varies from one architecture to another (assuming only char arrays here, for the sake of brevity). On Windows and SPARC, it appears to be O(n) as there is atleast one loop that iterates through the number of the characters being copied. On the Linux and the Solaris x86 platforms, the JVM delegates to the extern C calls (of whose characteristics I'm not aware; I'm having quite a struggle with assembly language).Kamacite
@Vineet - It's safe to assume that copying data from one buffer to another is O(n). At least until someone demonstrates an algorithm that can do faster. I think at best some architectures might include special instructions that allow the copying of some fixed multiple of words as part of a single operation. But even with such an instruction, the copy operation still reduces to O(n).Ignorant
@aroth, yes, that's my point. Given all other checks that are implemented in the JVM, array copying tends to be O(n). Constant-time copies appears to be near impossible, as the best possibilities are memcpy/memmove which are again O(n), although OpenJDK appears to use these for primitives and not for object arrays.Kamacite
H
1

As the explanation given in the book, for ever word in the string array a new object of sentence gets created and that sentence object first copies the previous sentence and then traverses till the end of the array and then appends the new word, hence the complexity of n^2.

  1. First 'n' to copy the previous sentence into a new object
  2. Second 'n' to traverse that array and then append it

Hence n*n will be n^2.

How answered 23/3, 2012 at 22:58 Comment(0)
I
0

Looks like O(n) to me (with n being the total number of letters in all the words). You're basically iterating over every character in words to append it into the StringBuffer.

The only way I could see this as being O(n^2) is if append() iterates all the contents in the buffer before appending any new characters. And it may actually do this occasionally if the number of characters exceeds the currently allocated buffer length (it has to allocate a new buffer, and then copy everything from the current buffer into the new buffer). But it won't happen on every iteration, so you still won't have O(n^2).

At most you'd have O(m * n), where m is the number of times the buffer length is increased. And because the StringBuffer will double its buffer size every time it allocates a larger buffer we can determine that m is roughly equal to log2(n) (actually log2(n) - log2(16), since the default initial buffer size is 16 instead of 1).

So the real answer is that the book's example is O(n log n), and that you can get it down to O(n) by preallocating a StringBuffer with a capacity large enough to hold all of your letters.

Note that in Java appending to a string using += does exhibit the inefficient behavior described in the book's explanation, as it has to allocate a new string and copy all the data from both strings into it. So if you do this, it is O(n^2):

String sentence = "";
for (String w : words) {
    sentence += w;
}

But using StringBuffer should not generate the same behavior as in the above example. That's kind of one of the major reasons for StringBuffer to exist in the first place.

Ignorant answered 23/8, 2011 at 4:13 Comment(0)
B
-1

Here's my calculation for how they got O(n^2)

We'll ignore the CPU time for declaring the StringBuffer, as it doesn't vary with the size of the final string.

When calculating the O complexity we are concerned with the worst case, this will occur when there are 1 letter Strings. I shall explain after this example:

Let's say we have 4 one-letter strings: 'A', 'B', 'C', 'D'.

Read in A: CPU-time to find end of StringBuffer: 0 CPU-time to append 'A': 1

Read in B: CPU-time to find end of StringBuffer: 1 CPU-time to append 'B': 1

Read in C: CPU-time to find end of StringBuffer: 2 CPU-time to append 'C': 1

Read in D: CPU-time to find end of StringBuffer: 3 CPU-time to append 'D': 1

CPU-time to copy StringBuffer to String at the end: 4

Total CPU-time = 1 + 2 + 3 + 4 + 4

If we generalise this to n 1-letter words:

1 + 2 + 3 + ...... + n + n = 0.5n(n+1) + n

I did this by using the formula for the sum of an arithmetic sequence.

O(0.5n^2 + 1.5n) = O(n^2).

If we use multi-letter words, we are going to have to find the end of the StringBuffer less frequently, leading to a lower CPU-time and a 'better' case.

Brook answered 20/5, 2015 at 11:22 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.