This problem is a well-known / "classic" optimization issue for JavaScript, caused by the fact that JavaScript strings are "immutable" and addition by concatenation of even a single character to a string requires creation of, including memory allocation for and copying to, an entire new string.
Unfortunately, the accepted answer on this page is wrong, where "wrong" means by a performance factor of 3x for simple one-character strings, and 8x-97x for short strings repeated more times, to 300x for repeating sentences, and infinitely wrong when taking the limit of the ratios of complexity of the algorithms as n
goes to infinity. Also, there is another answer on this page which is almost right (based on one of the many generations and variations of the correct solution circulating throughout the Internet in the past 13 years). However, this "almost right" solution misses a key point of the correct algorithm causing a 50% performance degradation.
JS Performance Results for the accepted answer, the top-performing other answer (based on a degraded version of the original algorithm in this answer), and this answer using my algorithm created 13 years ago
~ October 2000 I published an algorithm for this exact problem which was widely adapted, modified, then eventually poorly understood and forgotten. To remedy this issue, in August, 2008 I published an article http://www.webreference.com/programming/javascript/jkm3/3.html explaining the algorithm and using it as an example of simple of general-purpose JavaScript optimizations. By now, Web Reference has scrubbed my contact information and even my name from this article. And once again, the algorithm has been widely adapted, modified, then poorly understood and largely forgotten.
Original string repetition/multiplication JavaScript algorithm by
Joseph Myers, circa Y2K as a text multiplying function within Text.js;
published August, 2008 in this form by Web Reference:
http://www.webreference.com/programming/javascript/jkm3/3.html (The
article used the function as an example of JavaScript optimizations,
which is the only for the strange name "stringFill3.")
/*
* Usage: stringFill3("abc", 2) == "abcabc"
*/
function stringFill3(x, n) {
var s = '';
for (;;) {
if (n & 1) s += x;
n >>= 1;
if (n) x += x;
else break;
}
return s;
}
Within two months after publication of that article, this same question was posted to Stack Overflow and flew under my radar until now, when apparently the original algorithm for this problem has once again been forgotten. The best solution available on this Stack Overflow page is a modified version of my solution, possibly separated by several generations. Unfortunately, the modifications ruined the solution's optimality. In fact, by changing the structure of the loop from my original, the modified solution performs a completely unneeded extra step of exponential duplicating (thus joining the largest string used in the proper answer with itself an extra time and then discarding it).
Below ensues a discussion of some JavaScript optimizations related to all of the answers to this problem and for the benefit of all.
Technique: Avoid references to objects or object properties
To illustrate how this technique works, we use a real-life JavaScript function which creates strings of whatever length is needed. And as we'll see, more optimizations can be added!
A function like the one used here is to create padding to align columns of text, for formatting money, or for filling block data up to the boundary. A text generation function also allows variable length input for testing any other function that operates on text. This function is one of the important components of the JavaScript text processing module.
As we proceed, we will be covering two more of the most important optimization techniques while developing the original code into an optimized algorithm for creating strings. The final result is an industrial-strength, high-performance function that I've used everywhere--aligning item prices and totals in JavaScript order forms, data formatting and email / text message formatting and many other uses.
Original code for creating strings stringFill1()
function stringFill1(x, n) {
var s = '';
while (s.length < n) s += x;
return s;
}
/* Example of output: stringFill1('x', 3) == 'xxx' */
The syntax is here is clear. As you can see, we've used local function variables already, before going on to more optimizations.
Be aware that there's one innocent reference to an object property s.length
in the code that hurts its performance. Even worse, the use of this object property reduces the simplicity of the program by making the assumption that the reader knows about the properties of JavaScript string objects.
The use of this object property destroys the generality of the computer program. The program assumes that x
must be a string of length one. This limits the application of the stringFill1()
function to anything except repetition of single characters. Even single characters cannot be used if they contain multiple bytes like the HTML entity
.
The worst problem caused by this unnecessary use of an object property is that the function creates an infinite loop if tested on an empty input string x
. To check generality, apply a program to the smallest possible amount of input. A program which crashes when asked to exceed the amount of available memory has an excuse. A program like this one which crashes when asked to produce nothing is unacceptable. Sometimes pretty code is poisonous code.
Simplicity may be an ambiguous goal of computer programming, but generally it's not. When a program lacks any reasonable level of generality, it's not valid to say, "The program is good enough as far as it goes." As you can see, using the string.length
property prevents this program from working in a general setting, and in fact, the incorrect program is ready to cause a browser or system crash.
Is there a way to improve the performance of this JavaScript as well as take care of these two serious problems?
Of course. Just use integers.
Optimized code for creating strings stringFill2()
function stringFill2(x, n) {
var s = '';
while (n-- > 0) s += x;
return s;
}
Timing code to compare stringFill1()
and stringFill2()
function testFill(functionToBeTested, outputSize) {
var i = 0, t0 = new Date();
do {
functionToBeTested('x', outputSize);
t = new Date() - t0;
i++;
} while (t < 2000);
return t/i/1000;
}
seconds1 = testFill(stringFill1, 100);
seconds2 = testFill(stringFill2, 100);
The success so far of stringFill2()
stringFill1()
takes 47.297 microseconds (millionths of a second) to fill a 100-byte string, and stringFill2()
takes 27.68 microseconds to do the same thing. That's almost a doubling in performance by avoiding a reference to an object property.
Technique: Avoid adding short strings to long strings
Our previous result looked good--very good, in fact. The improved function stringFill2()
is much faster due to the use of our first two optimizations. Would you believe it if I told you that it can be improved to be many times faster than it is now?
Yes, we can accomplish that goal. Right now we need to explain how we avoid appending short strings to long strings.
The short-term behavior appears to be quite good, in comparison to our original function. Computer scientists like to analyze the "asymptotic behavior" of a function or computer program algorithm, which means to study its long-term behavior by testing it with larger inputs. Sometimes without doing further tests, one never becomes aware of ways that a computer program could be improved. To see what will happen, we're going to create a 200-byte string.
The problem that shows up with stringFill2()
Using our timing function, we find that the time increases to 62.54 microseconds for a 200-byte string, compared to 27.68 for a 100-byte string. It seems like the time should be doubled for doing twice as much work, but instead it's tripled or quadrupled. From programming experience, this result seems strange, because if anything, the function should be slightly faster since work is being done more efficiently (200 bytes per function call rather than 100 bytes per function call). This issue has to do with an insidious property of JavaScript strings: JavaScript strings are "immutable."
Immutable means that you cannot change a string once it's created. By adding on one byte at a time, we're not using up one more byte of effort. We're actually recreating the entire string plus one more byte.
In effect, to add one more byte to a 100-byte string, it takes 101 bytes worth of work. Let's briefly analyze the computational cost for creating a string of N
bytes. The cost of adding the first byte is 1 unit of computational effort. The cost of adding the second byte isn't one unit but 2 units (copying the first byte to a new string object as well as adding the second byte). The third byte requires a cost of 3 units, etc.
C(N) = 1 + 2 + 3 + ... + N = N(N+1)/2 = O(N^2)
. The symbol O(N^2)
is pronounced Big O of N squared, and it means that the computational cost in the long run is proportional to the square of the string length. To create 100 characters takes 10,000 units of work, and to create 200 characters takes 40,000 units of work.
This is why it took more than twice as long to create 200 characters than 100 characters. In fact, it should have taken four times as long. Our programming experience was correct in that the work is being done slightly more efficiently for longer strings, and hence it took only about three times as long. Once the overhead of the function call becomes negligible as to how long of a string we're creating, it will actually take four times as much time to create a string twice as long.
(Historical note: This analysis doesn't necessarily apply to strings in source code, such as html = 'abcd\n' + 'efgh\n' + ... + 'xyz.\n'
, since the JavaScript source code compiler can join the strings together before making them into a JavaScript string object. Just a few years ago, the KJS implementation of JavaScript would freeze or crash when loading long strings of source code joined by plus signs. Since the computational time was O(N^2)
it wasn't difficult to make Web pages which overloaded the Konqueror Web browser or Safari, which used the KJS JavaScript engine core. I first came across this issue when I was developing a markup language and JavaScript markup language parser, and then I discovered what was causing the problem when I wrote my script for JavaScript Includes.)
Clearly this rapid degradation of performance is a huge problem. How can we deal with it, given that we cannot change JavaScript's way of handling strings as immutable objects? The solution is to use an algorithm which recreates the string as few times as possible.
To clarify, our goal is to avoid adding short strings to long strings, since in order to add the short string, the entire long string also must be duplicated.
How the algorithm works to avoid adding short strings to long strings
Here's a good way to reduce the number of times new string objects are created. Concatenate longer lengths of string together so that more than one byte at a time is added to the output.
For instance, to make a string of length N = 9
:
x = 'x';
s = '';
s += x; /* Now s = 'x' */
x += x; /* Now x = 'xx' */
x += x; /* Now x = 'xxxx' */
x += x; /* Now x = 'xxxxxxxx' */
s += x; /* Now s = 'xxxxxxxxx' as desired */
Doing this required creating a string of length 1, creating a string of length 2, creating a string of length 4, creating a string of length 8, and finally, creating a string of length 9. How much cost have we saved?
Old cost C(9) = 1 + 2 + 3 + 4 + 5 + 6 + 7 + 9 = 45
.
New cost C(9) = 1 + 2 + 4 + 8 + 9 = 24
.
Note that we had to add a string of length 1 to a string of length 0, then a string of length 1 to a string of length 1, then a string of length 2 to a string of length 2, then a string of length 4 to a string of length 4, then a string of length 8 to a string of length 1, in order to obtain a string of length 9. What we're doing can be summarized as avoiding adding short strings to long strings, or in other words, trying to concatenate strings together that are of equal or nearly equal length.
For the old computational cost we found a formula N(N+1)/2
. Is there a formula for the new cost? Yes, but it's complicated. The important thing is that it is O(N)
, and so doubling the string length will approximately double the amount of work rather than quadrupling it.
The code that implements this new idea is nearly as complicated as the formula for the computational cost. When you read it, remember that >>= 1
means to shift right by 1 byte. So if n = 10011
is a binary number, then n >>= 1
results in the value n = 1001
.
The other part of the code you might not recognize is the bitwise and operator, written &
. The expression n & 1
evaluates true if the last binary digit of n
is 1, and false if the last binary digit of n
is 0.
New highly-efficient stringFill3()
function
function stringFill3(x, n) {
var s = '';
for (;;) {
if (n & 1) s += x;
n >>= 1;
if (n) x += x;
else break;
}
return s;
}
It looks ugly to the untrained eye, but it's performance is nothing less than lovely.
Let's see just how well this function performs. After seeing the results, it's likely that you'll never forget the difference between an O(N^2)
algorithm and an O(N)
algorithm.
stringFill1()
takes 88.7 microseconds (millionths of a second) to create a 200-byte string, stringFill2()
takes 62.54, and stringFill3()
takes only 4.608. What made this algorithm so much better? All of the functions took advantage of using local function variables, but taking advantage of the second and third optimization techniques added a twenty-fold improvement to performance of stringFill3()
.
Deeper analysis
What makes this particular function blow the competition out of the water?
As I've mentioned, the reason that both of these functions, stringFill1()
and stringFill2()
, run so slowly is that JavaScript strings are immutable. Memory cannot be reallocated to allow one more byte at a time to be appended to the string data stored by JavaScript. Every time one more byte is added to the end of the string, the entire string is regenerated from beginning to end.
Thus, in order to improve the script's performance, one must precompute longer length strings by concatenating two strings together ahead of time, and then recursively building up the desired string length.
For instance, to create a 16-letter byte string, first a two byte string would be precomputed. Then the two byte string would be reused to precompute a four-byte string. Then the four-byte string would be reused to precompute an eight byte string. Finally, two eight-byte strings would be reused to create the desired new string of 16 bytes. Altogether four new strings had to be created, one of length 2, one of length 4, one of length 8 and one of length 16. The total cost is 2 + 4 + 8 + 16 = 30.
In the long run this efficiency can be computed by adding in reverse order and using a geometric series starting with a first term a1 = N and having a common ratio of r = 1/2. The sum of a geometric series is given by a_1 / (1-r) = 2N
.
This is more efficient than adding one character to create a new string of length 2, creating a new string of length 3, 4, 5, and so on, until 16. The previous algorithm used that process of adding a single byte at a time, and the total cost of it would be n (n + 1) / 2 = 16 (17) / 2 = 8 (17) = 136
.
Obviously, 136 is a much greater number than 30, and so the previous algorithm takes much, much more time to build up a string.
To compare the two methods you can see how much faster the recursive algorithm (also called "divide and conquer") is on a string of length 123,457. On my FreeBSD computer this algorithm, implemented in the stringFill3()
function, creates the string in 0.001058 seconds, while the original stringFill1()
function creates the string in 0.0808 seconds. The new function is 76 times faster.
The difference in performance grows as the length of the string becomes larger. In the limit as larger and larger strings are created, the original function behaves roughly like C1
(constant) times N^2
, and the new function behaves like C2
(constant) times N
.
From our experiment we can determine the value of C1
to be C1 = 0.0808 / (123457)2 = .00000000000530126997
, and the value of C2
to be C2 = 0.001058 / 123457 = .00000000856978543136
. In 10 seconds, the new function could create a string containing 1,166,890,359 characters. In order to create this same string, the old function would need 7,218,384 seconds of time.
This is almost three months compared to ten seconds!
I'm only answering (several years late) because my original solution to this problem has been floating around the Internet for more than 10 years, and apparently is still poorly-understood by the few who do remember it. I thought that by writing an article about it here I would help:
Performance Optimizations for High Speed JavaScript / Page 3
Unfortunately, some of the other solutions presented here are still some of those that would take three months to produce the same amount of output that a proper solution creates in 10 seconds.
I want to take the time to reproduce part of the article here as a canonical answer on Stack Overflow.
Note that the best-performing algorithm here is clearly based on my algorithm and was probably inherited from someone else's 3rd or 4th generation adaptation. Unfortunately, the modifications resulted in reducing its performance. The variation of my solution presented here perhaps did not understand my confusing for (;;)
expression which looks like the main infinite loop of a server written in C, and which was simply designed to allow a carefully-positioned break statement for loop control, the most compact way to avoid exponentially replicating the string one extra unnecessary time.
'mystring'.repeat(xTimes)
– Paranoiac