Why is ''.join() faster than += in Python?

I'm able to find a bevy of information online (on Stack Overflow and otherwise) about how it's a very inefficient and bad practice to use + or += for concatenation in Python.

I can't seem to find WHY += is so inefficient. Outside of a mention here that "it's been optimized for 20% improvement in certain cases" (still not clear what those cases are), I can't find any additional information.

What is happening on a more technical level that makes ''.join() superior to other Python concatenation methods?

Aubert answered 3/9, 2016 at 23:11 Comment(11)
Related. #34008510Mccarron
+= behaves differently for strings and integers, so it is possible that Python spends extra time figuring out which type of data += is operating on: addition if the operands are integers, concatenation if they are strings. By contrast, ' '.join() expects string elements only, so Python does not have to worry about the type of data it is dealing with.Analyzer
@cricket_007 linked to a great post providing more insight. However I have accepted mgilson's answer.Aubert
@BryanOakley Examining the source code didn't even cross my mind. This is another good solution to the problem.Aubert
You may find the story of Shlemiel the painter (originally told here) helpful when trying to understand the potential performance costs of doing a lot of += concatenation with strings. The exact reason += may be O(N) in Python is not exactly the same reason strcat is O(N) in C, but it's similar.Yarrow
@Yarrow It's more like a = asprintf("%s%s", a, b), except it also involves garbage collection (since something else may or may not hold a reference to the original a). The optimizations that are mentioned, I suspect, have to do with the fact that it knows when nothing else holds a reference to the original and can do some trickery in that case: allocate a bit more space than needed and reuse the same string, since nothing else is using it and it's about to be freed.Brodsky
Doesn't this depend very much on the number of strings to concatenate? I'd think (and %timeit confirms this) that concatenating two strings with + is much faster than using join.Purl
@Purl With 2 strings you aren't creating any unnecessary copy. What this question is about is when you have a sequence of strings texts of unknown size that could be tens/hundreds/thousands of elements and you want to concatenate them; in such cases ''.join(texts) will be faster than a loop over texts using +=.Zigzagger
@Zigzagger "What this question is about is when you have a sequence of strings" Is it? I don't see any mention of the number of strings to concatenate, or of loops and such, in the question or OP's comments. Thus I think it's worth noting that for some cases, + is indeed faster.Purl
@Purl The question links to an answer that clearly states that when performance is an issue, i.e. when you have a large number of concatenations, join is the better choice.Zigzagger
@Zigzagger Granted, I did not read the linked question. Of course, performance only becomes a concern when you have to do something very often, which makes it likely that OP has a loop in mind. But you could also have a large number of pairwise concatenations in a loop, as in [x+y for x,y in zip(lst1, lst2)]. And here, again, + is faster.Purl
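
As a rough illustration of the two-string case debated in these comments, a micro-benchmark along these lines can be used (the exact numbers depend on the Python version and string sizes, so treat it as a sketch rather than a definitive result):

import timeit

a, b = 'hello', 'world'

# Plain concatenation of exactly two strings: no tuple to build,
# no method call on ''.
print(timeit.timeit(lambda: a + b))

# join of a 2-tuple: pays for constructing the tuple and the call.
print(timeit.timeit(lambda: ''.join((a, b))))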

Let's say you have this code to build up a string from three strings:

x = 'foo'
x += 'bar'  # 'foobar'
x += 'baz'  # 'foobarbaz'

In this case, Python first needs to allocate and create 'foobar' before it can allocate and create 'foobarbaz'.

So for each += that gets called, the entire contents of the string and whatever is getting added to it must be copied into an entirely new memory buffer. In other words, if you have N strings to join, you end up allocating approximately N temporary strings: the first substring gets copied ~N times, the last substring only once, and on average each substring gets copied ~N/2 times. For roughly equal-sized pieces, the total copying work therefore grows quadratically with N.
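
To make that cost concrete, here is a small timing sketch contrasting the += loop with a single join (the sizes are made up for illustration, and note the caveat below that CPython can sometimes mitigate the += cost):

import timeit

pieces = ['x' * 20] * 10_000  # 10,000 substrings of 20 bytes each

def concat_plus():
    s = ''
    for p in pieces:
        s += p  # may copy everything accumulated so far
    return s

def concat_join():
    return ''.join(pieces)  # each piece is copied exactly once

print(timeit.timeit(concat_plus, number=100))
print(timeit.timeit(concat_join, number=100))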

With .join, Python can play a number of tricks, since the intermediate strings never need to be created. CPython figures out how much memory it needs up front, allocates a correctly-sized buffer, and then copies each piece into that buffer, which means each piece is copied only once.
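
For intuition, here is a rough Python-level sketch of that strategy. The real str.join is implemented in C; this ASCII-only toy, with the made-up name join_sketch, just mirrors the two-pass idea:

def join_sketch(parts):
    parts = list(parts)                 # materialize the input (join does this for iterators)
    total = sum(len(p) for p in parts)  # first pass: compute the final size
    buf = bytearray(total)              # one correctly-sized allocation
    pos = 0
    for p in parts:                     # second pass: copy each piece once
        raw = p.encode('ascii')         # simplification: assumes ASCII input
        buf[pos:pos + len(raw)] = raw
        pos += len(raw)
    return buf.decode('ascii')

assert join_sketch(['foo', 'bar', 'baz']) == 'foobarbaz'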


There are other viable approaches that could lead to better performance for += in some cases: for example, if the internal string representation is a rope, or if the runtime is smart enough to figure out that the temporary strings are of no use to the rest of the program and optimize them away.

However, CPython certainly does not do these optimizations reliably (though it may in a few corner cases), and since it is the most common implementation in use, many best practices are based on what works well for CPython. Having a standardized set of norms also makes it easier for other implementations to focus their optimization efforts.

Sd answered 3/9, 2016 at 23:22 Comment(17)
When I read your answer for the first time, I thought "how does the interpreter know it needs to allocate space for 'foobar'? Does it read my mind, knowing I'm going to join those at some point?" I think you're assuming there's code like foo += bar + baz. Your answer would make a little more sense if you showed the code that would cause the allocation.Northman
@MooingDuck: foo += bar; foo += baz; will behave exactly as this post describes. foo = foo + bar + baz; as well. foo += bar + baz works subtly differently, but no faster.Frequentation
@MooingDuck: I understand. That's not the point. The point is, neither the original question nor the answer shows the expression foo += bar. A beginner may stumble on this answer and wonder why python allocates space for "foobar" in the absence of an expression.Northman
@BryanOakley No, his assumption is foo += bar; foo += baz. For foo += bar + baz you have the opposite problem, it has to allocate "barbaz".Brodsky
I've improved the code and explanation a bit... I'm still not sure the last paragraph is correct, though. As described, it suggests that join will go through the input twice, which it cannot do if the input is an iterator.Brodsky
@Random832: you are completely missing (and simultaneously proving) the point. The point is, there is (well, was) no context, no expression at all. The reader was forced to guess, like you and I both did. Whether it's foo += bar or something else is irrelevant to my comment. The point is moot now, however, since the answer has been improved.Northman
@Brodsky -- It actually is true. join does go through the input twice. It accomplishes this by creating a sequence from the input before the first run-through.Sd
@Sd Another way to handle this efficiently would be to double the buffer size when resizing. This may require a final shrink of the buffer (which might be O(1) or O(n) depending on the memory allocator) and should result in O(n) complexity too, with a higher constant.Zigzagger
@Random832: "join will go through the input twice". And that's why you should pass a list comprehension to .join in preference to a generator expression.Getup
This answer assumes that your runtime is really stupid. It would surprise me if CPython actually creates intermediate strings for every one. PyPy surely doesn't.Schlessel
Quoting an answer I got in the SO Python chat "String concatenation via + or += in CPython is optimised, and in Python 3 or Python 2.5+ it's ok to use concatenation in a loop if the final result is less than 1k or so in length, but after that it gets pretty slow compared to using .join" by PM_2RNG chat.stackoverflow.com/transcript/message/32658180#32658180Schlessel
And SOPython were also kind enough to provide proof that the general optimization isn't there: bugs.python.org/issue1569040 and chat.stackoverflow.com/transcript/message/32658330#32658330Schlessel
@BenjaminGruenbaum Multiple string concatenation via += (which I assume you are talking about) is not optimized in CPython, which can be clearly seen by analyzing the bytecode via dis. It would be hard to optimize, since the interpreter does not know a priori what the left side is and what the += operator does on it. The same goes for PyPy. Besides, the general philosophy of Python is not to change the execution order of commands.Settles
@PM2Ring A list comprehension itself loops through the input, so I don't see the point of doing that. Actually, you lose both performance and memory, since doing that loop in Python is slower than in the underlying VM.Settles
@Settles -- FWIW, it's pretty well documented (at least on Stack Overflow) that joining a list comprehension is slightly faster than joining a generator expression. The real-world time difference is minimal, however, so I disagree that one way or the other is preferred (e.g. I've never seen it in a style guide). Personally I usually still use the generator to be consistent with other code that I write, but in a code review I will let it slide, whereas I wouldn't let something like sum([x for x in ...]) slide (why waste the memory?)...Sd
@Settles See this answer by Python core dev Raymond Hettinger. Also, in Python 2, a gen exp has a tiny bit more setup time & overhead than a list comp, since it has to create a local scope, whereas a list comp runs in the scope of the containing code.Getup
@Settles It definitely is optimized, it's done in the ceval loop, though, so you wouldn't see it in the disassembly. See this blog post for benchmarks and a link to the optimization code.Cavorilievo
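
As a rough illustration of the list-comprehension-versus-generator point in the comments above, a sketch like this can be used to compare the two (differences are typically small, so measure on your own workload):

import timeit

words = ['word'] * 10_000

# join over a list comprehension: the list is built first, then joined
print(timeit.timeit(lambda: ','.join([w.upper() for w in words]), number=100))

# join over a generator expression: join materializes it internally anyway
print(timeit.timeit(lambda: ','.join(w.upper() for w in words), number=100))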

I think this behaviour is best explained in Lua's string buffer chapter.

To rewrite that explanation in context of Python, let's start with an innocent code snippet (a derivative of the one at Lua's docs):

s = ""
for l in some_list:
  s += l

Assume that each l is 20 bytes and s has already grown to a size of 50 KB. When Python concatenates s + l, it creates a new string of 50,020 bytes and copies the 50 KB of s into this new string. That is, for each new line the program moves 50 KB of memory, and growing. After reading 100 new lines (only 2 KB of new data), the snippet has already moved more than 5 MB of memory. To make things worse, after the assignment

s += l

the old string is now garbage. After two loop cycles, there are two old strings, making a total of more than 100 KB of garbage. So the runtime decides to run its garbage collector and frees those 100 KB. The problem is that this will happen every two cycles, and the program will run its garbage collector two thousand times before reading the whole list. Even with all this work, its memory usage will be a large multiple of the list's size.

And, at the end:

This problem is not peculiar to Lua: Other languages with true garbage collection, and where strings are immutable objects, present a similar behavior, Java being the most famous example. (Java offers the structure StringBuffer to ameliorate the problem.)

Python strings are also immutable objects.
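
The usual Python remedy, reusing some_list from the snippet above, is to collect the pieces in a list and join once at the end, so each piece is copied into the final string exactly once:

parts = []
for l in some_list:
  parts.append(l)  # appending to a list is amortized O(1)
s = ''.join(parts)

# or, when no per-item work is needed, simply:
s = ''.join(some_list)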

Anthropology answered 4/9, 2016 at 6:44 Comment(4)
It's worth noting that CPython (the main Python interpreter) cheats a bit with respect to strings being immutable (this is the "optimization" that's been vaguely alluded to). If it sees you do += and the name on the left side is bound to a string with exactly one reference, it tries to resize that string's buffer in place (which may or may not work depending on some low-level memory allocation details). It makes repeated += operations much faster when it works (indeed, using a loop with += may be faster than "".join). The big reason not to use it is for cross-interpreter compatibility.Yarrow
LuaJIT eats this sort of loop for breakfast and I seriously doubt you'd get more than 2-3 allocations here. I'd love to be proven wrong though.Schlessel
@Yarrow Could you explain more on "cross-interpreter compatibility"?Cuyp
@Yarrow And what do you think of the accepted answer, it says So for each += that gets called, the entire contents of the string and whatever is getting added to it need to be copied into an entirely new memory buffer..Cuyp
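
A small probe of the refcount trick described in the comments above might look like this (it relies on CPython implementation details and is not guaranteed by the language or by other interpreters):

s = ''
for i in range(8):
    old_id = id(s)
    s += 'x' * 100
    # If CPython managed to resize the buffer in place, the identity
    # of s is unchanged; otherwise a new string was allocated.
    print(i, id(s) == old_id)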
