It's a CPython-specific optimization for the case when the str being appended to happens to have no other living references. The interpreter "cheats" in this case: it modifies the existing string by reallocating (which can happen in place, depending on heap layout) and appending the data directly. That often reduces the work significantly in loops that repeatedly concatenate, making them behave more like the amortized O(1) appends of a list rather than O(n) copy operations each time. It has no visible effect besides the unchanged id, so it's legal to do this (no one with an existing reference to a str ever sees it change unless the str was logically being replaced).
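To put rough numbers on the O(n) versus amortized O(1) comparison, here is a small idealized cost model. It does not measure CPython; the function names are made up for illustration, and the "in place" case assumes the best outcome, where every reallocation extends the buffer without moving it.

```python
def chars_written_with_full_copy(n_pieces, piece_len):
    """Every concatenation builds a brand-new string: old contents plus the new piece."""
    total = 0
    length = 0
    for _ in range(n_pieces):
        length += piece_len
        total += length  # the whole new string has to be written out
    return total


def chars_written_with_inplace_append(n_pieces, piece_len):
    """Idealized best case: only the newly appended piece is written each time."""
    return n_pieces * piece_len


if __name__ == "__main__":
    print(chars_written_with_full_copy(1000, 1))       # quadratic growth
    print(chars_written_with_inplace_append(1000, 1))  # linear growth
```

In this model, appending 1000 one-character pieces writes 500500 characters with a full copy each time, against 1000 when every append lands in place. Real behavior falls somewhere in between, since a reallocation may still have to move the data.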
You're not actually supposed to rely on it (non-reference-counted interpreters can't use this trick, since they can't know if the str has other references), per PEP 8's very first programming recommendation:
Code should be written in a way that does not disadvantage other implementations of Python (PyPy, Jython, IronPython, Cython, Psyco, and such).
For example, do not rely on CPython’s efficient implementation of in-place string concatenation for statements in the form a += b or a = a + b. This optimization is fragile even in CPython (it only works for some types) and isn’t present at all in implementations that don’t use refcounting. In performance sensitive parts of the library, the ''.join() form should be used instead. This will ensure that concatenation occurs in linear time across various implementations.
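As a sketch of what that recommendation looks like for a countdown loop like the one in the question, the two functions below build the same string. The function names are hypothetical; only str.join and range are real APIs here.

```python
def countdown_concat(n):
    """Repeated += : relies on the CPython optimization to stay fast."""
    s = ""
    i = n
    while i != 0:
        s += str(i)
        i -= 1
    return s


def countdown_join(n):
    """''.join() : linear time on any implementation, no refcount trick needed."""
    return "".join(str(i) for i in range(n, 0, -1))


if __name__ == "__main__":
    print(countdown_concat(5))
    print(countdown_join(5))
```

Both produce the same result; the join version simply doesn't depend on how the interpreter manages references.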
If you want to break the optimization, there are all sorts of ways to do so, e.g. changing your code to:
>>> while i != 0:
...     s_alias = s  # Gonna save off an alias here
...     s += str(i)
...     print(s + " stored at " + str(id(s)))
...     i -= 1
...
breaks it by creating an alias, increasing the reference count and telling Python that the change would be visible somewhere other than s, so it can't apply it. Similarly, code like:
s = s + a + b
can't use it, because s + a occurs first, and produces a temporary that b must then be added to, rather than immediately replacing s, and the optimization is too brittle to try to handle that. Almost identical code like:
s += a + b
or:
s = s + (a + b)
restores the optimization by ensuring the final concatenation is always one where s is the left operand and the result is used to immediately replace s.
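To compare the four forms discussed above side by side, here is a rough sketch that counts how often id(s) changes across a loop. The helper count_id_changes and its variant names are invented for this illustration. The aliased and chained forms should report a change on every step, because the old and new strings are alive at the same time. What the other two report depends on the CPython version and the memory allocator, so treat those numbers as observations, not guarantees.

```python
def count_id_changes(variant, n=1000):
    """Append n pieces to s and count the steps after which id(s) differs."""
    s = ""
    changes = 0
    for i in range(n):
        before = id(s)  # an int, so it does not keep the old string alive
        a, b = str(i), ","
        if variant == "aliased":
            s_alias = s      # second reference: an in-place append would be visible
            s += a + b
        elif variant == "chained":
            s = s + a + b    # s + a yields a temporary while s is still alive
        elif variant == "augmented":
            s += a + b       # s is the left operand of the final concatenation
        elif variant == "parenthesized":
            s = s + (a + b)  # same shape as the augmented form
        else:
            raise ValueError(f"unknown variant: {variant!r}")
        if id(s) != before:
            changes += 1
    return changes


if __name__ == "__main__":
    for variant in ("aliased", "chained", "augmented", "parenthesized"):
        print(variant, count_id_changes(variant))
```

On CPython the last two variants usually report fewer changes than the first two, since a reallocation only moves the string some of the time.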
[...] str is immutable? – Chanterelle
The standard wisdom is that Python strings are immutable. You can't change a string's value, only the reference to the string. continue reading here – Chanterelle
[...] id checks of this sort. If you add s2 = s after the s += str(i) line, you'll see the id change all the time, because now that the str is visible through multiple aliases, they can't use the optimization. – Dorfman
[...] id changes. The optimization in CPython sometimes lets it avoid that copy by reallocing in place when it can, if the mutation is not otherwise detectable. – Dorfman
[...] sizeof(thestruct), or just by allocating extra and casting a pointer to the byte after the struct to the correct type; old str did the former, new str [with variable width characters] does the latter). – Dorfman
[...] str in modern Python. – Dorfman
[...] id is temporally unique (while an object possesses a specific id, no other object can have the same id). s += str(i) for immutable types is defined to produce the concatenated value of s and str(i) first, then replace s after. Since s still exists when the concatenation occurs, the concatenated value would not be allowed to have the same id (because s already has it). CPython is cheating when it does this, but it's a harmless sort of cheat (improves performance, only observable effect otherwise is the id quirk). – Dorfman
[...] id consistency guarantee, that, in combination, are definitely broken by this optimization (in the sense that it proves a violation of str immutability). Your "copy it back to the old str's memory" suggestion (which is itself based on a CPython optimization detail) isn't allowed, because by the language spec, both objects must briefly coexist, and the id of both must be constant throughout their lifetime. – Dorfman
[...] x = x.__add__(y) is allowed to drop the old reference before calling the function. Fun fact: if you do use s = s.__add__(t) instead of s += t, you get different ids, so they're not really equivalent. – Marrakech
[...] ids is not in itself an indication that an immutable type is mutated, and commonly happens because the memory is simply re-used. The other question already starts at taking the optimisation for granted and asks how CPython specifically implements it. Neither does this question's answers solve the other question, nor do the other question's answers solve this question. They are interesting trivia wrt each other but not more. – Buonomo