Does Python handle string objects in a copy-on-write style?
I noticed that in Python, identical string objects seem to be stored as a single copy. For example:

>>> s1='abcde'
>>> s2='abcde'
>>> s1 is s2
True

s1 and s2 point to the same object.

When I modify s1, s2 still refers to the original object ('abcde'), but s1 points to a new copy. This behavior looks like copy-on-write.

>>> s1=s1+'f'
>>> s1 is s2
False
>>> s1
'abcdef'
>>> s2
'abcde'

So does Python really use a copy-on-write mechanism for string objects?

Whiplash answered 25/2, 2015 at 5:34 Comment(0)
M
4

Yes, both s1 and s2 point to the same object, because the string is interned (subject to some rules):

In [73]: s1='abcde'

In [74]: s2='abcde'

In [75]: id(s1), id(s2), s1 is s2
Out[75]: (63060096, 63060096, True)

One rule is that only strings made up of ASCII letters, digits, or underscores are interned automatically:

In [77]: s1='abcde!'

In [78]: s2='abcde!'

In [79]: id(s1), id(s2), s1 is s2
Out[79]: (84722496, 84722368, False)

Interestingly, all strings of length 0 or 1 are interned by default:

In [80]: s1 = "_"

In [81]: s2 = "_"

In [82]: id(s1), id(s2), s1 is s2
Out[82]: (8144656, 8144656, True)

In [83]: s1 = "!"

In [84]: s2 = "!"

In [85]: id(s1), id(s2), s1 is s2
Out[85]: (8849888, 8849888, True)

If I build the string at runtime, it won't be interned:

In [86]: s1 = "abcde"

In [87]: s2 = "".join(['a', 'b', 'c', 'd', 'e'])

In [88]: id(s1), id(s2), s1 is s2
Out[88]: (84722944, 84723648, False)

"...during peephole optimization is called constant folding and consists in simplifying constant expressions in advance" (from this link). The results of such folded expressions are interned too, subject to the rules above:

In [91]: 'abc' +'de' is 'abcde'
Out[91]: True

In [92]: def foo():
    ...:     print "abc" + 'de'
    ...:     

In [93]: def foo1():
    ...:     print "abcde"
    ...:     

In [94]: dis.dis(foo)
  2           0 LOAD_CONST               3 ('abcde')
              3 PRINT_ITEM          
              4 PRINT_NEWLINE       
              5 LOAD_CONST               0 (None)
              8 RETURN_VALUE        

In [95]: dis.dis(foo1)
  2           0 LOAD_CONST               1 ('abcde')
              3 PRINT_ITEM          
              4 PRINT_NEWLINE       
              5 LOAD_CONST               0 (None)
              8 RETURN_VALUE        

This folding only applies when the resulting string has a length less than or equal to 20:

In [96]: "a" * 20 is 'aaaaaaaaaaaaaaaaaaaa'
Out[96]: True

In [97]: 'a' * 21 is 'aaaaaaaaaaaaaaaaaaaaa'
Out[97]: False
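The session above is Python 2, where print is a statement. A rough way to check the same constant folding on Python 3 is to compile a small snippet and look at the constants stored in the resulting code object. This is only a sketch and assumes CPython; the "<demo>" file name is an arbitrary placeholder, and both the folding itself and its length limit are implementation details that have changed between versions.

```python
# Assumes CPython 3; constant folding is an implementation detail.
source = "x = 'abc' + 'de'"
code = compile(source, "<demo>", "exec")  # "<demo>" is just a placeholder filename

# If the compiler folded the expression, the combined string is stored
# directly among the code object's constants.
folded = "abcde" in code.co_consts
print(folded)
```

If this prints False on some interpreter, the concatenation is simply being done at runtime there; the program's behavior is the same either way.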

All of this is safe because Python strings are immutable; you can't edit them in place:

In [98]: s1 = "abcde"

In [99]: s1[2] = "C"
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-99-1d7c49892017> in <module>()
----> 1 s1[2] = "C"

TypeError: 'str' object does not support item assignment

Python provides the built-in function intern; in Python 3.x it lives in the sys module (sys.intern):

In [100]: s1 = 'this is a longer string than yours'

In [101]: s2 = 'this is a longer string than yours'

In [102]: id(s1), id(s2), s1 is s2
Out[102]: (84717088, 84717032, False)

In [103]: s1 = intern('this is a longer string than yours')

In [104]: s2 = intern('this is a longer string than yours')

In [105]: id(s1), id(s2), s1 is s2
Out[105]: (84717424, 84717424, True)
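For Python 3, the same idea can be sketched with sys.intern. The strings are built with join at runtime so that the compiler has no chance to share them; the variable names here are only for illustration.

```python
import sys

# Build two equal strings at runtime so the compiler cannot share them.
parts = ["this is a longer", " string than yours"]
s1 = "".join(parts)
s2 = "".join(parts)

print(s1 == s2)  # the values are equal regardless of identity

# sys.intern returns the canonical interned copy for a given value,
# so both calls hand back the same object.
i1 = sys.intern(s1)
i2 = sys.intern(s2)
print(i1 is i2)
```

Interning is mainly useful as an optimization for lookups; for comparing string values, == is the right tool whether or not the strings are interned.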

You can read more at the links below:

http://guilload.com/python-string-interning/

Does Python intern strings?

Metic answered 25/2, 2015 at 7:57 Comment(0)
G
3

No copying is taking place in any relevant sense. Your new string is an entirely new string object. It is no different than if you had done s1 = 'abcdef'. Some kinds of objects in Python allow you to modify them "in-place", but not strings. (In Python parlance, strings are immutable.)

Note that the fact that your two original strings are the same object is due to an implementation-specific optimization and will not always be true:

>>> s1 = 'this is a longer string than yours'
>>> s2 = 'this is a longer string than yours'
>>> s1 is s2
False
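A small sketch of the difference between rebinding a name and modifying an object in place may help here. It uses a list as the mutable counterpart; the names are illustrative only.

```python
s1 = "abcde"
s2 = s1            # both names refer to the same string object
s1 = s1 + "f"      # builds a new string and rebinds s1 only

print(s1, s2)
print(s1 is s2)

# Contrast with a mutable type: a list can be changed in place,
# so the change is visible through every name bound to it.
a = [1, 2, 3]
b = a
a.append(4)
print(b)
print(a is b)
```

Since whether two equal strings share one object is up to the implementation, it is safer to compare strings with == and keep is for identity checks such as `x is None`.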
Gummous answered 25/2, 2015 at 5:36 Comment(3)
I see what you mean: there's no copying, only the creation of a new object. So it should be called 'create-new on write', right? - Whiplash
@roast_soul: Not really. The "write" has nothing to do with it. Even if you just wrote s1+'f' or 'abcdef' but didn't assign it to anything, the new object would still be created. Whether a new object is created as part of an operation depends on what that operation is, not what (if anything) you're doing with the result. - Gummous
So there is no 'write' operation on a string, and any operation on a string creates a new object. Right? - Whiplash
E
1

It simply creates a new string object!

s1=s1+'f'

is no different from:

s1 = 'abcdef'

Note that this can slow down your program significantly if you append to a string many times, because each concatenation creates a new string. This is a known anti-pattern and results in O(N^2) running time.
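The usual alternative is to collect the pieces and join them once. A minimal sketch, with a made-up list of words standing in for real input:

```python
words = ["spam"] * 1000  # hypothetical input, for illustration only

# Repeated concatenation: each step may build a brand-new string
# that copies everything accumulated so far.
slow = ""
for w in words:
    slow += w

# str.join builds the final string in a single pass.
fast = "".join(words)

print(slow == fast)
```

Some CPython versions can optimize += on strings in certain cases, so the loop is not always quadratic in practice, but that is an implementation detail and join does not depend on it.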

Erdda answered 25/2, 2015 at 5:38 Comment(0)
W
-1

Strings are immutable, so you can't "edit" a string. Where you think you are editing it, you actually get a new string object.

Wilsonwilt answered 25/2, 2015 at 5:37 Comment(0)
