Another duplicate was asking why two equal strings are generally not identical, which isn't really answered here:
>>> x = 'a'
>>> x += 'bc'
>>> y = 'abc'
>>> x == y
True
>>> x is y
False
So, why aren't they the same string? Especially given this:
>>> z = 'abc'
>>> w = 'abc'
>>> z is w
True
Let's put off the second part for a bit. How could the first one be true?
The interpreter would have to have an "interning table", a table mapping string values to string objects, so every time you try to create a new string with the contents 'abc'
, you get back the same object. Wikipedia has a more detailed discussion on how interning works.
And Python has a string interning table; you can manually intern strings with the sys.intern
method.
In fact, Python is allowed to automatically intern any immutable types, but not required to do so. Different implementations will intern different values.
CPython (the implementation you're using if you don't know which implementation you're using) auto-interns small integers and some special singletons like False
, but not strings (or large integers, or small tuples, or anything else). You can see this pretty easily:
>>> a = 0
>>> a += 1
>>> b = 1
>>> a is b
True
>>> a = False
>>> a = not a
>>> b = True
a is b
True
>>> a = 1000
>>> a += 1
>>> b = 1001
>>> a is b
False
OK, but why were z
and w
identical?
That's not the interpreter automatically interning, that's the compiler folding values.
If the same compile-time string appears twice in the same module (what exactly this means is hard to define—it's not the same thing as a string literal, because r'abc'
, 'abc'
, and 'a' 'b' 'c'
are all different literals but the same string—but easy to understand intuitively), the compiler will only create one instance of the string, with two references.
In fact, the compiler can go even further: 'ab' + 'c'
can be converted to 'abc'
by the optimizer, in which case it can be folded together with an 'abc'
constant in the same module.
Again, this is something Python is allowed but not required to do. But in this case, CPython always folds small strings (and also, e.g., small tuples). (Although the interactive interpreter's statement-by-statement compiler doesn't run the same optimization as the module-at-a-time compiler, so you won't see exactly the same results interactively.)
So, what should you do about this as a programmer?
Well… nothing. You almost never have any reason to care if two immutable values are identical. If you want to know when you can use a is b
instead of a == b
, you're asking the wrong question. Just always use a == b
except in two cases:
- For more readable comparisons to the singleton values like
x is None
.
- For mutable values, when you need to know whether mutating
x
will affect the y
.