There's another issue that isn't pointed out in any of the existing answers. Python is allowed to merge any two immutable values, and pre-created small int values are not the only way this can happen. A Python implementation is never guaranteed to do this, but they all do it for more than just small ints.
For one thing, there are some other pre-created values, such as the empty tuple, str, and bytes, and some short strings (in CPython 3.6, it's the 256 single-character Latin-1 strings). For example:
>>> a = ()
>>> b = ()
>>> a is b
True
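To probe which empty values a given interpreter shares, you can build them at run time rather than from literals, so the compiler's handling of constants doesn't get involved. This is only a minimal sketch: the helper name shares_empty is made up for illustration, and whatever it prints for the immutable types is an implementation detail that may differ on your Python.

```python
def shares_empty(factory):
    """Return True if two calls to factory() give back the very same object."""
    first = factory()
    second = factory()
    return first is second


# The results for tuple, str, and bytes are implementation details.
# list is included for contrast: two live lists are always distinct objects.
for factory in (tuple, str, bytes, list):
    print(factory.__name__, shares_empty(factory))
```

The list row is the only one the language actually pins down; the other three are exactly the kind of sharing an implementation is allowed, but not required, to do.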
But also, even non-pre-created values can be identical. Consider these examples:
>>> c = 257
>>> d = 257
>>> c is d
False
>>> e, f = 258, 258
>>> e is f
True
And this isn't limited to int values:
>>> g, h = 42.23e100, 42.23e100
>>> g is h
True
Obviously, CPython doesn't come with a pre-created float value for 42.23e100. So, what's going on here?
The CPython compiler will merge constant values of some known-immutable types, like int, float, str, and bytes, within the same compilation unit. For a module, the whole module is a compilation unit, but at the interactive interpreter, each statement is a separate compilation unit. Since c and d are defined in separate statements, their values aren't merged. Since e and f are defined in the same statement, their values are merged.
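The compilation-unit effect can be reproduced outside the interactive prompt by calling compile() yourself. The sketch below is an illustration under assumptions: the two helper names are hypothetical, and the identity results are CPython behavior, not something the language promises.

```python
def identical_in_one_unit(source):
    """Compile source as a single unit, run it, and test whether a is b."""
    namespace = {}
    exec(compile(source, "<one-unit>", "exec"), namespace)
    return namespace["a"] is namespace["b"]


def identical_across_units(first_source, second_source):
    """Compile two sources separately, the way the prompt treats two statements."""
    namespace = {}
    exec(compile(first_source, "<unit-1>", "exec"), namespace)
    exec(compile(second_source, "<unit-2>", "exec"), namespace)
    return namespace["a"] is namespace["b"]


# Both assignments share one compilation unit here.
print(identical_in_one_unit("a = 257\nb = 257"))

# Each assignment gets its own compilation unit here.
print(identical_across_units("a = 257", "b = 257"))
```

On CPython the first call typically reports True and the second False, which mirrors the c/d versus e/f sessions above.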
You can see what's going on by disassembling the bytecode. Try defining a function that does i, j = 258, 258 and then calling dis.dis on it, and you'll see that there's a single constant value (258, 258):
>>> import dis
>>> def f(): i, j = 258, 258
...
>>> dis.dis(f)
  1           0 LOAD_CONST               2 ((258, 258))
              2 UNPACK_SEQUENCE          2
              4 STORE_FAST               0 (i)
              6 STORE_FAST               1 (j)
              8 LOAD_CONST               0 (None)
             10 RETURN_VALUE
>>> f.__code__.co_consts
(None, 258, (258, 258))
>>> consts = f.__code__.co_consts
>>> id(consts[1]), id(consts[2][0]), id(consts[2][1])
(4305296480, 4305296480, 4305296480)
You may notice that the compiler has stored 258 as a constant even though it's not actually used by the bytecode, which gives you an idea of how little optimization CPython's compiler does. It also means that (non-empty) tuples actually don't end up merged:
>>> k, l = (1, 2), (1, 2)
>>> k is l
False
Put that in a function, dis it, and look at the co_consts: there's a 1 and a 2, two (1, 2) tuples that share the same 1 and 2 but are not identical, and a ((1, 2), (1, 2)) tuple that has the two distinct equal tuples.
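If you'd rather not read co_consts by eye, a small script can count the tuple constants for you. This is a rough sketch, with tuple_constants and sample as made-up names; what it prints depends on the CPython version, since the compiler's treatment of tuple constants has changed over time.

```python
def tuple_constants(func):
    """Collect every tuple stored in func's constants, including nested ones."""
    found = []
    stack = list(func.__code__.co_consts)
    while stack:
        const = stack.pop()
        if isinstance(const, tuple):
            found.append(const)
            stack.extend(const)  # look inside, e.g. ((1, 2), (1, 2))
    return found


def sample():
    k, l = (1, 2), (1, 2)
    return k is l


pairs = [c for c in tuple_constants(sample) if c == (1, 2)]
print("equal (1, 2) constants found:", len(pairs))
print("distinct objects among them:", len({id(c) for c in pairs}))
print("k is l:", sample())
```

If the last two lines report more than one distinct object and False, your interpreter is not merging the tuples; if they report one object and True, it is. Either outcome is allowed.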
There's one more optimization that CPython does: string interning. Unlike the compiler's constant merging, this isn't restricted to constants within a single compilation unit, or even to source code literals:
>>> m = 'abc'
>>> n = 'abc'
>>> m is n
True
On the other hand, it is limited to the str type, and to strings of internal storage kind "ascii compact", "compact", or "legacy ready", and in many cases only "ascii compact" will get interned.
At any rate, the rules for what values must be, might be, or cannot be distinct vary from implementation to implementation, and between versions of the same implementation, and maybe even between runs of the same code on the same copy of the same implementation.
It can be worth learning the rules for one specific Python for the fun of it. But it's not worth relying on them in your code. The only safe rules are:
- Do not write code that assumes two equal but separately-created immutable values are identical (don't use x is y, use x == y)
- Do not write code that assumes two equal but separately-created immutable values are distinct (don't use x is not y, use x != y)
Or, in other words, only use is to test for the documented singletons (like None) or for objects that are only created in one place in the code (like the _sentinel = object() idiom).
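For readers who haven't met the sentinel idiom, here is one possible shape of it. The function name lookup and the sample data are invented for illustration; the point is only that _sentinel is created in exactly one place, so comparing against it with is is safe.

```python
_sentinel = object()  # created exactly once, so identity is meaningful


def lookup(mapping, key, default=_sentinel):
    """Return mapping[key]; raise KeyError unless a default was supplied."""
    value = mapping.get(key, _sentinel)
    if value is not _sentinel:
        return value
    if default is _sentinel:
        raise KeyError(key)
    return default


settings = {"timeout": None}
print(lookup(settings, "timeout"))     # None is a real stored value here
print(lookup(settings, "retries", 3))  # falls back to the default, 3
```

Using a private object rather than None as the "nothing passed" marker lets None remain a legitimate value and a legitimate default.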
is - do not use it to test for equality of integers, strings, tuples, or other things like these." However, I am trying to integrate a simple state machine into my class, and since the states are opaque values whose only observable property is that of being identical or different, it looks quite natural for them to be comparable with is. I plan to use interned strings as states. I would have preferred plain integers, but unfortunately Python cannot intern integers (0 is 0 is an implementation detail). – Chump