Warning: this answer is about the implementation details of a specific Python interpreter. Comparing strings with "is" is a bad idea; use == instead.
Well, at least for CPython 3.4/2.7.3, the answer is "no, it is not the whitespace". More precisely, it is not only the whitespace:
Two string literals will share memory if they are either alphanumeric or reside in the same block (file, function, class or single interpreter command).
An expression that evaluates to a string will result in an object that is identical to the one created using a string literal, if and only if it is created using constants and binary/unary operators, and the resulting string is shorter than 21 characters.
Single characters are unique.
Examples
Alphanumeric string literals always share memory:
>>> x='aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa'
>>> y='aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa'
>>> x is y
True
Non-alphanumeric string literals share memory if and only if they share the enclosing syntactic block:
(interpreter)
>>> x='`!@#$%^&*() \][=-. >:"?<a'; y='`!@#$%^&*() \][=-. >:"?<a';
>>> z='`!@#$%^&*() \][=-. >:"?<a';
>>> x is y
True
>>> x is z
False
(file)
x='`!@#$%^&*() \][=-. >:"?<a';
y='`!@#$%^&*() \][=-. >:"?<a';
z=(lambda : '`!@#$%^&*() \][=-. >:"?<a')()
print(x is y)
print(x is z)
Output: True and False
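The same-block sharing can also be observed without comparing identities by hand. The following is a minimal sketch, assuming CPython: it compiles a two-line source string as one block and looks at the constants stored on the resulting code object. The names SOURCE, LITERAL and namespace are made up for this illustration.

```python
# Two assignments of the same non-alphanumeric literal, compiled as ONE block.
LITERAL = "hello, world!"
SOURCE = "x = 'hello, world!'\ny = 'hello, world!'\n"

code = compile(SOURCE, "<sketch>", "exec")

# Run the block in a private namespace so x and y can be inspected afterwards.
namespace = {}
exec(code, namespace)

# Within one code object, equal string constants are stored only once,
# so both names end up bound to the same object (CPython behaviour).
same_block_shared = namespace["x"] is namespace["y"]

# Count how often the literal appears among the block's constants.
literal_count = sum(1 for const in code.co_consts if const == LITERAL)

print(same_block_shared, literal_count)
```

Whether a literal in a different block (such as the lambda above) ends up as the same object depends on the interpreter version, so the sketch deliberately does not check that case.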
For simple binary operations, the compiler performs very simple constant folding (see peephole.c), but with strings it does so only if the resulting string is shorter than 21 characters. If this is the case, the rules mentioned earlier are in force:
>>> 'a'*10+'a'*10 is 'a'*20
True
>>> 'a'*21 is 'a'*21
False
>>> 'aaaaaaaaaaaaaaaaaaaaa' is 'aaaaaaaa' + 'aaaaaaaaaaaaa'
False
>>> t=2; 'a'*t is 'aa'
False
>>> 'a'.__add__('a') is 'aa'
False
>>> x='a' ; x+='a'; x is 'aa'
False
Single characters always share memory, of course:
>>> chr(0x20) is ' '
True
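Coming back to the warning at the top: code that needs a reliable answer should compare with ==, and code that really wants identity can intern explicitly. A small sketch under the assumption of Python 3, where the function is sys.intern (in Python 2 it is the builtin intern); the variable names are only for illustration.

```python
import sys

# Build two equal strings at run time so no compile-time sharing applies.
parts = ["hello", "world"]
a = " ".join(parts)
b = " ".join(parts)

# Value comparison is always reliable for equal contents.
equal_by_value = (a == b)

# Whether `a is b` holds here is an implementation detail; do not rely on it.

# sys.intern returns one canonical object per distinct string value,
# so the two interned results are guaranteed to be the same object.
a_interned = sys.intern(a)
b_interned = sys.intern(b)
same_object = a_interned is b_interned

print(equal_by_value, same_object)
```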
==
to compare any item for equality but this is an interesting question nonetheless – Fluffa is b
(noticing the string constant assigned to b has already been created and re-using it). The interning rule must care about spaces (or possibly length) – Affirm
id('ab') consistently returns the same value in my shell while id('a ')
consistently changes. I still have no idea why letters would have different behavior, but it's interesting to observe. Perhaps Python makes some kind of optimization by assuming that strings will often contain letters? I don't think that would make much sense but it's hard to explain this behavior. This is an interesting question. – Aftonagis
really does, maybe this question would be helpful - if it contained a useful answer. – Notwithstanding