Why does comparing strings using either '==' or 'is' sometimes produce a different result?
D

15

1318

Two string variables are set to the same value. s1 == s2 always returns True, but s1 is s2 sometimes returns False.

If I open my Python interpreter and do the same is comparison, it succeeds:

>>> s1 = 'text'
>>> s2 = 'text'
>>> s1 is s2
True

Why is this?

Disarm answered 1/10, 2009 at 15:40 Comment(7)
see: #1392933Gaona
This problem also occurs when you read console input, e.g. input = raw_input("Decide (y/n): "). In this case, with an input of "y", if input == 'y': evaluates to True while if input is 'y': evaluates to False.Agranulocytosis
This blog provides a far more complete explanation than any answer guilload.com/python-string-interningSimonides
As @chris-rico mentions, a great explanation is here: https://mcmap.net/q/37455/-python-string-interning/1695680Hildegardhildegarde
Can you explain specifically why, in the original poster's example, the "is" operator fails in the script but the same operator on the same strings returns True in interactive mode? It's the same interpreter, so we should expect the same interning behaviour whether you run a script or use interactive mode, right?Spinous
Possible duplicate of Is there a difference between `==` and `is` in Python?Nefertiti
Unless you're checking that something is None or is not None, you should basically never use is.Hexagon
S
1677

is is identity testing, and == is equality testing. What happens in your code would be emulated in the interpreter like this:

>>> a = 'pub'
>>> b = ''.join(['p', 'u', 'b'])
>>> a == b
True
>>> a is b
False

So, no wonder they're not the same, right?

In other words: a is b is the equivalent of id(a) == id(b)
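A minimal sketch of that equivalence, reusing the strings from the session above (the variable names same_value, same_object and same_id are only illustrative). Whether a is b comes out True or False depends on the implementation, but it always agrees with the id() comparison while both objects are alive:

```python
# Build an equal string at runtime so it need not be the same object as the literal.
a = 'pub'
b = ''.join(['p', 'u', 'b'])

same_value = (a == b)         # compares contents
same_object = (a is b)        # compares identity
same_id = (id(a) == id(b))    # what `is` amounts to while both objects are alive

print(same_value)              # True
print(same_object == same_id)  # True: the two identity checks always agree
```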

Symptomatology answered 1/10, 2009 at 15:45 Comment(12)
ahh same as eq? vs equal? in scheme, got it.Disarm
There's a lot of equality predicates in Common Lisp, besides the basic hierarchy of eq, eql, equal, and equalp (the longer the name there, the more things will be found equal).Lightheaded
Or == vs .equals() in Java. The best part is that the Python == is not analogous to the Java ==.Forwhy
And what with the None value?Eugenioeugenius
@Крайст: there is only a single None value. So it always has the same id.Symptomatology
@SilentGhost: GvR says in PEP 8 that we must never use an equality check for None and should only use the is None expression. I read about it yesterday, after commenting above.Eugenioeugenius
Simple, one-line test: a = 'pub'; b = ''.join(a); a == b, a is bSoma
This doesn't address the OP's "is -> True" example.Americanism
Why is "s2 is s1" True in the question, then?Autochthon
@AlexanderSupertramp, because of string interning.Bate
Here's an excellent article on StackAbuse explaining this and a few related mattersMacrocosm
@pfabri: That's a terrible article. It includes the atrocious recommendation "A rule of thumb to follow is to use == when comparing immutable types (like ints) and is when comparing objects." You almost never want is for comparing user-defined objects. The correct rule of thumb is "Use == for everything but None and NotImplemented comparisons until you understand what you're doing." The article also fails to convey that the identity semantics of ints are a CPython implementation detail that only applies to small ints, not some caching mechanism common to all ints.Blocking
Q
661

Other answers here are correct: is is used for identity comparison, while == is used for equality comparison. Since what you care about is equality (the two strings should contain the same characters), in this case the is operator is simply wrong and you should be using == instead.

The reason is works interactively is that (most) string literals are interned by default. From Wikipedia:

Interned strings speed up string comparisons, which are sometimes a performance bottleneck in applications (such as compilers and dynamic programming language runtimes) that rely heavily on hash tables with string keys. Without interning, checking that two different strings are equal involves examining every character of both strings. This is slow for several reasons: it is inherently O(n) in the length of the strings; it typically requires reads from several regions of memory, which take time; and the reads fills up the processor cache, meaning there is less cache available for other needs. With interned strings, a simple object identity test suffices after the original intern operation; this is typically implemented as a pointer equality test, normally just a single machine instruction with no memory reference at all.

So, when you have two string literals (words that are literally typed into your program source code, surrounded by quotation marks) in your program that have the same value, the Python compiler will automatically intern the strings, making them both stored at the same memory location. (Note that this doesn't always happen, and the rules for when this happens are quite convoluted, so please don't rely on this behavior in production code!)

Since in your interactive session both strings are actually stored in the same memory location, they have the same identity, so the is operator works as expected. But if you construct a string by some other method (even if that string contains exactly the same characters), then the string may be equal, but it is not the same string -- that is, it has a different identity, because it is stored in a different place in memory.
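A rough sketch of the difference between literals and strings built at runtime. The helper compare is hypothetical, and the identity results in the comments are what current CPython typically does, not something the language guarantees:

```python
def compare(x, y):
    """Return (equal, identical) for two objects."""
    return (x == y, x is y)

literal_1 = 'public'
literal_2 = 'public'
built = ''.join(['pub', 'lic'])   # assembled at runtime, not a literal

# Equality is True in both cases; only identity differs.
print(compare(literal_1, literal_2))  # identity is usually True on CPython (interned literals)
print(compare(literal_1, built))      # identity is usually False on CPython
```

Only the first element of each tuple is safe to rely on.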

Quiles answered 1/10, 2009 at 16:2 Comment(10)
Where can someone read more on the convoluted rules for when strings are interned?Soma
+1 for a thorough explanation. Not sure how the other answer received so many upvotes without explaining what ACTUALLY happened.Laboured
this is exactly what I thought of when I read the question. The accepted answer is short yet contains the fact, but this answer explains things far better. Nice!Demonic
@NoctisSkytower Googled the same and found this guilload.com/python-string-interningChilcote
So the general rule of thumb is to always use ==/!= when the strings are created via different methods?Usurer
@naught101: No, the rule is to choose between == and is based on what kind of check you want. If you care about the strings being equal (that is, having the same contents) then you should always use ==. If you care about whether any two Python names refer to the same object instance, you should use is. You might need is if you're writing code that handles lots of different values without caring about their contents, or else if you know there is only one of something and you want to ignore other objects pretending to be that thing. If you're not sure, always choose ==.Quiles
Does this imply that python string comparison via == first checks for object identity and then, failing that, for string equality?Bedside
@Scott: Yes, as a performance optimization, the CPython implementation does do an identity check first. I don't think it's guaranteed by the language spec though.Quiles
This is the bit the first answer missed: the string 'public' was interned as an object and the two variables pointed to the same object, so id(s1) == id(s2).Disarm
@Disarm Perhaps this should be the accepted answer.Bursitis
M
123

The is keyword is a test for object identity while == is a value comparison.

If you use is, the result will be true if and only if both operands are the same object. However, == will be true any time the values of the objects are the same.

Marquee answered 1/10, 2009 at 15:45 Comment(0)
B
66

One last thing to note is you may use the sys.intern function to ensure that you're getting a reference to the same string:

>>> from sys import intern
>>> a = intern('a')
>>> a2 = intern('a')
>>> a is a2
True

As pointed out in previous answers, you should not be using is to determine equality of strings. But this may be helpful to know if you have some kind of weird requirement to use is.

Note that the intern function used to be a built-in on Python 2, but it was moved to the sys module in Python 3.
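The literals above would probably share an object anyway, so here is a small sketch (Python 3) using a string assembled at runtime, which is where sys.intern actually makes a difference. The variable names are just for illustration:

```python
import sys

built = ''.join(['pub', 'lic'])   # runtime-built string, normally not interned
a = sys.intern(built)
b = sys.intern('public')

print(a == b)   # True
print(a is b)   # True: both names refer to the single interned copy
```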

Buchalter answered 1/10, 2009 at 16:4 Comment(0)
F
49

is is identity testing and == is equality testing. This means is is a way to check whether two things are the same things, or just equivalent.

Say you've got a simple person object. If the person is named 'Jack' and is 23 years old, it's equivalent to another 23-year-old Jack, but it's not the same person.

class Person(object):
    def __init__(self, name, age):
        self.name = name
        self.age = age

    def __eq__(self, other):
        # Only compare against other Person instances.
        if not isinstance(other, Person):
            return NotImplemented
        return self.name == other.name and self.age == other.age

jack1 = Person('Jack', 23)
jack2 = Person('Jack', 23)

jack1 == jack2 # True
jack1 is jack2 # False

They're the same age, but they're not the same instance of person. A string might be equivalent to another, but it's not the same object.
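To make the "not the same instance" part concrete, here is a sketch that mutates one of the two Jacks. It swaps in a dataclass (which generates a field-by-field __eq__) instead of the hand-written class, purely to keep it short; alias is an extra name introduced for illustration:

```python
from dataclasses import dataclass

@dataclass
class Person:
    name: str
    age: int

jack1 = Person('Jack', 23)
jack2 = Person('Jack', 23)
alias = jack1            # a second name for the same instance

jack1.age = 99           # mutate through one name

print(alias.age)         # 99: alias and jack1 are the same object
print(jack2.age)         # 23: jack2 is a different instance
print(jack1 == jack2)    # False now, because the ages differ
```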

Fearsome answered 29/4, 2013 at 0:56 Comment(1)
If you set jack1.age = 99, that won't change jack2.age. That's because they are two different instances, so jack1 is not jack2. However, they can equal each other (jack1 == jack2) if their name and their age are the same. It gets more complicated for strings, because strings are immutable in Python, and Python often reuses the same instance. I like this explanation because it uses the simple cases (a normal object) rather than the special cases (strings).Skilling
M
40

This is a side note, but in idiomatic Python, you will often see things like:

if x is None:
    # Some clauses

This is safe, because there is guaranteed to be one instance of the Null Object (i.e., None).
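One reason is None is preferred over == None: equality can be overridden by a class, identity cannot. A small sketch with a made-up class (AlwaysEqual is hypothetical, not from any library):

```python
class AlwaysEqual:
    """Hypothetical class whose instances claim to equal everything."""
    def __eq__(self, other):
        return True

obj = AlwaysEqual()

print(obj == None)   # True: __eq__ was overridden
print(obj is None)   # False: identity cannot be overridden
```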

Moderation answered 1/10, 2009 at 18:51 Comment(3)
Is the same true for True and False? Only one instance so is will match?Capillary
@Capillary Yes, they are singletons both in python 2 and 3.Herdic
@Herdic but in Python 2 you can reassign False and True.Tuberous
J
34

If you're not sure what you're doing, use the '=='. If you have a little more knowledge about it you can use 'is' for known objects like 'None'.

Otherwise, you'll end up wondering why things don't work and why this happens:

>>> a = 1
>>> b = 1
>>> b is a
True
>>> a = 6000
>>> b = 6000
>>> b is a
False

I'm not even sure if some things are guaranteed to stay the same between different Python versions/implementations.
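For what it's worth, a sketch that reproduces this outside the interactive prompt. It builds the ints with int() at runtime, on the assumption that this sidesteps the compiler merging identical constants. The identity results in the comments are what current CPython tends to give (it caches small ints) and are not guaranteed anywhere:

```python
# int() calls at runtime avoid the compiler merging identical constants.
small_a, small_b = int('1'), int('1')
large_a, large_b = int('6000'), int('6000')

print(small_a == small_b, large_a == large_b)   # True True
print(small_a is small_b)   # True on current CPython (small-int cache), not guaranteed
print(large_a is large_b)   # False on current CPython, not guaranteed
```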

Josettejosey answered 1/10, 2009 at 16:57 Comment(4)
Interesting example showing how reassigning ints triggers this condition. Why did this fail? Is it due to interning or something else?Devoid
It looks like the reason is returns False may be due to the interpreter implementation: #133488Devoid
Read : #306813 and https://mcmap.net/q/37596/-why-0-6-is-6-false-duplicateMatriarchy
@ArchitJain Yes, those links explain it pretty well. When you read them, you'll know what numbers you can use 'is' on. I just wish they would explain why it is still not a good idea to do that :) You knowing this does not make it a good idea to assume everyone else does as well (or that the internalized number range will never change)Josettejosey
F
26

From my limited experience with Python, is is used to compare two objects to see if they are the same object as opposed to two different objects with the same value. == is used to determine if the values are identical.

Here is a good example:

>>> s1 = u'public'
>>> s2 = 'public'
>>> s1 is s2
False
>>> s1 == s2
True

s1 is a Unicode string, and s2 is a normal string. They are not the same type, but they are the same value.

Freetown answered 1/10, 2009 at 15:48 Comment(1)
This result is due to a different reason: comparing a unicode string (<type 'unicode'>) to a non-unicode string (<type 'str'>). It is behavior specific to Python 2. In Python 3, both s1 and s2 are of type str, and both is and == return True.Hilltop
A
22

I think it has to do with the fact that, when the 'is' comparison evaluates to false, two distinct objects are used. If it evaluates to true, that means internally it's using the same exact object and not creating a new one, possibly because the interpreter reuses an existing string with the same value as an optimization.

This is why you should be using the equality operator ==, not is, to compare the value of a string object.

>>> s = 'one'
>>> s2 = 'two'
>>> s is s2
False
>>> s2 = s2.replace('two', 'one')
>>> s2
'one'
>>> s2 is s
False
>>> 

In this example, s2 started out as a different string object and was later changed to equal 'one', but it is still not the same object as s, because replace() returned a new string object rather than reusing s. If I had initially assigned s2 = 'one', the interpreter would most likely have made them the same object.

Ame answered 1/10, 2009 at 15:45 Comment(1)
Using .replace() as an example in this context is probably not the best, though, because its semantics can be confusing. s2 = s2.replace() will always create a new string object, assign the new string object to s2, and then dispose of the string object that s2 used to point to. So even if you did s = s.replace('one', 'one') you would still get a new string object.Quiles
A
18

The == operator tests value equivalence. The is operator tests object identity: Python checks whether the two are really the same object (i.e., live at the same address in memory).

>>> a = 'banana'
>>> b = 'banana'
>>> a is b
True

In this example, Python only created one string object, and both a and b refer to it. The reason is that Python internally caches and reuses some strings as an optimization. There really is just one string 'banana' in memory, shared by a and b. To see the normal behavior, use strings that are not cached this way, such as longer strings containing spaces (in an interactive CPython session):

>>> a = 'a longer banana'
>>> b = 'a longer banana'
>>> a == b, a is b
(True, False)

When you create two lists, you get two objects:

>>> a = [1, 2, 3]
>>> b = [1, 2, 3]
>>> a is b
False

In this case we would say that the two lists are equivalent, because they have the same elements, but not identical, because they are not the same object. If two objects are identical, they are also equivalent, but if they are equivalent, they are not necessarily identical.

If a refers to an object and you assign b = a, then both variables refer to the same object:

>>> a = [1, 2, 3]
>>> b = a
>>> b is a
True
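A short sketch of why the distinction matters for mutable objects: a change made through an alias is visible through the other name, while a copy is unaffected. The name c and the use of list() to copy are additions for illustration:

```python
a = [1, 2, 3]
b = a            # alias: same list object
c = list(a)      # shallow copy: new list with the same elements

b.append(4)      # mutate through the alias

print(a)                # [1, 2, 3, 4]
print(c)                # [1, 2, 3]
print(a is b, a is c)   # True False
```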

Reference: Think Python 2e by Allen B. Downey

Aragonite answered 5/7, 2017 at 4:59 Comment(1)
This is the only answer which actually explains the behavior seen, especially the part where it will sometimes work one way and sometimes another, depending on string length. By the way, taking lower() of two strings can also switch the result from True to False.Cyprinid
C
17

I believe that this is known as "interned" strings. Python does this, so does Java, and so do C and C++ when compiling in optimized modes.

If you use two identical strings, instead of wasting memory by creating two string objects, all interned strings with the same contents point to the same memory.

This results in the Python "is" operator returning True because two strings with the same contents are pointing at the same string object. This will also happen in Java and in C.

This is only useful for memory savings though. You cannot rely on it to test for string equality, because the various interpreters and compilers and JIT engines cannot always do it.

Corset answered 1/10, 2009 at 15:59 Comment(2)
Yeah, that is the answer I expected. OK, Hari is right, but how does Python do this? Thanks, Zan: for memory optimization.Insolvency
And .NET (C#, VB.NET, etc.)?Vivyan
W
14

Actually, the is operator checks for identity and == operator checks for equality.

From the language reference:

Types affect almost all aspects of object behavior. Even the importance of object identity is affected in some sense: for immutable types, operations that compute new values may actually return a reference to any existing object with the same type and value, while for mutable objects this is not allowed. E.g., after a = 1; b = 1, a and b may or may not refer to the same object with the value one, depending on the implementation, but after c = []; d = [], c and d are guaranteed to refer to two different, unique, newly created empty lists. (Note that c = d = [] assigns the same object to both c and d.)

So from the above statement we can infer that comparing two equal strings, which are immutable, with "is" may either succeed or fail, depending on the implementation.

The same applies to int and tuple, which are also immutable types.
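A sketch of the part of that quote which is guaranteed, next to the part which is not. The names e, f, t1 and t2 are made up for the example:

```python
c = []
d = []
e = f = []

print(c is d)   # False: two separate list displays always create two lists
print(e is f)   # True: chained assignment binds both names to one list

t1 = tuple([1, 2])
t2 = tuple([1, 2])
print(t1 == t2)   # True; whether t1 is t2 is left to the implementation
```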

Wedged answered 8/8, 2014 at 17:4 Comment(0)
C
7

is will compare the memory location. It is used for object-level comparison.

== will compare the variables in the program. It is used for checking at a value level.

is checks for address level equivalence

== checks for value level equivalence

Cowley answered 14/9, 2017 at 20:27 Comment(0)
F
2

is is identity testing and == is equality testing (see the Python documentation).

In most cases, if a is b, then a == b. But there are exceptions, for example:

>>> nan = float('nan')
>>> nan is nan
True
>>> nan == nan
False

So, you can only use is for identity tests, never equality tests.
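A related wrinkle, sketched below: containment tests on built-in containers check identity before equality, so the same NaN object is "found" in a list even though it does not equal itself. math.isnan is the usual way to test for NaN:

```python
import math

nan = float('nan')

print(nan == nan)             # False: NaN never equals anything, itself included
print(nan in [nan])           # True: containment checks identity before equality
print(float('nan') in [nan])  # False: a different NaN object, and == is False
print(math.isnan(nan))        # True: the reliable way to test for NaN
```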

Firecure answered 24/4, 2017 at 11:37 Comment(0)
C
-4

The basic concept to be clear about when approaching this question is the difference between is and ==.

is compares memory locations: if id(a) == id(b), then a is b returns True; otherwise it returns False.

So, we can say that is is used for comparing memory locations, whereas

== is used for equality testing, which means that it compares only the resulting values. The code shown below may serve as an example of the theory above.

Code

Click here for the code

In the case of equal string literals (strings written directly in the source code), the memory address will be the same, as shown in the picture, so id(a) == id(b). The rest is self-explanatory.

Coolidge answered 26/12, 2020 at 16:31 Comment(2)
Could you post your code directly in code tags?Teahan
Please review Why not upload images of code/errors when asking a question? (e.g., "Images should only be used to illustrate problems that can't be made clear in any other way, such as to provide screenshots of a user interface.") and take the appropriate action (it covers answers as well). Thanks in advance.Vivyan
