What objects are guaranteed to have different identity?
Asked Answered
R

3

3

ORIGINAL QUESTION:

(My question applies to Python 3.2+, but I doubt this has changed since Python 2.7.)

Suppose I use an expression that we usually expect to create an object. Examples: [1,2,3]; 42; 'abc'; range(10); True; open('readme.txt'); MyClass(); lambda x : 2 * x; etc.

Suppose two such expressions are executed at different times and "evaluate to the same value" (i.e., have the same type, and compare as equal). Under what conditions does Python provide what I call a distinct object guarantee that the two expressions actually create two distinct objects (i.e., x is y evaluates as False, assuming the two objects are bound to x and y, and both are in scope at the same time)?

I understand that for objects of any mutable type, the "distinct object guarantee" holds:

x = [1,2]
y = [1,2]
assert x is not y # guaranteed to pass 

I also know for certain immutable types (str, int) the guarantee does not hold; and for certain other immutable types (bool, NoneType), the opposite guarantee holds:

x = True
y = not not x
assert x is not y # guaranteed to fail
x = 2
y = 3 - 1
assert x is not y # implementation-dependent; likely to fail in CPython
x = 1234567890
y = x + 1 - 1
assert x is not y # implementation-dependent; likely to pass in CPython

But what about all the other immutable types?

In particular, can two tuples created at different times have the same identity?

The reason I'm interested in this is that I represent nodes in my graph as tuples of int, and the domain model is such that any two nodes are distinct (even if they are represented by tuples with the same values). I need to create sets of nodes. If Python guarantees that tuples created at different times are distinct objects, I could simply subclass tuple to redefine equality to mean identity:

class DistinctTuple(tuple):
  __hash__ = tuple.__hash__
  def __eq__(self, other):
    return self is other

x = (1,2)
y = (1,2)
s = set(x,y)
assert len(s) == 1 # pass; but not what I want
x = DistinctTuple(x)
y = DistinctTuple(y)
s = set(x,y)
assert len(s) == 2 # pass; as desired

But if tuples created at different times are not guaranteed to be distinct, then the above is a terrible technique, which hides a dormant bug that may appear at random and may be very hard to replicate and find. In that case, subclassing won't help; I will actually need to add to each tuple, as an extra element, a unique id. Alternatively, I can convert my tuples to lists. Either way, I'd use more memory. Obviously, I'd prefer not to use these alternatives unless my original subclassing solution is unsafe.

My guess is that Python does not offer the "distinct object guarantee" for immutable types, either built-in or user-defined. But I haven't found a clear statement about it in the documentation.

UPDATE 1:

@LuperRouch @larsmans Thank you for the discussion and the answer so far. Here's the last issue I'm still unclear with:

Is there any chance that the creation of an object of a user-defined type results in a reuse of an existing object?

If this is possible, I'd like to know how I can verify for any class I work with whether it might exhibit such a behavior.

Here's my understanding. Any time an object of a user-defined class is created, the class' __new__() method is called first. If this method is overridden, nothing in the language would prevent the programmer from returning a reference to an existing object, thus violating my "distinct object guarantee". Obviously, I can observe it by examining the class definition.

I am not sure what happens if a user-defined class does not override __new__() (or explicitly relies __new__() from the base class). If I write

class MyInt(int):
  pass

the object creation is handled by int.__new__(). I would expect that this means I may sometimes see the following assertion fail:

x = MyInt(1)
y = MyInt(1)
assert x is not y # may fail, since int.__new__() might return the same object twice?

But in my experimentation with CPython I could not achieve such behavior. Does this mean the language provides "distinct object guarantee" for user-defined classes that don't override __new__, or is it just an arbitrary implementation behavior?

UPDATE 2:

While my DistinctTuple turned out to be a perfectly safe implementation, I now understand that my design idea of using DistinctTuple to model nodes is very bad.

The identity operator is already available in the language; making == behave in the same way as is is logically superfluous.

Worse, if == could have been done something useful, I made it unavailable. For instance, it's quite likely that somewhere in my program I'll want to see if two nodes are represented by the same pair of integers; == would have been perfect for that - and in fact, that's what it does by default...

Worse yet, most people actually do expect == to compare some "value" rather than identity - even for a user-defined class. They would be caught unawares with my override that only looks at identity.

Finally... the only reason I had to redefine == was to allow multiple nodes with the same tuple representation to be part of a set. This is the wrong way to go about it! It's not == behavior that needs to change, it's the container type! I simply needed to use multisets instead of sets.

In short, while my question may have some value for other situations, I am absolutely convinced that creating class DistinctTuple is a terrible idea for my use case (and I strongly suspect it has no valid use case at all).

Rumba answered 17/4, 2012 at 10:6 Comment(8)
You can't create user-defined immutable types AFAIK. In your example above, two DistinctTuple objects will always have different IDs.Whitson
@LuperRouch: you can, and you can even get the kind of guarantees that None and bool give, by overloading __new__, but the objects' immutability is not necessarily enforced by the implementation.Mcintosh
@larsmans: sure you can return a tuple in your __new__, but then you are not creating user-defined objects.Whitson
@LuperRouch: you can create a pool of MyTuple objects and return objects from it in MyTuple__new__.Mcintosh
@larsmans: I dont't think that makes Python consider these objects as immutable, as str and tuple are for example (nothing guarantees that the objects pool is unmodifiable).Whitson
And anyway max wants different objects, subclassing tuple without fancy tricks in __new__ guarantees to have a new ID for every instance (because user-defined types are mutable).Whitson
@LuperRouch: I think we have different notions of immutability. In my understanding of immutability, it's a contract between implementer and user, not something enforced by the language implementation; that's also the definition that collections.abc seems to use. Then still, there may be a way to implement "truly" immutable classes by writing extension modules in C or Cython.Mcintosh
@larsman: As far as Python goes, immutability, when it exists, is enforced by the language -- thus str, tuple, and int are immutable, while user-defined types (implemented in Python) are not. Put another way, there is no way to mutate a str object -- at least, not using Python code.Narwhal
A
3

Is there any chance that the creation of an object of a user-defined type results in a reuse of an existing object?

This will happen if, and only if, the user-defined type is explicitly designed to do that. With __new__() or some metaclass.

I'd like to know how I can verify for any class I work with whether it might exhibit such a behavior.

Use the source, Luke.

When it comes to int, small integers are pre-allocated, and these pre-allocated integers are used wherever you create of calculate with integers. You can't get this working when you do MyInt(1) is MyInt(1), because what you have there are not integers. However:

>>> MyInt(1) + MyInt(1) is 2
True

This is because of course MyInt(1) + MyInt(1) does not return a MyInt. It returns an int, because that's what the __add__ of an integer returns (and that's where the check for pre-allocated integers occur as well). This if anything just shows that subclassing int in general isn't particularly useful. :-)

Does this mean the language provides "distinct object guarantee" for user-defined classes that don't override new, or is it just an arbitrary implementation behavior?

It doesn't guarantee it, because there is no need to do so. The default behavior is to create a new object. You have to override it if you don't want that to happen. Having a guarantee makes no sense.

Afterthought answered 18/4, 2012 at 4:1 Comment(5)
Thanks. You say guarantee isn't needed; but is it really that obvious that MyInt(1) is not MyInt(1) by default? After all, MyInt relies on int.__new__() to allocate memory. For all I know, int.__new__() might always return the reference to the memory that contains 1 - regardless of whether I'm creating an int or MyInt.Rumba
No it's not obvious. You just don't need to care about it. As noted, there is no reason for you to subclass int, it doesn't make any sense. And it doesn't return references to the memory that contains 1, it returns references to the objects. If it behaves like you imply, then MyInt(1) wouldn't create a MyInt, it would return an int. Which it obviously does not.Afterthought
@Rumba user-defined types are mutable (try it by yourself: a = MyInt(); a.foo = "bar"; you can't do that on a plain int). This by itself (mutability) proves that you get a new object every time you call MyInt(), or the language would be a total mess.Whitson
@LennartRegebro: Thanks. So, the only way MyInt(1) is MyInt(1) would hold is if int.__new__ went crazy and cached not only int objects but, under some conditions, also objects of its subclasses. It's clearly not happening, but strictly speaking the language doesn't promise that it's not, right?Rumba
Yes, if it cached objects of it's subclasses or if it simply did not return MyInt() objects in the case above (which it of course does).Afterthought
M
4

Python reference, section 3, Data model:

for immutable types, operations that compute new values may actually return a reference to any existing object with the same type and value, while for mutable objects this is not allowed.

(Emphasis added.)

In practice, it seems CPython only caches the empty tuple:

>>> 1 is 1
True
>>> (1,) is (1,)
False
>>> () is ()
True
Mcintosh answered 17/4, 2012 at 10:16 Comment(9)
Would a user-defined class derived from tuple be considered immutable for the purpose of this guarantee? Is there a cheap modification that I can do to make it mutable, i.e., to acquire the "distinct object guarantee"?Rumba
@max: I think a mutable tuple subtype would violate the LSP. You should really subclass list instead.Mcintosh
Well, I don't plan to ever modify it. I just want to get the guarantee that each time it's created, it will be a different object. I'm actually not sure whether class MyTuple(tuple): pass is mutable or not for the purpose of this guarantee.Rumba
@max: it seems like the default __new__ for MyTuple creates new objects, even though it derives from an immutable class, though I couldn't find a guarantee to this effect in the docs for __new__.Mcintosh
@max: you can set attributes on your objects (a = MyTuple(); a.foo = 1), so MyTuple objects are mutable, and each instance has a different ID.Whitson
@max: you can overload __new__ to get the guarantee you're after, but honestly, I'd look for a different way to tackle the problem you're trying to solve.Mcintosh
@LuperRouch Yes; but if I do that, I might as well give up on identity checks and simply add a unique id as an attribute.Rumba
@larsmans Overloading __new__ to ensure distinct tuples would be inefficient since I can't use tuple.__new__ (which may return the same object twice). I was hoping my DistinctTuple class is guaranteed to create distinct objects, but if it doesn't I'll indeed look for a different solution.Rumba
@larsmans I now agree that "a different way to tackle the problem" is well-advised. See my update 2 to the question.Rumba
A
3

Is there any chance that the creation of an object of a user-defined type results in a reuse of an existing object?

This will happen if, and only if, the user-defined type is explicitly designed to do that. With __new__() or some metaclass.

I'd like to know how I can verify for any class I work with whether it might exhibit such a behavior.

Use the source, Luke.

When it comes to int, small integers are pre-allocated, and these pre-allocated integers are used wherever you create of calculate with integers. You can't get this working when you do MyInt(1) is MyInt(1), because what you have there are not integers. However:

>>> MyInt(1) + MyInt(1) is 2
True

This is because of course MyInt(1) + MyInt(1) does not return a MyInt. It returns an int, because that's what the __add__ of an integer returns (and that's where the check for pre-allocated integers occur as well). This if anything just shows that subclassing int in general isn't particularly useful. :-)

Does this mean the language provides "distinct object guarantee" for user-defined classes that don't override new, or is it just an arbitrary implementation behavior?

It doesn't guarantee it, because there is no need to do so. The default behavior is to create a new object. You have to override it if you don't want that to happen. Having a guarantee makes no sense.

Afterthought answered 18/4, 2012 at 4:1 Comment(5)
Thanks. You say guarantee isn't needed; but is it really that obvious that MyInt(1) is not MyInt(1) by default? After all, MyInt relies on int.__new__() to allocate memory. For all I know, int.__new__() might always return the reference to the memory that contains 1 - regardless of whether I'm creating an int or MyInt.Rumba
No it's not obvious. You just don't need to care about it. As noted, there is no reason for you to subclass int, it doesn't make any sense. And it doesn't return references to the memory that contains 1, it returns references to the objects. If it behaves like you imply, then MyInt(1) wouldn't create a MyInt, it would return an int. Which it obviously does not.Afterthought
@Rumba user-defined types are mutable (try it by yourself: a = MyInt(); a.foo = "bar"; you can't do that on a plain int). This by itself (mutability) proves that you get a new object every time you call MyInt(), or the language would be a total mess.Whitson
@LennartRegebro: Thanks. So, the only way MyInt(1) is MyInt(1) would hold is if int.__new__ went crazy and cached not only int objects but, under some conditions, also objects of its subclasses. It's clearly not happening, but strictly speaking the language doesn't promise that it's not, right?Rumba
Yes, if it cached objects of it's subclasses or if it simply did not return MyInt() objects in the case above (which it of course does).Afterthought
N
1

If Python guarantees that tuples created at different times are distinct objects, I could simply subclass tuple to redefine equality to mean identity.

You seem to be confused about how subclassing works: If B subclasses A, then B gets to use all of A's methods[1] -- but the A methods will be working on instances of B, not of A. This holds even for __new__:

--> class Node(tuple):
...   def __new__(cls):
...     obj = tuple.__new__(cls)
...     print(type(obj))
...     return obj
...
--> n = Node()
<class '__main__.Node'>

As @larsman pointed out in the Python reference:

for immutable types, operations that compute new values may actually return a reference to any existing object with the same type and value, while for mutable objects this is not allowed

However, keep in mind this passage is talking about Python's built-in types, not user-defined types (which can go crazy pretty much any way they like).


I understand the above excerpt to guarantee that Python will not return a new mutable object that is the same as an existing object, and classes that are user-defined and created in Python code are inherently mutable (again, see note above about crazy user-defined classes).

A more complete Node class (note you don't need to explicitly refererence tuple.__hash__):

class Node(tuple):
    __slots__ = tuple()
    __hash__ = tuple.__hash__
    def __eq__(self, other):
        return self is other
    def __ne__(self, other):
        return self is not other

--> n1 = Node()
--> n2 = Node()
--> n1 is n2
False
--> n1 == n2
False
--> n1 != n2
True

--> n1 <= n2
True
--> n1 < n2
False

As you can see from the last two comparisons, you may want to also override the __le__ and __ge__ methods.

[1] The only exception I am aware of is __hash__ -- if __eq__ is defined on the subclass but the subclass wants the parent class' __hash__ it has to explicitly say so (this is a Python 3 change).

Narwhal answered 18/4, 2012 at 14:15 Comment(1)

© 2022 - 2024 — McMap. All rights reserved.