Why is attribute lookup in Python designed this way (precedence chain)?

I've just run into descriptors in Python, and I get the idea of the descriptor protocol with __get__, __set__ and __delete__; it really does a great job of wrapping methods.

However, the protocol has other rules:

Data and non-data descriptors differ in how overrides are calculated with respect to entries in an instance’s dictionary. If an instance’s dictionary has an entry with the same name as a data descriptor, the data descriptor takes precedence. If an instance’s dictionary has an entry with the same name as a non-data descriptor, the dictionary entry takes precedence.

I don't get the point. Isn't it OK just to look things up the classic way (instance dictionary -> class dictionary -> base class dictionaries)?
If it were implemented that way, data descriptors could be held by instances, and the descriptor itself would not have to keep a weak-reference dictionary to store values for the different instances of its owner.
Why put descriptors into the lookup chain at all? And why are data descriptors placed at the very beginning?
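
(To make the weak-reference point concrete, here is the kind of per-instance storage pattern I mean; the names are made up, it's just a sketch of a data descriptor that keeps values in a WeakKeyDictionary because the descriptor object itself is shared by all instances of the owner class.)

import weakref

class SharedCounter(object):
    # Data descriptor shared by all instances of its owner class.
    # Per-instance values live in a WeakKeyDictionary keyed by instance,
    # so they disappear together with the instance.
    def __init__(self, default=0):
        self.default = default
        self.data = weakref.WeakKeyDictionary()

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        return self.data.get(obj, self.default)

    def __set__(self, obj, value):
        self.data[obj] = value

class Owner(object):
    count = SharedCounter()

a, b = Owner(), Owner()
a.count = 5
print(a.count, b.count)   # 5 0 -- each instance keeps its own value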

Spongy answered 27/1, 2016 at 7:28 Comment(1)
After reviewing all the answers and communicating with the helpers, I got the logic here: descriptors are designed for OOP; they describe the behavior of a class's properties (both data properties and methods), and that's why the protocol is called 'descriptor'. So the descriptor protocol works at the class level. Data descriptors describe data properties, and to make that happen the lookup precedence chain has to be this way. So the descriptor protocol is all about OOP. This logic makes a lot of sense to me. Thanks to skyking, Nikita and all the other helpers.Spongy

Let's look at an example:

class GetSetDesc(object):
    def __init__(self, value):
        self.value=value

    def __get__(self, obj, objtype):
        print("get_set_desc: Get")
        return self.value

    def __set__(self, obj, value):
        print("get_set_desc: Set")
        self.value=value

class SetDesc(object):
    def __init__(self, value):
        self.value=value

    def __set__(self, obj, value):
        print("set_desc: Set")
        self.value=value

class GetDesc(object):
    def __init__(self, value):
        self.value=value

    def __get__(self, obj, objtype):
        print("get_desc: Get")
        return self.value

class Test1(object):
    attr=10
    get_set_attr=10
    get_set_attr=GetSetDesc(5)
    set_attr=10
    set_attr=SetDesc(5)
    get_attr=10
    get_attr=GetDesc(5)

class Test2(object):
    def __init__(self):
        self.attr=10
        self.get_set_attr=10
        self.get_set_attr=GetSetDesc(5)
        self.set_attr=10
        self.set_attr=SetDesc(5)
        self.get_attr=10
        self.get_attr=GetDesc(5)

class Test3(Test1):
    def __init__(self):
        #changing values to see the difference with the superclass
        self.attr=100
        self.get_set_attr=100
        self.get_set_attr=GetSetDesc(50)
        self.set_attr=100
        self.set_attr=SetDesc(50)
        self.get_attr=100
        self.get_attr=GetDesc(50)

class Test4(Test1):
    pass


print("++Test 1 Start++")
t=Test1()

print("t.attr:", t.attr)
print("t.get_set_desc:", t.get_set_attr)
print("t.set_attr:", t.set_attr)
print("t.get_attr:", t.get_attr)

print("Class dict attr:", t.__class__.__dict__['attr'])
print("Class dict get_set_attr:", t.__class__.__dict__['get_set_attr'])
print("Class dict set_attr:", t.__class__.__dict__['set_attr'])
print("Class dict get_attr:", t.__class__.__dict__['get_attr'])

#These will obviously fail as instance dict is empty here
#print("Instance dict attr:", t.__dict__['attr'])
#print("Instance dict get_set_attr:", t.__dict__['get_set_attr'])
#print("Instance dict set_attr:", t.__dict__['set_attr'])
#print("Instance dict get_attr:", t.__dict__['get_attr'])

t.attr=20
t.get_set_attr=20
t.set_attr=20
t.get_attr=20

print("t.attr:", t.attr)
print("t.get_set_desc:", t.get_set_attr)
print("t.set_attr:", t.set_attr)
print("t.get_attr:", t.get_attr)

print("Class dict attr:", t.__class__.__dict__['attr'])
print("Class dict get_set_attr:", t.__class__.__dict__['get_set_attr'])
print("Class dict set_attr:", t.__class__.__dict__['set_attr'])
print("Class dict get_attr:", t.__class__.__dict__['get_attr'])

print("Instance dict attr:", t.__dict__['attr'])
#Next two will fail,
#because the descriptor for those variables has __set__
#on the class itself which was called with value 20,
#so the instance is not affected
#print("Instance dict get_set_attr:", t.__dict__['get_set_attr'])
#print("Instance dict set_attr:", t.__dict__['set_attr'])
print("Instance dict get_attr:", t.__dict__['get_attr'])

print("++Test 1 End++")


print("++Test 2 Start++")
t2=Test2()

print("t.attr:", t2.attr)
print("t.get_set_desc:", t2.get_set_attr)
print("t.set_attr:", t2.set_attr)
print("t.get_attr:", t2.get_attr)

#In this test the class is not affected, so these will fail
#print("Class dict attr:", t2.__class__.__dict__['attr'])
#print("Class dict get_set_attr:", t2.__class__.__dict__['get_set_attr'])
#print("Class dict set_attr:", t2.__class__.__dict__['set_attr'])
#print("Class dict get_attr:", t2.__class__.__dict__['get_attr'])

print("Instance dict attr:", t2.__dict__['attr'])
print("Instance dict get_set_attr:", t2.__dict__['get_set_attr'])
print("Instance dict set_attr:", t2.__dict__['set_attr'])
print("Instance dict get_attr:", t2.__dict__['get_attr'])

t2.attr=20
t2.get_set_attr=20
t2.set_attr=20
t2.get_attr=20

print("t.attr:", t2.attr)
print("t.get_set_desc:", t2.get_set_attr)
print("t.set_attr:", t2.set_attr)
print("t.get_attr:", t2.get_attr)

#In this test the class is not affected, so these will fail
#print("Class dict attr:", t2.__class__.__dict__['attr'])
#print("Class dict get_set_attr:", t2.__class__.__dict__['get_set_attr'])
#print("Class dict set_attr:", t2.__class__.__dict__['set_attr'])
#print("Class dict get_attr:", t2.__class__.__dict__['get_attr'])

print("Instance dict attr:", t2.__dict__['attr'])
print("Instance dict get_set_attr:", t2.__dict__['get_set_attr'])
print("Instance dict set_attr:", t2.__dict__['set_attr'])
print("Instance dict get_attr:", t2.__dict__['get_attr'])

print("++Test 2 End++")


print("++Test 3 Start++")
t3=Test3()

print("t.attr:", t3.attr)
print("t.get_set_desc:", t3.get_set_attr)
print("t.set_attr:", t3.set_attr)
print("t.get_attr:", t3.get_attr)

#These fail, because nothing is defined on Test3 class itself, but let's see its super below
#print("Class dict attr:", t3.__class__.__dict__['attr'])
#print("Class dict get_set_attr:", t3.__class__.__dict__['get_set_attr'])
#print("Class dict set_attr:", t3.__class__.__dict__['set_attr'])
#print("Class dict get_attr:", t3.__class__.__dict__['get_attr'])

#Checking superclass
print("Superclass dict attr:", t3.__class__.__bases__[0].__dict__['attr'])
print("Superclass dict get_set_attr:", t3.__class__.__bases__[0].__dict__['get_set_attr'])
print("Superclass dict set_attr:", t3.__class__.__bases__[0].__dict__['set_attr'])
print("Superclass dict get_attr:", t3.__class__.__bases__[0].__dict__['get_attr'])

print("Instance dict attr:", t3.__dict__['attr'])
#Next two with __set__ inside descriptor fail, because
#when the instance was created, the value inside the descriptor in superclass
#was redefined via __set__
#print("Instance dict get_set_attr:", t3.__dict__['get_set_attr'])
#print("Instance dict set_attr:", t3.__dict__['set_attr'])
print("Instance dict get_attr:", t3.__dict__['get_attr'])
#The one above does not fail, because the descriptor in the superclass
#doesn't have __set__ and therefore the attribute was redefined on the instance

t3.attr=200
t3.get_set_attr=200
t3.set_attr=200
t3.get_attr=200

print("t.attr:", t3.attr)
print("t.get_set_desc:", t3.get_set_attr)
print("t.set_attr:", t3.set_attr)
print("t.get_attr:", t3.get_attr)

#print("Class dict attr:", t3.__class__.__dict__['attr'])
#print("Class dict get_set_attr:", t3.__class__.__dict__['get_set_attr'])
#print("Class dict set_attr:", t3.__class__.__dict__['set_attr'])
#print("Class dict get_attr:", t3.__class__.__dict__['get_attr'])

#Checking superclass
print("Superclass dict attr:", t3.__class__.__bases__[0].__dict__['attr'])
print("Superclass dict get_set_attr:", t3.__class__.__bases__[0].__dict__['get_set_attr'])
print("Superclass dict set_attr:", t3.__class__.__bases__[0].__dict__['set_attr'])
print("Superclass dict get_attr:", t3.__class__.__bases__[0].__dict__['get_attr'])

print("Instance dict attr:", t3.__dict__['attr'])
#Next two fail, they are in superclass, not in instance
#print("Instance dict get_set_attr:", t3.__dict__['get_set_attr'])
#print("Instance dict set_attr:", t3.__dict__['set_attr'])
print("Instance dict get_attr:", t3.__dict__['get_attr'])
#The one above succeeds as it was redefined, as stated in the prior check

print("++Test 3 End++")


print("++Test 4 Start++")
t4=Test4()

print("t.attr:", t4.attr)
print("t.get_set_desc:", t4.get_set_attr)
print("t.set_attr:", t4.set_attr)
print("t.get_attr:", t4.get_attr)

#These again fail, as everything defined in superclass, not the class itself
#print("Class dict attr:", t4.__class__.__dict__['attr'])
#print("Class dict get_set_attr:", t4.__class__.__dict__['get_set_attr'])
#print("Class dict set_attr:", t4.__class__.__dict__['set_attr'])
#print("Class dict get_attr:", t4.__class__.__dict__['get_attr'])

#Checking superclass
print("Superclass dict attr:", t4.__class__.__bases__[0].__dict__['attr'])
print("Superclass dict get_set_attr:", t4.__class__.__bases__[0].__dict__['get_set_attr'])
print("Superclass dict set_attr:", t4.__class__.__bases__[0].__dict__['set_attr'])
print("Superclass dict get_attr:", t4.__class__.__bases__[0].__dict__['get_attr'])

#Again, everything is on superclass, not the instance
#print("Instance dict attr:", t4.__dict__['attr'])
#print("Instance dict get_set_attr:", t4.__dict__['get_set_attr'])
#print("Instance dict set_attr:", t4.__dict__['set_attr'])
#print("Instance dict get_attr:", t4.__dict__['get_attr'])

t4.attr=200
t4.get_set_attr=200
t4.set_attr=200
t4.get_attr=200

print("t.attr:", t4.attr)
print("t.get_set_desc:", t4.get_set_attr)
print("t.set_attr:", t4.set_attr)
print("t.get_attr:", t4.get_attr)

#Class is not affected by those assignments, next four fail
#print("Class dict attr:", t4.__class__.__dict__['attr'])
#print("Class dict get_set_attr:", t4.__class__.__dict__['get_set_attr'])
#print("Class dict set_attr:", t4.__class__.__dict__['set_attr'])
#print("Class dict get_attr:", t4.__class__.__dict__['get_attr'])

#Checking superclass
print("Superclass dict attr:", t4.__class__.__bases__[0].__dict__['attr'])
print("Superclass dict get_set_attr:", t4.__class__.__bases__[0].__dict__['get_set_attr'])
print("Superclass dict set_attr:", t4.__class__.__bases__[0].__dict__['set_attr'])
print("Superclass dict get_attr:", t4.__class__.__bases__[0].__dict__['get_attr'])

#Now, this one we redefined, so it succeeds
print("Instance dict attr:", t4.__dict__['attr'])
#This one fails, it's still on the superclass
#print("Instance dict get_set_attr:", t4.__dict__['get_set_attr'])
#Same here - fails, it's on superclass, because it has __set__
#print("Instance dict set_attr:", t4.__dict__['set_attr'])
#This one succeeds, no __set__ to call, so it was redefined on instance
print("Instance dict get_attr:", t4.__dict__['get_attr'])

print("++Test 4 End++")

The output:

++Test 1 Start++
t.attr: 10
get_set_desc: Get
t.get_set_desc: 5
t.set_attr: <__main__.SetDesc object at 0x02896ED0>
get_desc: Get
t.get_attr: 5
Class dict attr: 10
Class dict get_set_attr: <__main__.GetSetDesc object at 0x02896EB0>
Class dict set_attr: <__main__.SetDesc object at 0x02896ED0>
Class dict get_attr: <__main__.GetDesc object at 0x02896EF0>
get_set_desc: Set
set_desc: Set
t.attr: 20
get_set_desc: Get
t.get_set_desc: 20
t.set_attr: <__main__.SetDesc object at 0x02896ED0>
t.get_attr: 20
Class dict attr: 10
Class dict get_set_attr: <__main__.GetSetDesc object at 0x02896EB0>
Class dict set_attr: <__main__.SetDesc object at 0x02896ED0>
Class dict get_attr: <__main__.GetDesc object at 0x02896EF0>
Instance dict attr: 20
Instance dict get_attr: 20
++Test 1 End++
++Test 2 Start++
t.attr: 10
t.get_set_desc: <__main__.GetSetDesc object at 0x028A0350>
t.set_attr: <__main__.SetDesc object at 0x028A0370>
t.get_attr: <__main__.GetDesc object at 0x028A0330>
Instance dict attr: 10
Instance dict get_set_attr: <__main__.GetSetDesc object at 0x028A0350>
Instance dict set_attr: <__main__.SetDesc object at 0x028A0370>
Instance dict get_attr: <__main__.GetDesc object at 0x028A0330>
t.attr: 20
t.get_set_desc: 20
t.set_attr: 20
t.get_attr: 20
Instance dict attr: 20
Instance dict get_set_attr: 20
Instance dict set_attr: 20
Instance dict get_attr: 20
++Test 2 End++
++Test 3 Start++
get_set_desc: Set
get_set_desc: Set
set_desc: Set
set_desc: Set
t.attr: 100
get_set_desc: Get
t.get_set_desc: <__main__.GetSetDesc object at 0x02896FF0>
t.set_attr: <__main__.SetDesc object at 0x02896ED0>
t.get_attr: <__main__.GetDesc object at 0x028A03F0>
Superclass dict attr: 10
Superclass dict get_set_attr: <__main__.GetSetDesc object at 0x02896EB0>
Superclass dict set_attr: <__main__.SetDesc object at 0x02896ED0>
Superclass dict get_attr: <__main__.GetDesc object at 0x02896EF0>
Instance dict attr: 100
Instance dict get_attr: <__main__.GetDesc object at 0x028A03F0>
get_set_desc: Set
set_desc: Set
t.attr: 200
get_set_desc: Get
t.get_set_desc: 200
t.set_attr: <__main__.SetDesc object at 0x02896ED0>
t.get_attr: 200
Superclass dict attr: 10
Superclass dict get_set_attr: <__main__.GetSetDesc object at 0x02896EB0>
Superclass dict set_attr: <__main__.SetDesc object at 0x02896ED0>
Superclass dict get_attr: <__main__.GetDesc object at 0x02896EF0>
Instance dict attr: 200
Instance dict get_attr: 200
++Test 3 End++
++Test 4 Start++
t.attr: 10
get_set_desc: Get
t.get_set_desc: 200
t.set_attr: <__main__.SetDesc object at 0x02896ED0>
get_desc: Get
t.get_attr: 5
Superclass dict attr: 10
Superclass dict get_set_attr: <__main__.GetSetDesc object at 0x02896EB0>
Superclass dict set_attr: <__main__.SetDesc object at 0x02896ED0>
Superclass dict get_attr: <__main__.GetDesc object at 0x02896EF0>
get_set_desc: Set
set_desc: Set
t.attr: 200
get_set_desc: Get
t.get_set_desc: 200
t.set_attr: <__main__.SetDesc object at 0x02896ED0>
t.get_attr: 200
Superclass dict attr: 10
Superclass dict get_set_attr: <__main__.GetSetDesc object at 0x02896EB0>
Superclass dict set_attr: <__main__.SetDesc object at 0x02896ED0>
Superclass dict get_attr: <__main__.GetDesc object at 0x02896EF0>
Instance dict attr: 200
Instance dict get_attr: 200
++Test 4 End++

Try it yourself to get a taste of descriptors. But the bottom line of what we see here is...

First, a definition from the official docs to refresh the memory:

If an object defines both __get__() and __set__(), it is considered a data descriptor. Descriptors that only define __get__() are called non-data descriptors (they are typically used for methods but other uses are possible).
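
(A side note: the inspect module implements exactly this definition, so you can check which kind a given descriptor is; the toy classes below are only for illustration.)

import inspect

class DataDesc(object):
    def __get__(self, obj, objtype=None): return 1
    def __set__(self, obj, value): pass

class NonDataDesc(object):
    def __get__(self, obj, objtype=None): return 1

print(inspect.isdatadescriptor(DataDesc()))     # True  (defines __set__)
print(inspect.isdatadescriptor(NonDataDesc()))  # False (only __get__)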

From the output and the failing (commented-out) snippets...

It's clear that before the name referencing the descriptor (of any type) is reassigned, the descriptor is looked up as usual, following the MRO from the class level through the superclasses to wherever it was defined. (See Test 2, where it's defined on the instance: it never gets called, and it simply gets replaced by a plain value.)

Now, when the name is reassigned, things start to get interesting:

If it's a data descriptor (it has __set__), then really no magic happens: the value assigned to the name referencing the descriptor is passed to the descriptor's __set__ and is used inside that method (in the code above it's assigned to self.value). The descriptor is, of course, looked up in the hierarchy first. By the way, a descriptor without __get__ is returned itself on attribute read, not the value stored via its __set__ method.
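
A minimal sketch of that last remark, with made-up class names: the assignment goes through __set__, but reading the attribute hands you the descriptor object back, because there is no __get__ to call.

class SetOnly(object):
    def __set__(self, obj, value):
        print("intercepted:", value)

class Holder(object):
    x = SetOnly()

h = Holder()
h.x = 42      # prints "intercepted: 42"; nothing is written to h.__dict__
print(h.x)    # <__main__.SetOnly object at 0x...> -- the descriptor itself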

If it's a non-data descriptor (it has only __get__), then it's looked up too, but having no __set__ method it is "dropped", and the name referencing the descriptor gets rebound at the lowest possible level (the instance or a subclass, depending on where we assign).
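
And a sketch of the non-data case (again, the names are only illustrative): the first read goes through __get__, but as soon as you assign, the new instance-dictionary entry shadows the descriptor.

class GetOnly(object):
    def __get__(self, obj, objtype=None):
        return "from the descriptor"

class Box(object):
    x = GetOnly()

b = Box()
print(b.x)                 # 'from the descriptor' -- __get__ is called
b.x = "plain value"        # no __set__, so this lands in b.__dict__
print(b.x)                 # 'plain value' -- the instance entry wins from now on
print(Box.__dict__['x'])   # the descriptor is still on the class, untouched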

So descriptors are used to control and modify the data assigned to the attributes that are made descriptors. This makes sense: if a descriptor is a data descriptor, i.e. it defines __set__, it probably wants to inspect or transform the data you pass in, and hence it gets called before any instance-dictionary assignment. That's why it's put first in the hierarchy. On the other hand, a non-data descriptor with only __get__ probably doesn't care about setting the data; moreover, it can't do anything with the data being set, so it falls out of the chain on assignment and the data gets assigned to an instance-dictionary key.

Also, new-style classes are all about the MRO (Method Resolution Order), so it affects every feature: descriptors, properties (which are in fact descriptors too), special methods, etc. Descriptors are basically methods that get called on assignment or attribute read, so it makes sense that they are looked up at the class level, as any other method is expected to be.
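
Indeed, plain functions are themselves non-data descriptors; that is how bound methods are produced, and it is also why an instance attribute can shadow a method. A quick check (the class is just a throwaway example):

class Car(object):
    def move(self):
        return "vroom"

print(hasattr(Car.move, '__get__'))   # True  -- functions implement __get__
print(hasattr(Car.move, '__set__'))   # False -- so they are non-data descriptors

c = Car()
print(c.move())                 # 'vroom' -- __get__ produced a bound method
c.move = lambda: "broken down"  # the instance entry shadows the method
print(c.move())                 # 'broken down'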

If you need to control assignment but refuse any change to the attribute, use a data descriptor and raise an exception in its __set__ method.
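
For example, a minimal read-only attribute along those lines (just a sketch, with arbitrary names):

class ReadOnly(object):
    def __init__(self, value):
        self.value = value

    def __get__(self, obj, objtype=None):
        return self.value

    def __set__(self, obj, value):
        # Having __set__ makes this a data descriptor, so it takes
        # precedence over the instance dict; raising blocks reassignment.
        raise AttributeError("read-only attribute")

class Config(object):
    version = ReadOnly("1.0")

c = Config()
print(c.version)         # 1.0
try:
    c.version = "2.0"
except AttributeError as e:
    print(e)             # read-only attribute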

Interatomic answered 27/1, 2016 at 10:49 Comment(6)
If it were implemented this way (for both access and assignment): check the instance dictionary (evaluate __get/set__() if defined, else return the entry itself) -> check the class dictionary (same) -> check the base class dictionaries (same). To me, this is quite easy to understand; however, Python didn't do it this way. Why? There must be some reason, I think, and I want to know it :-)Spongy
First, see above about the MRO. Second, it's because of OOP... Conceptually: behavior is commonly defined at the class level, because it's the same for all objects of the class, i.e. all cars move; on the other hand, properties (I don't mean @property here) belong to particular objects of some class, i.e. instances: car A is red, car B is green. If you need to redefine behavior, you subclass and do it at the class level; you don't redefine it for every instance of the class. (Of course there are exceptions, but the general OOP concept is as above.) Further, see the next comment...Interatomic
And now, descriptors redefine behavior: they manage/perform operations concerning an object's properties, and that's why they are defined at the class level. That's all. Short answer: it's all because of established OOP concepts.Interatomic
So the design logic is: this protocol is for describing behaviors, and in OOP behaviors belong to the class. So this protocol should work at the class level, and a data descriptor describes a data property, so the attribute lookup precedence chain has to be this way. Right? Thinking of it this way, everything makes sense.Spongy
The justification for having descriptors is written above, and it looks like you got it. The justification for descriptors having precedence comes from their role; see the third paragraph from the bottom of my answer.Interatomic
No problem, glad I was able to help.Interatomic

First of all, the classic way (actually it hasn't changed that much) is not what you describe. In reality there is no base class in that sense; base classes are only something that is used during class creation. The classic lookup first looks in the instance, and then in the class.

The reason descriptors were introduced is to allow a cleaner way to customize attribute access. The classic way relied on there being hook functions for setting and getting attributes. The new way also allows defining properties using the @property decorator.
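
(And property objects are data descriptors themselves, since they define __get__, __set__ and __delete__, which is why an instance-dictionary entry can never shadow a property. A quick check with a throwaway class:)

class Sample(object):
    @property
    def x(self):
        return 42

print(hasattr(Sample.__dict__['x'], '__set__'))   # True: property is a data descriptor
s = Sample()
s.__dict__['x'] = 99    # sneak an entry into the instance dict directly
print(s.x)              # still 42: the data descriptor on the class wins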

Now for the reason one distinguishes data and non-data (or RW and RO) descriptors. First, one should note that it's reasonable to do the same lookup regardless of what type of access you're attempting (whether it's a read, a write or a delete):

The reason the descriptor should take precedence with RO descriptors is that if you have an RO descriptor, your intention is normally that the attribute should be read-only. This means that using the descriptor is the proper thing to do in this case.

On the other hand, if you have an RW descriptor, it would be useful to use the __dict__ entry to store the actual data.
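
A common sketch of that idea (the names here are illustrative, not from any particular library): the data descriptor still wins the lookup, but it keeps the actual value in the instance's __dict__ under a private key.

class Positive(object):
    # Validates on write, but stores the value in the instance's
    # __dict__ under a "private" key instead of inside the descriptor.
    def __init__(self, name):
        self.name = "_" + name

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        return obj.__dict__.get(self.name, 0)

    def __set__(self, obj, value):
        if value <= 0:
            raise ValueError("must be positive")
        obj.__dict__[self.name] = value

class Account(object):
    balance = Positive("balance")

a = Account()
a.balance = 10
print(a.balance, a.__dict__)   # 10 {'_balance': 10}

(On Python 3.6+ you could let __set_name__ fill in the attribute name automatically; the explicit form above also works on older versions.)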

One should also note that a descriptor properly belongs in the class, not in the instance (with the attribute lookup automatically calling __get__ when it finds an object with that method).

The reason it's not the other way around is that if you place a descriptor in an instance, you might want that attribute to actually refer to the descriptor itself, and not to what the descriptor would make you think it is (by calling __get__ on it). For example:

class D:
    def __get__(self, obj, objtype=None):
        return None

class C:
    pass

o = C()
d = D()

o.fubar = d

Now the intent of that last statement could be that we stored d in o.fubar precisely so that o.fubar returns d, rather than having __get__ called on it (which would return None).
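
A quick check of that claim, continuing the snippet above:

print(o.fubar is d)           # True -- we get the descriptor object itself back
print(o.fubar.__get__(o, C))  # None -- __get__ runs only when we call it explicitly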

Spancel answered 27/1, 2016 at 8:44 Comment(10)
I don't really get your point. What confuses me now is: why is b.x converted into type(b).__dict__['x'].__get__(b, type(b)) and not b.__dict__['x'].__get__(b, type(b))? And why are descriptors almost always used at the class level, not the instance level?Spongy
@Spongy That's because if a descriptor is found in the __dict__ of the instance, it means that the corresponding attribute actually refers to a descriptor and not to what you'd get if you called __get__. Therefore, if you want the descriptor to be called automatically, it has to be in the class.Spancel
Yeah, I know that, but I'm wondering why it's designed that way. Why not just follow the old lookup way, and when an attribute is found, check whether it has __get__(); if it does, execute it, otherwise just return the attribute. In the CPython implementation, class.__dict__['x'] is checked first, and because of this we have to put descriptors in the class. So the real question is: why is it designed this way, and not the way I described, which seems natural to me?Spongy
@Spongy Because that would prohibit the possibility of storing a descriptor in an instance and successfully retrieving it; in that case __get__ would be called on it instead of the descriptor itself being returned.Spancel
Thanks, but why prohibit the possibility to store and successfully retrieve a descriptor in an instance? If it were implemented this way: check the instance dictionary (evaluate __get__() if defined, else return the entry itself) -> check the class dictionary (same) -> check the base class dictionaries (same). To me, this is quite easy to understand; however, Python didn't do it this way. Why? There must be some reason, I think, and I want to know it :-)Spongy
@Spongy That way you would not be able to successfully retrieve the actual descriptor. Instead you would get back whatever __get__ happens to return. This would imply a restriction on what you may store behind instance attributes.Spancel
I don't really get the point of "This would imply a restriction on what you may store behind instance attributes."Spongy
@Spongy The restriction is that you can't store a descriptor itself. Assume that you have a descriptor d and write a.d = d; then you wouldn't be able to retrieve the descriptor by writing a.d, because that would result in d.__get__ being called instead (of just retrieving the descriptor). There's simply no way to distinguish the case where you stored a descriptor at a.d in order to get the descriptor back from the case where you stored it at a.d in order to get a.d.__get__ called.Spancel
But in the current implementation, you still cannot get the descriptor back if it is placed in the class dictionary.Spongy
Let us continue this discussion in chat.Spancel

The issue is one of overloading. Let's imagine you have a Descriptor class, and you set one attribute of your object to an instance of that class:

class Descriptor:
    ...
    def __get__(self, parent, type=None):
        return 1

class MyObject:
    def __init__(self):
        self.foo = Descriptor()

mobj = MyObject()

In this case, you have a non-data descriptor. Any code that accesses mobj.foo will get a result of 1, due to the getter.

But suppose you try to store to that attribute? What happens?

Answer: a simple entry will be added to the instance dictionary, and mobj.foo will point to whatever value was stored.

In this case, if you subsequently read from mobj.foo, which value do you get back? The '1' returned by the get function, or the recently-stored "live" value that is listed in the dictionary?

Right! In cases where a conflict appears, the descriptor silently fades away, and you are left with retrieving whatever it was that you stored.
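
To see that rule in isolation, here is a sketch with the descriptor placed on the class (the usual setup, so the getter actually fires before the store); the names mirror the example above but are otherwise arbitrary:

class GetterOnly(object):
    def __get__(self, obj, objtype=None):
        return 1

class Thing(object):
    foo = GetterOnly()   # non-data descriptor, on the class this time

mobj = Thing()
print(mobj.foo)          # 1 -- comes from __get__
mobj.foo = "stored"      # no __set__, so this goes into mobj.__dict__
print(mobj.foo)          # 'stored' -- the instance entry now shadows the descriptor
del mobj.foo             # removes only the instance entry
print(mobj.foo)          # 1 -- the descriptor is visible again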

Parhelion answered 27/1, 2016 at 8:37 Comment(3)
In this case, you should define a __set__ in the descriptor to avoid inserting a new entry into the instance dictionary.Spongy
Actually, if the code accesses mobj.foo it would get the actual descriptor (and also if you assign to mobj.foo the descriptor would be replaced).Spancel
@Spancel Yes, my mistake. As I replied under your answer, I'm not confused about how to use descriptors, but about why the lookup chain is designed this way, which means descriptors should always be put in the class, not the instance. My question may be silly; thanks for your patience.Spongy
