Python - extending properties like you'd extend a function

Question

How can you extend a python property?

A subclass can extend a superclass's function by calling it in the overridden version and then operating on the result. Here's an example of what I mean when I say "extending a function":

# Extending a function (a tongue-in-cheek example)

class NormalMath(object):
    def __init__(self, number):
        self.number = number

    def add_pi(self):
        n = self.number
        return n + 3.1415


class NewMath(NormalMath):
    def add_pi(self):
        # NewMath doesn't know how NormalMath added pi (and shouldn't need to).
        # It just uses the result.
        n = NormalMath.add_pi(self)  

        # In NewMath, fractions are considered too hard for our users.
        # We therefore silently convert them to integers.
        return int(n)

Is there an analogous operation to extending functions, but for functions that use the property decorator?

I want to do some additional calculations immediately after getting an expensive-to-compute attribute. I need to keep the attribute's access lazy. I don't want the user to have to invoke a special routine to make the calculations. Basically, I don't want the user to ever know the calculations were made in the first place. However, the attribute must remain a property, since I've got legacy code I need to support.

Maybe this is a job for decorators? If I'm not mistaken, a decorator is a function that wraps another function, and I'm looking to wrap a property with some more calculations, and then present it as a property again, which seems like a similar idea... but I can't quite figure it out.

My Specific Problem

I've got a base class LogFile with an expensive-to-construct attribute .dataframe. I've implemented it as a property (with the property decorator), so it won't actually parse the log file until I ask for the dataframe. So far, it works great. I can construct a bunch (100+) of LogFile objects, and use cheaper methods to filter and select only the important ones to parse. And whenever I'm using the same LogFile over and over, I only have to parse it the first time I access the dataframe.

Now I need to write a LogFile subclass, SensorLog, that adds some extra columns to the base class's dataframe attribute, but I can't quite figure out the syntax to call the super class's dataframe construction routines (without knowing anything about their internal workings), then operate on the resulting dataframe, and then cache/return it.

# Base Class - rules for parsing/interacting with data.
class LogFile(object):
    def __init__(self, file_name):
        # file name to find the log file
        self.file_name = file_name
        # non-public variable to cache results of parse()
        self._dataframe = None

    def parse(self):
        with open(self.file_name) as infile:
            ...
            ...
            # Complex rules to interpret the file 
            ...
            ...
        self._dataframe = pandas.DataFrame(stuff)

    @property
    def dataframe(self):
        """
        Returns the dataframe; parses file if necessary. This works great!

        """
        if self._dataframe is None:
            self.parse()
        return self._dataframe

    @dataframe.setter
    def dataframe(self,value):
        self._dataframe = value


# Sub class - adds more information to data, but doesn't parse
# must preserve established .dataframe interface
class SensorLog(LogFile):
    def __init__(self, file_name):
        # Call the super's constructor
        LogFile.__init__(self, file_name)

        # SensorLog doesn't actually know about (and doesn't rely on) the ._dataframe cache, so it overrides it just in case.
        self._dataframe = None

    # THIS IS THE PART I CAN'T FIGURE OUT
    # Here's my best guess, but it doesn't quite work:
    @property
    def dataframe(self):
        # use parent class's getter, invoking the hidden parse function and any other operations LogFile might do.
        self._dataframe = LogFile.dataframe.getter()    

        # Add additional calculated columns
        self._dataframe['extra_stuff'] = 'hello world!'
        return self._dataframe


    @dataframe.setter
    def dataframe(self, value):
        self._dataframe = value

Now, when these classes are used in an interactive session, the user should be able to interact with either in the same way.

>>> log = LogFile('data.csv')
>>> print log.dataframe
#### DataFrame with 10 columns goes here ####
>>> sensor = SensorLog('data.csv')
>>> print sensor.dataframe
#### DataFrame with 11 columns goes here ####

I have lots of existing code that takes a LogFile instance which provides a .dataframe attribute and does something interesting (mostly plotting). I would LOVE to have SensorLog instances present the same interface so they can use the same code. Is it possible to extend the super-class's dataframe getter to take advantage of existing routines? How? Or am I better off doing this a different way?

Thanks for reading that huge wall of text. You are an internet super hero, dear reader. Got any ideas?

Steal answered 18/2, 2014 at 4:15 Comment(3)
Why don't you copy from the parent to the child and change it as per your needs? - Inerrant
Did you see previous questions about similar issues here and here? - Birecree
@Birecree -- thanks for the links. I looked, but couldn't find them. The first would have answered my question nicely. - Steal
11

You should be calling the superclass properties, not bypassing them via self._dataframe. Here's a generic example:

class A(object):

    def __init__(self):
        self.__prop = None

    @property
    def prop(self):
        return self.__prop

    @prop.setter
    def prop(self, value):
        self.__prop = value

class B(A):

    def __init__(self):
        super(B, self).__init__()

    @property
    def prop(self):
        value = A.prop.fget(self)
        value['extra'] = 'stuff'
        return value

    @prop.setter
    def prop(self, value):
        A.prop.fset(self, value)

And using it:

b = B()
b.prop = dict((('a', 1), ('b', 2)))
print(b.prop)

Outputs:

{'a': 1, 'b': 2, 'extra': 'stuff'}
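
Applied to the classes in the question, the same pattern might look like the sketch below. It is only an illustration: a plain dict stands in for the pandas DataFrame so the snippet runs without pandas, and the body of parse() is a placeholder, since the real parsing rules aren't shown. The membership check is my own addition so the extra column isn't rewritten on every access.

```python
class LogFile(object):
    def __init__(self, file_name):
        self.file_name = file_name
        self._dataframe = None

    def parse(self):
        # Placeholder for the real parsing rules (not shown in the question).
        # A dict stands in for pandas.DataFrame here.
        self._dataframe = {'raw': [1, 2, 3]}

    @property
    def dataframe(self):
        if self._dataframe is None:
            self.parse()
        return self._dataframe

    @dataframe.setter
    def dataframe(self, value):
        self._dataframe = value


class SensorLog(LogFile):
    @property
    def dataframe(self):
        # Call the parent's getter through the property object.
        frame = LogFile.dataframe.fget(self)
        # Add the derived column only if it is not already there.
        if 'extra_stuff' not in frame:
            frame['extra_stuff'] = 'hello world!'
        return frame

    @dataframe.setter
    def dataframe(self, value):
        # Delegate to the parent's setter.
        LogFile.dataframe.fset(self, value)


sensor = SensorLog('data.csv')
print(sorted(sensor.dataframe))
```

SensorLog never touches ._dataframe directly here; it only goes through LogFile's getter and setter.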

I would generally recommend placing side-effects in setters instead of getters, like this:

class A(object):

    def __init__(self):
        self.__prop = None

    @property
    def prop(self):
        return self.__prop

    @prop.setter
    def prop(self, value):
        self.__prop = value

class B(A):

    def __init__(self):
        super(B, self).__init__()

    @property
    def prop(self):
        return A.prop.fget(self)

    @prop.setter
    def prop(self, value):
        value['extra'] = 'stuff'
        A.prop.fset(self, value)

Having costly operations within a getter is also generally to be avoided (such as your parse method).
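
If lazy loading is still needed, one way to combine it with the setter-based approach is to have parse() assign through the property instead of writing to the cache directly. This is a rough sketch under assumptions: the dict is a stand-in for the DataFrame, the parse() body is a placeholder, and parse_count is a hypothetical counter added only to show that parsing happens once.

```python
class LogFile(object):
    def __init__(self, file_name):
        self.file_name = file_name
        self._dataframe = None
        self.parse_count = 0  # hypothetical counter, only for this demo

    def parse(self):
        self.parse_count += 1
        # Placeholder result. Assigning through the property (not
        # self._dataframe) lets a subclass setter hook in.
        self.dataframe = {'raw': [1, 2, 3]}

    @property
    def dataframe(self):
        if self._dataframe is None:
            self.parse()  # lazy: only runs while the cache is empty
        return self._dataframe

    @dataframe.setter
    def dataframe(self, value):
        self._dataframe = value


class SensorLog(LogFile):
    @property
    def dataframe(self):
        return LogFile.dataframe.fget(self)

    @dataframe.setter
    def dataframe(self, value):
        # The extra column is added once, when the value is stored.
        value['extra_stuff'] = 'hello world!'
        LogFile.dataframe.fset(self, value)


sensor = SensorLog('data.csv')
sensor.dataframe
sensor.dataframe
print(sensor.parse_count, sorted(sensor.dataframe))
```

With this layout the getter stays cheap after the first access, and the extra fields are not recomputed on every read.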

Horologist answered 18/2, 2014 at 6:48 Comment(5)
This is exactly what I was looking for. Specifically, the snippet "A.prop.fget(self)". I confess that I feel a little dirty putting expensive operations like parse() in the property getter, but I'm not sure where else to stick it. I don't want the users to have to explicitly call parse(), and I'm definitely not putting anything expensive in __init__(). The LogFile.dataframe is rarely set explicitly by the user (though it can be). What is a better way? - Steal
Because the syntax of properties hides the fact that additional code is being executed, someone using your API can accidentally incur substantial performance hits without realizing it. While someone might not think that repeatedly calling b.prop in a loop would incur much overhead, if b.prop takes a long time to call then caching b.prop in a temporary variable outside the loop may be a huge performance boost. - Horologist
In your case, your parse method should only run once, so it's not quite as bad. And if you really need to lazy load the file, then that's where you should do it. However, by placing the load in your constructor, which takes a filename as an argument, it will be much more apparent in your API where the performance hits will be. This will make it easier for the larger application to be smart about managing the performance. - Horologist
I think an ideal scenario would be for __init__ to call parse, parse to call the setter, and the setter to add the extra fields. The way you currently have it set up, your getter will repeatedly set the extra fields which have already been set. If you need lazy file loading, then you can call parse from the getter, but parse should then still call into the setter, where you do your field modification. - Horologist
What about if A is an abstract class and the property prop is only an abstract definition? - Consentaneous
1

If I understand correctly what you want to do is call the parent's method from the child instance. The usual way to do that is by using the super built-in.

I've taken your tongue-in-cheek example and modified it to use super in order to show you:

class NormalMath(object):
    def __init__(self, number):
        self.number = number

    def add_pi(self):
        n = self.number
        return n + 3.1415


class NewMath(NormalMath):
    def add_pi(self):
        # This calls NormalMath's add_pi on the current instance.
        normal_maths_pi_plus_num = super(NewMath, self).add_pi()
        return int(normal_maths_pi_plus_num)

In your Log example, instead of calling:

self._dataframe = LogFile.dataframe.getter() 

you should call:

self._dataframe = super(SensorLog, self).dataframe

You can read more about super here

Edit: Even though the example I gave you deals with methods, doing the same with @properties shouldn't be a problem.
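
One caveat worth illustrating: reading a property through super() works, but assigning through it does not, because super objects don't forward attribute assignment. Here is a small sketch with made-up classes A and B (the doubling is arbitrary, just to make the override visible); the setter falls back to calling the parent's fset explicitly.

```python
class A(object):
    def __init__(self):
        self._prop = None

    @property
    def prop(self):
        return self._prop

    @prop.setter
    def prop(self, value):
        self._prop = value


class B(A):
    @property
    def prop(self):
        # Reading through super() triggers A's getter.
        return super(B, self).prop * 2

    @prop.setter
    def prop(self, value):
        # Assignment through super() is not supported, so call the
        # parent's setter function directly.
        A.prop.fset(self, value)


b = B()
b.prop = 10
print(b.prop)

try:
    super(B, b).prop = 1
    assigned_via_super = True
except AttributeError:
    assigned_via_super = False
print(assigned_via_super)
```

So super() is fine for extending a getter, while a setter override still needs something like A.prop.fset(self, value).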

Baskerville answered 18/2, 2014 at 4:45 Comment(1)
Interesting! I haven't used the super() call at all yet. I'll have to do some reading. - Steal
1

You have some possibilities to consider:

1/ Inherit from logfile and override parse in your derived sensor class. It should be possible to modify your methods that work on dataframe to work regardless of the number of members that dataframe has - as you are using pandas a lot of it is done for you.

2/ Make sensor an instance of logfile then provide its own parse method.

3/ Generalise parse, and possibly some of your other methods, to use a list of data descriptors and possibly a dictionary of methods/rules either set in your class initialiser or set by a method.

4/ Look at either making more use of the methods already in pandas, or possibly, extending pandas to provide the missing methods if you and others think that they would be accepted into pandas as useful extensions.

Personally I think that you would find the benefits of options 3 or 4 to be the most powerful.
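
For what it's worth, option 1 could be sketched roughly as follows. This is an assumption-laden illustration: a dict stands in for the DataFrame, the base parse() body is a placeholder, and the subclass has to know that parse() stores its result in self._dataframe, which the question was hoping to avoid.

```python
class LogFile(object):
    def __init__(self, file_name):
        self.file_name = file_name
        self._dataframe = None

    def parse(self):
        # Placeholder for the real parsing rules.
        self._dataframe = {'raw': [1, 2, 3]}

    @property
    def dataframe(self):
        if self._dataframe is None:
            self.parse()
        return self._dataframe


class SensorLog(LogFile):
    def parse(self):
        # Let the base class do the parsing, then extend the result.
        LogFile.parse(self)
        self._dataframe['extra_stuff'] = 'hello world!'


sensor = SensorLog('data.csv')
print(sorted(sensor.dataframe))
```

The inherited property is left alone, so existing code that reads .dataframe keeps working unchanged.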

Aggravate answered 18/2, 2014 at 6:12 Comment(1)
Pandas is awesome. Wes McKinney has done 99% of the hard work for me. All I have to do is figure out how to get my data into DataFrame objects, and then call the native .join() and .align() methods in the right way. Overriding parse() in the Sensor object could work -- I'll look into that. It then becomes a question of when I should parse individual dataframes, when I should align/join them, and when I should make additional calculations on them. - Steal
0

The problem is that you're missing a self going into the parent class. If your parent is a singleton then a @staticmethod should work.

class X():
    x=1
    @staticmethod
    def getx():
        return X.x

class Y(X):
    y=2
    def getyx(self):
        return X.getx()+self.y

wx = Y()
print(wx.getyx())  # 3
Unholy answered 18/2, 2014 at 5:45 Comment(0)
