pandas: Composition for chained methods like .resample(), .rolling() etc

I would like to construct an extension of pandas.DataFrame — let's call it SPDF — which could do stuff above and beyond what a simple DataFrame can:

import pandas as pd
import numpy as np


def to_spdf(func):
    """Transform generic output of `func` to SPDF.

    Returns
    -------
    wrapper : callable
    """
    def wrapper(*args, **kwargs):
        res = func(*args, **kwargs)
        return SPDF(res)

    return wrapper


class SPDF:
    """Special-purpose dataframe.

    Parameters
    ----------
    df : pandas.DataFrame

    """

    def __init__(self, df):
        self.df = df

    def __repr__(self):
        return repr(self.df)

    def __getattr__(self, item):
        res = getattr(self.df, item)

        if callable(res):
            res = to_spdf(res)

        return res


if __name__ == "__main__":

    # construct a generic SPDF
    df = pd.DataFrame(np.eye(4))
    an_spdf = SPDF(df)

    # call .diff() to obtain another SPDF
    print(an_spdf.diff())

Right now, DataFrame methods that return another DataFrame, such as .diff() in the MWE above, give me back an SPDF, which is great. However, I would also like chained calls such as .resample('M').last() or .rolling(2).mean() to produce an SPDF at the very end. I have failed so far because .rolling() and the like are callable, so my wrapper to_spdf tries to construct an SPDF from their return value (an intermediate Rolling or Resampler object) without 'waiting' for .mean() or whatever the last part of the expression is. Any ideas on how to tackle this problem?

Thanks.

Cowitch answered 11/7, 2018 at 7:29 Comment(3)
I don't see the purpose of SPDF. What will it give you that a regular DataFrame cannot? - Dukey
Could you show how you arrive at your problem? Starting from your MWE, I'm checking whether chaining methods returns an SPDF "in the very end" and getting the expected result (i.e. isinstance(an_spdf.rolling(2).mean(), SPDF) returns True). - Bullate
@TomasFarias you are right! I simplified a bit and did not notice that the provided MWE actually works. - Cowitch
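
The check mentioned in the comments can be reproduced with a condensed copy of the MWE. This is only a sketch of why the chain works: the object returned by .rolling(2) is itself wrapped in an SPDF, so the subsequent .mean() goes through the same wrapper. The variable names `intermediate` and `final` are introduced here for illustration.

```python
import numpy as np
import pandas as pd


def to_spdf(func):
    """Wrap `func` so that whatever it returns is put inside an SPDF."""
    def wrapper(*args, **kwargs):
        return SPDF(func(*args, **kwargs))
    return wrapper


class SPDF:
    """Wrapper from the question; `df` may end up holding any pandas object."""

    def __init__(self, df):
        self.df = df

    def __getattr__(self, item):
        # Only called for names that are not found on SPDF itself.
        res = getattr(self.df, item)
        return to_spdf(res) if callable(res) else res


an_spdf = SPDF(pd.DataFrame(np.eye(4)))

intermediate = an_spdf.rolling(2)  # an SPDF wrapping a Rolling object
final = intermediate.mean()        # an SPDF wrapping a DataFrame

print(type(intermediate.df).__name__)
print(type(final.df).__name__)
print(isinstance(final, SPDF))
```

The side effect is that an SPDF can temporarily hold something that is not a DataFrame, which may matter if SPDF gains methods that assume DataFrame behaviour.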

You should subclass DataFrame properly. For methods that construct and return a new object (such as diff) to return your subclass, the pandas documentation on subclassing says you must override the _constructor property (and, where needed, related hooks such as _metadata).

You could do something like the following:

import pandas as pd

class SPDF(pd.DataFrame):

    @property
    def _constructor(self):
        # pandas uses this to build the results of its methods
        return SPDF

If you also need to preserve custom attributes across such methods (like diff), list their names in _metadata. Custom methods need no special handling, since they live on the class:

import pandas as pd

class SPDF(pd.DataFrame):
    # attribute names listed here are propagated to results
    _metadata = ['prop']
    prop = 1

    @property
    def _constructor(self):
        return SPDF

Notice that the output is as desired:

df = SPDF(np.eye(4))
print(type(df))
# <class '__main__.SPDF'>
new = df.diff()
print(type(new))
# <class '__main__.SPDF'>
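
Putting both pieces together, a minimal self-contained sketch of the subclassing approach could look like this. The value 42 and the variable names `diffed` and `copied` are illustrative assumptions. Only the type of diff() and the propagation of `prop` through copy() are checked here; whether chained calls such as .rolling(2).mean() also return the subclass is printed rather than asserted, since that is worth verifying on your own pandas version.

```python
import numpy as np
import pandas as pd


class SPDF(pd.DataFrame):
    # Names listed in _metadata are copied onto results by __finalize__.
    _metadata = ["prop"]
    prop = 1  # class-level default

    @property
    def _constructor(self):
        # pandas calls this to build the result of many operations.
        return SPDF


df = SPDF(np.eye(4))
df.prop = 42  # instance-specific value (illustrative)

diffed = df.diff()
copied = df.copy()

print(type(diffed).__name__)
print(copied.prop)

# Not asserted: inspect what the chained call returns on your pandas version.
print(type(df.rolling(2).mean()).__name__)
```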
Stamata answered 17/7, 2018 at 18:11 Comment(1)
+1 for pointing to _constructor. Still not perfect, but will probably become the way to go in the future. - Cowitch

If you do not want to subclass DataFrame, you can introduce another class like PendingSPDF and wrap non-dataframe objects with it:

import pandas as pd
import numpy as np


def to_spdf(func):
    """Transform generic output of `func` to SPDF.

    Returns
    -------
    wrapper : callable
    """
    def wrapper(*args, **kwargs):
        res = func(*args, **kwargs)
        if isinstance(res, pd.DataFrame):
            return SPDF(res)
        else:
            return PendingSPDF(res)

    return wrapper

class SPDF:
    """Special-purpose dataframe.

    Parameters
    ----------
    df : pandas.DataFrame

    """

    def __init__(self, df):
        self.df = df

    def __repr__(self):
        return repr(self.df)

    def __getattr__(self, item):
        res = getattr(self.df, item)

        if callable(res):
            res = to_spdf(res)

        return res

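class PendingSPDF:
    """Wrapper for intermediate, non-DataFrame results (e.g. Rolling, Resampler).

    Method calls go through to_spdf again, so the final DataFrame
    in a chain comes back as an SPDF.
    """

    def __init__(self, df):
        # despite the name, `df` holds whatever non-DataFrame object was returned
        self.df = df

    def __getattr__(self, item):
        res = getattr(self.df, item)

        if callable(res):
            res = to_spdf(res)

        return res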

if __name__ == "__main__":

    # construct a generic SPDF
    df = pd.DataFrame(np.eye(4))
    an_spdf = SPDF(df)

    # call .diff() to obtain another SPDF
    print(an_spdf.diff())
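
The demo above only calls .diff(), so here is a condensed sketch that exercises the chained calls from the question. It assumes a small shared base class `_Proxy` (a hypothetical helper, not part of the answer) to avoid repeating __getattr__, and it uses a daily DatetimeIndex with resample("2D") purely for illustration.

```python
import numpy as np
import pandas as pd


def to_spdf(func):
    """Wrap `func`: DataFrame results become SPDF, anything else stays pending."""
    def wrapper(*args, **kwargs):
        res = func(*args, **kwargs)
        return SPDF(res) if isinstance(res, pd.DataFrame) else PendingSPDF(res)
    return wrapper


class _Proxy:
    """Shared delegation logic (hypothetical helper, not in the answer)."""

    def __init__(self, obj):
        self.df = obj

    def __getattr__(self, item):
        # Only called for names that are not found on the proxy itself.
        res = getattr(self.df, item)
        return to_spdf(res) if callable(res) else res


class SPDF(_Proxy):
    """Wraps a DataFrame."""


class PendingSPDF(_Proxy):
    """Wraps an intermediate object such as Rolling or Resampler."""


index = pd.date_range("2018-01-01", periods=4, freq="D")
an_spdf = SPDF(pd.DataFrame(np.eye(4), index=index))

rolled = an_spdf.rolling(2).mean()         # Rolling -> DataFrame -> SPDF
resampled = an_spdf.resample("2D").last()  # Resampler -> DataFrame -> SPDF
total = an_spdf.sum()                      # Series, so it stays a PendingSPDF

print(type(rolled).__name__, type(resampled).__name__, type(total).__name__)
```

Note the last line: with this rule every non-DataFrame result, including a Series or a scalar, ends up wrapped in PendingSPDF, so you may want to unwrap such values explicitly via the `.df` attribute.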
Municipality answered 20/7, 2018 at 19:9 Comment(0)
