How to extend the pandas DataFrame class with my own methods and functions
G

6

6

First question:

I am working with pandas' DataFrames and I am frequently running the same routines as part of data pre-processing and other things. I'd like to write some of these routines as methods in a class called ExtendedDataframe that extends pandas.DataFrame. I don't know how to go about this. So far, I'm not writing any __init__ in my new class so that it's inherited from pandas.DataFrame:

import pandas
class ExtendedDataframe(pandas.DataFrame):
  def some_method(self):
    pass  # placeholder for the actual routine

This apparently lets me create an instance of ExtendedDataframe through inheritance. But I usually load data with something like pandas.read_csv, which returns a plain DataFrame. How can I load such CSV data and at some point turn it into an ExtendedDataframe, so that I can use my own methods on top of those provided by the standard DataFrame? It's fine if the loading phase returns a standard DataFrame that I then convert into an ExtendedDataframe.

Second question:

Not all of the pandas functionality I use consists of DataFrame methods. Some of it is functions, such as pandas.merge, that take DataFrames as arguments. How can I extend the use of such functions to instances of my ExtendedDataframe class? In other words, if df1 and df2 are two instances of ExtendedDataframe, how do I make

pandas.merge(df1, df2, ...)

work just like it would with standard instances of DataFrame?

Glyco answered 21/12, 2017 at 4:19 Comment(0)
B
9

This doesn't directly answer your question, but it is a potential answer to your problem. Lots of people use the pipe method in their workflows.

https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.pipe.html

Instead of saying

df = foo(df)

you can say

df = df.pipe(foo)

You can even specify arguments for the function! This will be much easier to maintain than trying to encapsulate the whole dataframe class. So the idea is that you can just create a library of functions and pipe them as needed.
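As a rough sketch of what such a library of functions could look like, the example below chains two steps with pipe and passes extra arguments to each. The function names drop_missing and add_total and the sample data are made up for illustration; they are not part of pandas.

```python
import pandas as pd


def drop_missing(df, subset):
    """Hypothetical step: drop rows that have missing values in `subset`."""
    return df.dropna(subset=subset)


def add_total(df, cols, name="total"):
    """Hypothetical step: add a column holding the row-wise sum of `cols`."""
    return df.assign(**{name: df[cols].sum(axis=1)})


raw = pd.DataFrame({"a": [1.0, None, 3.0], "b": [10.0, 20.0, 30.0]})

# Each pipe call passes the frame as the first argument of the function;
# the remaining keyword arguments are forwarded unchanged.
clean = (
    raw
    .pipe(drop_missing, subset=["a"])
    .pipe(add_total, cols=["a", "b"])
)
print(clean)
```

Because every step takes a DataFrame and returns a new one, the steps stay independent of each other and can be reordered or reused across projects.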

Bosanquet answered 21/12, 2017 at 5:6 Comment(5)
That's indeed a possible solution, thanks! Is there otherwise a good general way to simply add methods/attributes to a built-in class? - Glyco
The issue that you'll run into is that as soon as you invoke a built in method it will return a normal DataFrame not your custom one. - Bosanquet
Ok so the only solution would be to modify the built-in class itself (which I don't think I want to do)? - Glyco
You definitely don't want to do that. I mean saying pipe and then calling the function you want will be more sustainable in the long run. That's been my solution for the last year and I haven't found any problems with it. - Bosanquet
With modern pandas (at least 2.2, don't know when it was introduced) you can do more. See pandas.pydata.org/docs/development/…. In particular, take note of the section on overriding constructor properties ("By overriding these properties, you can retain subclasses through pandas data manipulations.") - Guck
S
8

I had the same problem today. With a colleague's help I found out that this works:

import pandas as pd

class MyDF(pd.DataFrame):
    def __init__(self, *args, **kwargs):
        super(MyDF,  self).__init__(*args, **kwargs)

    @property
    def _constructor(self):
        return MyDF

    def my_custom_method(self):
        print('This actually works!')

Example:

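df = MyDF({'a': [1], 'b': ['test']})
# DataFrame.append was removed in pandas 2.0; pd.concat is its replacement.
df = pd.concat([df, MyDF({'a': [2], 'b': ['again']})], ignore_index=True)
print(df)
df.my_custom_method()  # prints "This actually works!"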
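Applied to the two points in the question, the same idea might look like the sketch below: wrap the plain DataFrame returned by read_csv in the subclass, then pass subclass instances to pandas.merge as usual. The class body, the method n_missing and the inline CSV text are assumptions made up for this illustration.

```python
import io

import pandas as pd


class ExtendedDataframe(pd.DataFrame):
    @property
    def _constructor(self):
        # Tells pandas which class to use for the results of its own operations.
        return ExtendedDataframe

    def n_missing(self):
        """Hypothetical custom method: total number of missing cells."""
        return int(self.isna().sum().sum())


# read_csv returns a plain DataFrame; passing it to the subclass converts it.
csv_text = "key,x\n1,10\n2,\n"
plain = pd.read_csv(io.StringIO(csv_text))
df1 = ExtendedDataframe(plain)
df2 = ExtendedDataframe({"key": [1, 2], "y": ["p", "q"]})

# pandas.merge accepts subclass instances like any other DataFrame.
merged = pd.merge(df1, df2, on="key")
print(type(merged).__name__)
print(merged.n_missing())
```

Since ExtendedDataframe is a DataFrame, functions such as pandas.merge need no special handling; the _constructor property is what keeps the result from falling back to a plain DataFrame.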
Susannesusceptibility answered 24/9, 2021 at 13:59 Comment(0)
B
3

I am not sure in which version of pandas the decorators for extending DataFrame and related classes were introduced. You can read more about them at the following address: https://pandas.pydata.org/pandas-docs/stable/development/extending.html
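A minimal sketch of that decorator approach, using pandas.api.extensions.register_dataframe_accessor, is shown below. The accessor name "prep" and the method standardize_columns are hypothetical choices for this example, not something pandas provides.

```python
import pandas as pd


@pd.api.extensions.register_dataframe_accessor("prep")
class PrepAccessor:
    """Hypothetical accessor grouping custom pre-processing routines."""

    def __init__(self, pandas_obj):
        # pandas passes in the DataFrame the accessor is attached to.
        self._obj = pandas_obj

    def standardize_columns(self):
        """Return a copy with stripped, lower-cased column names."""
        return self._obj.rename(columns=lambda c: str(c).strip().lower())


df = pd.DataFrame({" Name ": ["x"], "VALUE": [1]})
tidy = df.prep.standardize_columns()
print(list(tidy.columns))
```

The custom methods live under their own namespace (df.prep here), so they are available on every ordinary DataFrame, including the ones returned by read_csv or merge, without any subclassing.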

Bridgettbridgette answered 27/8, 2019 at 10:37 Comment(0)
T
0

When you create an instance of your dataframe, it is a DataFrame object. You can modify existing methods by overriding them, for example __existingMethod__. As for the second question, I would suggest creating a new class to which you pass the two dataframes. In that case you will have to write the __init__ method.

Trilateral answered 21/12, 2017 at 5:10 Comment(0)
A
0

You could extend the constructor like this:

import pandas
from datetime import datetime

class ExtendedDataframe(pandas.DataFrame):
  def __init__(self, *args, **kwargs):
    pandas.DataFrame.__init__(self, *args, **kwargs)
    self.created_at = datetime.today()

  def to_csv(self, *args, **kwargs):
    copy = self.copy()
    copy["created_at"] = self.created_at
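    # The module is imported as pandas, not pd; return the result so that
    # to_csv() without a path still gives back the CSV string.
    return pandas.DataFrame.to_csv(copy, *args, **kwargs)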
Apotheosize answered 20/11, 2021 at 15:34 Comment(0)
A
0

You can extend the DataFrame class by assigning a function to an attribute of the class. The function must accept the DataFrame instance as its first argument.

For example:

import pandas as pd

def my_custom_method(self):  # self is the DataFrame the method is called on
    print('hi')
pd.DataFrame.my_custom_method = my_custom_method
Armendariz answered 7/5, 2024 at 3:10 Comment(0)
