Pandas: Resample dataframe column, get discrete feature that corresponds to max value

Asked 27/6, 2017 at 20:56 Answered 17/5, 2022 at 20:1

Sample data:

import pandas as pd
import numpy as np
import datetime

data = {'value': [1,2,4,3], 'names': ['joe', 'bob', 'joe', 'bob']}
start, end = datetime.datetime(2015, 1, 1), datetime.datetime(2015, 1, 4)
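# pd.date_range replaces pd.DatetimeIndex(start=..., end=..., freq=...),
# which was removed in pandas 1.0
test = pd.DataFrame(data=data, index=pd.date_range(start=start, end=end, freq="D"),
                    columns=["value", "names"])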

gives:

            value names
2015-01-01      1   joe
2015-01-02      2   bob
2015-01-03      4   joe
2015-01-04      3   bob

I want to resample by '2D' and, for each bin, get the row with the max value (the value together with its name), something like:

test.resample('2D')

The expected result should be:

            value names
2015-01-01      2   bob
2015-01-03      4   joe

Can anyone help me?

Micrometry answered 27/6, 2017 at 20:56 Comment(1)

I've updated my answer if you're interested. – Landa 27/6, 2017 at 22:13

You can resample to get the idxmax of value in each bin, then use those index labels to look up names and value in the original frame (df here is the frame called test in the question):

(df.resample('2D')[['value']].idxmax()
   .assign(names=lambda x: df.loc[x.value]['names'].values,
           value=lambda x: df.loc[x.value]['value'].values)
)
Out[116]: 
            value names
2015-01-01      2   bob
2015-01-03      4   joe

Katrinakatrine answered 27/6, 2017 at 21:3 Comment(1)

Super solution. This also extends to data with the same dates. – Micrometry 27/6, 2017 at 21:15

Use apply and return the row with the maximum value from each bin. The resample supplies the bin labels for the index:

test.resample('2D').apply(lambda df: df.loc[df.value.idxmax()])

            value names
2015-01-01      2   bob
2015-01-03      4   joe

Landa answered 27/6, 2017 at 21:9 Comment(4)

As I said to ayhan :-), this doesn't give the index that the OP is expecting. There might be a slick way to do it in one line, but I think you could just name the idxmax() result something and then set_index(ii.index) to patch it. – Virgule 27/6, 2017 at 21:11

@Virgule I did the same thing inside an apply. This way the indices are handled by the resample but I get the rows I want. Thanks for letting me know, I was in a meeting and couldn't respond right away (-: – Landa 27/6, 2017 at 22:10

This gets AttributeError: 'Series' object has no attribute 'value' on pandas v1.1.2. – Suzette 25/9, 2020 at 16:57

value in this context was a column designated by the OP. It could have been written as test.resample('2D').apply(lambda df: df.loc[df['value'].idxmax()]) to make it clearer. – Landa 25/9, 2020 at 17:0
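
For pandas versions where the apply call hits the AttributeError reported above, the idxmax-then-set_index idea from Virgule's comment can be sketched as follows. This is a minimal sketch, not the answerer's code: it rebuilds the question's sample frame, and it assumes every bin contains at least one row and that the timestamps are unique.

```python
import pandas as pd

# Rebuild the sample frame from the question
test = pd.DataFrame(
    {"value": [1, 2, 4, 3], "names": ["joe", "bob", "joe", "bob"]},
    index=pd.date_range("2015-01-01", "2015-01-04", freq="D"),
)

# Index label of the row holding the max 'value' in each 2-day bin
ii = test.resample("2D")["value"].idxmax()

# Pull those rows from the original frame, then relabel them with the bin starts
result = test.loc[ii.values].set_index(ii.index)
print(result)
```

Because the rows are taken from the original frame with loc, the column dtypes are preserved and any number of extra columns come along without being listed by name.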

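The idxmax approach works well unless some resample bins are empty. For example, if you resample daily and one day has no rows, idxmax will raise an error instead of returning NaN.

The following function works around the problem: resample the key column on its own first (for example with max, which gives NaN for empty bins), then look up the remaining columns in the original frame.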

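import numpy as np

def map_resample_columns(original_df, resample_df, key_col, cols):
    """
    Add columns from original_df back onto resample_df.

    resample_df: result of resampling original_df; must contain key_col
                 (NaN in key_col marks an empty bin)
    key_col: name of the column used to match rows
    cols: list of columns from original_df to add to resample_df

    Rows are matched on the value of key_col, so if a value occurs more
    than once in original_df, the first matching row is used.
    Note that resample_df is modified in place and also returned.
    """
    for col in cols:
        record_info = []
        for idx, row in resample_df.iterrows():
            val = row[key_col]
            if not np.isnan(val):
                # first row of original_df whose key_col equals this bin's value
                record_info.append(original_df[original_df[key_col] == val][col].tolist()[0])
            else:
                # empty bin: nothing to look up
                record_info.append(np.nan)
        resample_df[col] = record_info
    return resample_df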
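
Since the function above matches rows by value across the whole frame, a value that repeats in another bin can pull in the wrong row. One alternative, offered only as a sketch, is to iterate over the resample groups and pick the winning row inside each bin. The sample data with a gap, and the names picked and all_bins, are made up for illustration; the sketch also assumes the value column itself contains no NaN.

```python
import pandas as pd

# Illustrative data with a gap: no rows fall in the 2015-01-03 bin
idx = pd.to_datetime(["2015-01-01", "2015-01-02", "2015-01-05", "2015-01-06"])
df = pd.DataFrame({"value": [1, 2, 4, 3],
                   "names": ["joe", "bob", "joe", "bob"]}, index=idx)

picked = []
for label, group in df.resample("2D"):
    if group.empty:
        continue  # skip empty bins; the reindex below restores them as NaN
    # positional argmax, so duplicate timestamps inside a bin are not a problem
    best = group.iloc[group["value"].to_numpy().argmax()]
    picked.append({"bin": label, "value": best["value"], "names": best["names"]})

# Every bin label, including the empty ones
all_bins = df.resample("2D").size().index
result = pd.DataFrame(picked).set_index("bin").reindex(all_bins)
print(result)
```

The empty bin shows up as a row of NaN rather than an error, and the lookup never leaves the bin it belongs to.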

Job answered 17/5, 2022 at 20:1 Comment(0)
