Python: mean() doesn't work when groupby aggregates dataframe to one line
Asked Answered
B

1

6

I have dataframe:

time_to_rent = {'rentId': {0: 43.0, 1: 87.0, 2: 140.0, 3: 454.0, 4: 1458.0}, 'creditCardId': {0: 40, 1: 40, 2: 40, 3: 40, 4: 40}, 'createdAt': {0: Timestamp('2020-08-24 16:13:11.850216'), 1: Timestamp('2020-09-10 10:47:31.748628'), 2: Timestamp('2020-09-13 15:29:06.077622'), 3: Timestamp('2020-09-24 08:08:39.852348'), 4: Timestamp('2020-10-19 08:54:09.891518')}, 'updatedAt': {0: Timestamp('2020-08-24 20:26:31.805939'), 1: Timestamp('2020-09-10 20:05:18.759421'), 2: Timestamp('2020-09-13 18:38:10.044112'), 3: Timestamp('2020-09-24 08:53:22.512533'), 4: Timestamp('2020-10-19 17:10:09.110038')}, 'rent_time': {0: Timedelta('0 days 04:13:19.955723'), 1: Timedelta('0 days 09:17:47.010793'), 2: Timedelta('0 days 03:09:03.966490'), 3: Timedelta('0 days 00:44:42.660185'), 4: Timedelta('0 days 08:15:59.218520')}}

The idea to aggregate dataframe by column 'creditCardId' and have mean value of 'rent_time'. Ideal output should be:

creditCardId        rent_time mean
40                  0 days 05:08:10.562342

if I run code:

print (time_to_rent['rent_time'].mean())

it works fine and i have "0 days 05:08:10.562342" as output. But when i am trying to get grouping by:

time_to_rent.groupby('creditCardId', as_index=False)[['rent_time']].mean()

I got error back:

~\anaconda3\lib\site-packages\pandas\core\groupby\generic.py in _cython_agg_blocks(self, how, alt, numeric_only, min_count)
   1093 
   1094         if not (agg_blocks or split_frames):
-> 1095             raise DataError("No numeric types to aggregate")
   1096 
   1097         if split_items:

DataError: No numeric types to aggregate

if I use the command:

time_to_rent = time_to_rent.groupby('creditCardId', as_index=False)[['rent_time']]

it returns only "<pandas.core.groupby.generic.DataFrameGroupBy object at 0x000000000B5F2EE0>"

May you please help me understand where my mistake is?

Balancer answered 20/10, 2020 at 15:19 Comment(0)
N
1

It's not your mistake, possibly a bug in Pandas since Timedelta can be averaged. A work-around is apply:

time_to_rent.groupby('creditCardId')['rent_time'].apply(lambda x: x.mean())

Output:

creditCardId
40   0 days 05:08:10.562342200
Name: rent_time, dtype: timedelta64[ns]
Norrisnorrv answered 20/10, 2020 at 15:24 Comment(1)
You are my Hero!! Thank you!!Balancer

© 2022 - 2024 — McMap. All rights reserved.