Add column to the end of Pandas DataFrame containing average of previous data
Asked Answered
M

4

24

I have a DataFrame ave_data that contains the following:

ave_data

Time        F7           F8            F9  
00:00:00    43.005593    -56.509746    25.271271  
01:00:00    55.114918    -59.173852    31.849262  
02:00:00    63.990762    -64.699492    52.426017

I want to add another column to this dataframe, containing the average of the values under column F7, F8 and F9 for each row.

The ave_data DataFrame might change size as my code reads from different Excel files later, so the method needs to be generic (i.e add the column containing the average always as the last column in the DataFrame, not in column number 4)

desired output

Time        F7           F8            F9           Average
00:00:00    43.005593    -56.509746    25.271271    4.25  
01:00:00    55.114918    -59.173852    31.849262    9.26
02:00:00    63.990762    -64.699492    52.426017    17.24
Madigan answered 29/7, 2015 at 11:13 Comment(0)
A
27

You can take a copy of your df using copy() and then just call mean and pass params axis=1 and numeric_only=True so that the mean is calculated row-wise and to ignore non-numeric columns, when you do the following the column is always added at the end:

In [68]:

summary_ave_data = df.copy()
summary_ave_data['average'] = summary_ave_data.mean(numeric_only=True, axis=1)
summary_ave_data
Out[68]:
                 Time         F7         F8         F9    average
0 2015-07-29 00:00:00  43.005593 -56.509746  25.271271   3.922373
1 2015-07-29 01:00:00  55.114918 -59.173852  31.849262   9.263443
2 2015-07-29 02:00:00  63.990762 -64.699492  52.426017  17.239096
Aver answered 29/7, 2015 at 11:21 Comment(0)
M
17

@LaangeHaare or anyone else who is curious, I just tested it and the copy part of the accepted answer seems unnecessary (maybe I am missing something...)

so you could simplify this with:

df['average'] = df.mean(numeric_only=True, axis=1)

I would have simply added this as a comment but don't have the reputation

Maratha answered 28/2, 2018 at 18:40 Comment(1)
the copy part is only needed if you want to do more operations, e.g. save to files in one or more formats and print, and at same time avoid overwriting the old variable (that is anyway good practice), so in this sense it is a more general answer, while yours is more essential.Jolley
F
11

In common case if you would like to use specific columns, you can use:

df['average'] = df[['F7','F8']].mean(axis=1)

where axis=1 stands for rowwise action (using column values for each row to calculate the mean in 'average' column)

Then you may want to sort by this column:

df.sort_values(by='average',ascending=False, inplace=True)

where inplace=True stands for applying action to dataframe instead of calculating on the copy.

Festivity answered 17/4, 2019 at 6:56 Comment(0)
H
3

df.assign is specifically for this purpose. It returns a copy to avoid changing the original dataframe and/or raising SettingWithCopyWarning. It works as follows:

data_with_avg = avg_data.assign(
    average = avg_data.mean(axis=1, numeric_only=True)
)

This function can also create multiple columns at the same time:

data_with_ave = avg_data.assign(
    average = avg_data.mean(axis=1, numeric_only=True),
    median = avg_data.median(axis=1, numeric_only=True)
)

As of pandas 0.36, you can even reference a newly created column to create another:

data_with_ave = avg_data.assign(
    average = avg_data.mean(axis=1, numeric_only=True),
    isLarge = lambda df: df['average'] > 10
)
Hellhound answered 22/9, 2019 at 18:10 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.