My system
Windows 7, 64 bit
python 3.5.1
The challenge
I've got a pandas dataframe, and I would like to know the maximum value for each row, and append that info as a new column. I would also like to know the name of the column where the maximum value is located. And I would like to add another column to the existing dataframe containing the name of the column where the max value can be found.
A similar question has been asked and answered for R in this post.
Reproducible example
In[1]:
# Make pandas dataframe
df = pd.DataFrame({'a':[1,0,0,1,3], 'b':[0,0,1,0,1], 'c':[0,0,0,0,0]})
# Calculate max
my_series = df.max(numeric_only=True, axis = 1)
my_series.name = "maxval"
# Include maxval in df
df = df.join(my_series)
df
Out[1]:
a b c maxval
0 1 0 0 1
1 0 0 0 0
2 0 1 0 1
3 1 0 0 1
4 3 1 0 3
So far so good. Now for the add another column to the existing dataframe containing the name of the column part:
In[2]:
?
?
?
# This is what I'd like to accomplish:
Out[2]:
a b c maxval maxcol
0 1 0 0 1 a
1 0 0 0 0 a,b,c
2 0 1 0 1 b
3 1 0 0 1 a
4 3 1 0 3 a
Notice that I'd like to return all column names if multiple columns contain the same maximum value. Also please notice that the column maxval is not included in maxcol since that would not make much sense. Thanks in advance if anyone out there finds this interesting.