Add columns to pandas dataframe containing max of each row, AND corresponding column name
Asked Answered
A

2

1

My system

Windows 7, 64 bit

python 3.5.1

The challenge

I've got a pandas dataframe, and I would like to know the maximum value for each row, and append that info as a new column. I would also like to know the name of the column where the maximum value is located. And I would like to add another column to the existing dataframe containing the name of the column where the max value can be found.

A similar question has been asked and answered for R in this post.

Reproducible example

In[1]:
# Make pandas dataframe
df = pd.DataFrame({'a':[1,0,0,1,3], 'b':[0,0,1,0,1], 'c':[0,0,0,0,0]})

# Calculate max 
my_series = df.max(numeric_only=True, axis = 1)
my_series.name = "maxval"

# Include maxval in df
df = df.join(my_series)
df        

Out[1]:
    a  b  c  maxval
0   1  0  0  1
1   0  0  0  0
2   0  1  0  1
3   1  0  0  1
4   3  1  0  3

So far so good. Now for the add another column to the existing dataframe containing the name of the column part:

In[2]:
?
?
?


# This is what I'd like to accomplish:
Out[2]:
        a  b  c  maxval maxcol
    0   1  0  0  1      a
    1   0  0  0  0      a,b,c       
    2   0  1  0  1      b
    3   1  0  0  1      a
    4   3  1  0  3      a

Notice that I'd like to return all column names if multiple columns contain the same maximum value. Also please notice that the column maxval is not included in maxcol since that would not make much sense. Thanks in advance if anyone out there finds this interesting.

Aquavit answered 6/7, 2016 at 11:59 Comment(0)
A
2

You can compare the df against maxval using eq with axis=0, then use apply with a lambda to produce a boolean mask to mask the columns and join them:

In [183]:
df['maxcol'] = df.ix[:,:'c'].eq(df['maxval'], axis=0).apply(lambda x: ','.join(df.columns[:3][x==x.max()]),axis=1)
df

Out[183]:
   a  b  c  maxval maxcol
0  1  0  0       1      a
1  0  0  0       0  a,b,c
2  0  1  0       1      b
3  1  0  0       1      a
4  3  1  0       3      a
Agata answered 6/7, 2016 at 12:4 Comment(0)
F
1
import pandas as pd
import numpy as np

df = pd.DataFrame({
    'a': [1, 0, 0, 1, 3],
    'b': [0, 0, 1, 0, 1],
    'c': [0, 0, 0, 0, 0],
    'maxval': [1, 0, 1, 1, 3]
})

# Get the columns to compare
cols = df.columns[:-1] #['a', 'b', 'c']

mask = np.equal(df[cols].values,df['maxval'].values[:,None])
'''
[[ True False False]
 [ True  True  True]
 [False  True False]
 [ True False False]
 [ True False False]]
'''
#res  = [','.join(np.array(cols)[row_mask]) for row_mask in mask]
res = list(map(lambda row_mask : ','.join(np.array(cols)[row_mask]),mask))
print(res)#['a', 'a,b,c', 'b', 'a', 'a']
df['max_col'] = res
print(df)
'''
  a  b  c  maxval max_col
0  1  0  0       1       a
1  0  0  0       0   a,b,c
2  0  1  0       1       b
3  1  0  0       1       a
4  3  1  0       3       a
'''
Fireplug answered 17/9, 2024 at 16:1 Comment(0)

© 2022 - 2025 — McMap. All rights reserved.