Take the maximum in absolute value from different columns and filter out NaN Python
P

5

12

This was my attempt. For example:

import numpy as np
import pandas as pd
df = pd.DataFrame({'a':[5,0,1,np.nan], 'b':[np.nan,1,4,3], 'c':[-3,-2,0,0]})
df.dropna(axis=1).max(axis=1,key=abs)

This filters out the NaN values, but it returns 0 or negative values instead of the value that is highest in absolute value.

The result should be one column with

5
-2
4
3
Phelan answered 7/12, 2015 at 10:24 Comment(2)
When you call dropna(axis=1) you lose every column that contains a NaN value, so only column c is left. - Prissie
OK. In any case, if I use df.max(axis=1,key=abs) it does not take the maximum in absolute value, just the largest positive value. - Phelan
P
16

I solved it with:

maxCol=lambda x: max(x.min(), x.max(), key=abs)
df.apply(maxCol,axis=1)
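
As a minimal sketch of how this behaves on the sample frame from the question: Series.min() and Series.max() skip NaN by default, so each row is reduced to its two extreme values and the built-in max then picks the one with the larger magnitude. The names max_abs and signed_max are only illustrative. On a tie such as -2 and 2, the built-in max returns the first argument, which here is the negative value.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [5, 0, 1, np.nan],
                   'b': [np.nan, 1, 4, 3],
                   'c': [-3, -2, 0, 0]})

def max_abs(row):
    # row.min() and row.max() ignore NaN (skipna=True is the default),
    # so only the two extremes need to be compared by magnitude.
    return max(row.min(), row.max(), key=abs)

signed_max = df.apply(max_abs, axis=1)
print(signed_max.tolist())  # [5.0, -2.0, 4.0, 3.0]
```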
Phelan answered 7/12, 2015 at 11:1 Comment(6)
This solution works, but it is really slow. Is there a faster solution? - Fultz
How could I alter this so that it takes the absolute minimum value? I tried replacing max with min, i.e. min(x.min(),x.max(),key=abs), but that did not work. - Lisp
@AlonGouldman My answer below should be more efficient if you're having performance issues. - Nighthawk
@AndrewHamel Replace max() with min() in my answer below and it should work. - Nighthawk
@Nighthawk I (and also the OP) wanted to keep the negative values. Your way converts them into positive values. - Fultz
@AlonGouldman Good clarification! In that case I'd recommend something like df.idxmax() to get the indices of the maxima, then using those indices to select the original values in the original df. This approach should still outperform any apply operations. - Nighthawk
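
The idxmax idea from the last comment could look roughly like the sketch below. It assumes that abs() is applied before idxmax and that axis=1 is used, neither of which the comment spells out; the variable names are illustrative. idxmax skips NaN by default, and a row made up entirely of NaN would need separate handling.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [5, 0, 1, np.nan],
                   'b': [np.nan, 1, 4, 3],
                   'c': [-3, -2, 0, 0]})

# Column label holding the largest magnitude in each row (NaN is skipped).
cols = df.abs().idxmax(axis=1)

# Translate the labels into integer positions and pick the original,
# still signed, values with NumPy fancy indexing.
col_pos = df.columns.get_indexer(cols)
result = pd.Series(df.to_numpy()[np.arange(len(df)), col_pos], index=df.index)
print(result.tolist())  # [5.0, -2.0, 4.0, 3.0]
```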
N
9

The most straightforward and efficient way is to convert to absolute values, and then find the max. Pandas supports this with straightforward syntax (abs and max) and does not require expensive apply operations:

df.abs().max()

max() accepts an axis argument, which can be used to specify whether to calculate the max on rows or columns.
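
For illustration, here is how the axis argument changes the result on the frame from the question. Note that this returns magnitudes only, so the -2 in the second row comes back as 2.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [5, 0, 1, np.nan],
                   'b': [np.nan, 1, 4, 3],
                   'c': [-3, -2, 0, 0]})

col_max = df.abs().max()        # default axis=0: one value per column
row_max = df.abs().max(axis=1)  # axis=1: one value per row

print(col_max.tolist())  # [5.0, 4.0, 3.0]
print(row_max.tolist())  # [5.0, 2.0, 4.0, 3.0]
```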

Nighthawk answered 11/11, 2020 at 20:12 Comment(1)
This doesn't answer the question as asked, because it discards the sign of the negative values. - Chipboard
G
5

You can use np.nanargmax on the squared data:

>>> df.values[range(df.shape[0]),np.nanargmax(df**2,axis=1)]
array([ 5., -2.,  4.,  3.])
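
A variant of the same idea, sketched with np.abs in place of squaring and with an explicit NumPy array (the names vals, pos and picked are illustrative). Taking the absolute value gives the same ordering as squaring. np.nanargmax raises a ValueError for a row that contains only NaN, so such rows would have to be filtered out first.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [5, 0, 1, np.nan],
                   'b': [np.nan, 1, 4, 3],
                   'c': [-3, -2, 0, 0]})

vals = df.to_numpy(dtype=float)

# Position of the largest magnitude in each row, ignoring NaN.
pos = np.nanargmax(np.abs(vals), axis=1)

# Pick one element per row from the original (signed) values.
picked = vals[np.arange(vals.shape[0]), pos]
print(picked.tolist())  # [5.0, -2.0, 4.0, 3.0]
```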
Gruesome answered 7/12, 2015 at 10:32 Comment(0)
P
1
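df = df.fillna(0)  # treat NaN as 0 so it can never be the largest magnitude
l = df.abs().values.argmax(axis=1)  # position of the largest |value| in each row
pd.Series([df.values[i][l[i]] for i in range(len(df.values))])  # pick the signed originals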

In [532]: pd.Series([df.values[i][l[i]] for i in range(len(df.values))])
Out[532]:
0    5
1   -2
2    4
3    3
dtype: float64

One liner:

pd.Series([df.values[i][df.fillna(0).abs().values.argmax(axis=1)[i]] for i in range(len(df.values))])
Prissie answered 7/12, 2015 at 10:57 Comment(0)
C
-1

Because my reputation score is too low to comment, I am adding this here in reply to gis20's answer and Andrew Hamel's question about the absolute minimum value:

minCol=lambda x: min(x, key=abs)  # keeps the sign of the value closest to zero
# or, to return the magnitude only:
minCol=lambda x: min([abs(value) for value in x])

This works for my data; however, it cannot cope with np.nan values.
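
One possible way around the NaN limitation, sketched under the assumption that NaN entries should simply be ignored: drop them from each row before calling min. The name min_abs is illustrative, and default=np.nan is my own choice for rows that contain nothing but NaN.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [5, 0, 1, np.nan],
                   'b': [np.nan, 1, 4, 3],
                   'c': [-3, -2, 0, 0]})

def min_abs(row):
    # Drop NaN first, then return the value closest to zero with its sign.
    # default=np.nan covers a row in which every entry is NaN.
    return min(row.dropna(), key=abs, default=np.nan)

signed_min = df.apply(min_abs, axis=1)
print(signed_min.tolist())  # [-3.0, 0.0, 0.0, 0.0]
```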

Cimino answered 20/8, 2020 at 10:1 Comment(0)
