Take the maximum in absolute value from different columns and filter out NaN in Python

Asked 7/12, 2015 at 10:24 Answered 11/11, 2020 at 20:12

This was my attempt, for example:

import numpy as np
import pandas as pd

df = pd.DataFrame({'a':[5,0,1,np.nan], 'b':[np.nan,1,4,3], 'c':[-3,-2,0,0]})
df.dropna(axis=1).max(axis=1,key=abs)

This filters out the NaN values, but it returns 0 or negative values instead of the highest value in absolute terms.

The result should be one column with

Phelan answered 7/12, 2015 at 10:24 Comment(2)

When you call dropna(axis=1), you lose every column that contains NaN values, so only column c is left – Prissie 7/12, 2015 at 10:29

OK. In any case, if I use df.max(axis=1, key=abs), it does not take the max in absolute value, just the largest positive value – Phelan 7/12, 2015 at 10:35

I solved it with:

# x.min() and x.max() skip NaN; whichever has the larger magnitude is the row's answer
maxCol = lambda x: max(x.min(), x.max(), key=abs)
df.apply(maxCol, axis=1)

Phelan answered 7/12, 2015 at 11:1 Comment(6)

This solution works, but it is really slow. Is there a faster solution? – Fultz 26/5, 2020 at 17:55
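A possible vectorized version of the maxCol lambda above, sketched here as an assumption rather than a benchmarked answer: compute the row-wise min and max once with pandas, then keep whichever has the larger magnitude using Series.where. This avoids a Python call per row, which is what makes apply slow. It reuses the sample frame from the question; the names row_min, row_max and result are illustrative.

```python
import numpy as np
import pandas as pd

# Sample frame from the question
df = pd.DataFrame({'a': [5, 0, 1, np.nan], 'b': [np.nan, 1, 4, 3], 'c': [-3, -2, 0, 0]})

# Row-wise extremes; NaN is skipped by default (skipna=True)
row_min = df.min(axis=1)
row_max = df.max(axis=1)

# Keep row_max where its magnitude is strictly larger, otherwise row_min.
# On a tie this keeps row_min, just as max(x.min(), x.max(), key=abs)
# returns its first argument when both magnitudes are equal.
result = row_max.where(row_max.abs() > row_min.abs(), row_min)
print(result.tolist())  # [5.0, -2.0, 4.0, 3.0]
```

The sign is preserved because the kept value comes from the original min or max, not from an abs() copy.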

How could I alter this so that it takes the absolute minimum value? I tried replacing max with min, i.e. min(x.min(), x.max(), key=abs), but that did not work. – Lisp 5/8, 2020 at 14:24
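On the question in the comment above: swapping max for min cannot work in general, because the lambda only compares the row's two extremes, while the value closest to zero can sit anywhere between them. A minimal sketch of one alternative, assuming NumPy is acceptable: take np.nanargmin of the absolute values per row and use that position to read back the signed original. The names vals, idx and abs_min are illustrative; np.nanargmin raises ValueError on a row that is entirely NaN, so such rows would need separate handling.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [5, 0, 1, np.nan], 'b': [np.nan, 1, 4, 3], 'c': [-3, -2, 0, 0]})

# Why swapping max for min fails: only the two extremes are compared,
# but the smallest magnitude can lie between them.
x = pd.Series([-5, 1, 5])
print(min(x.min(), x.max(), key=abs))  # -5, although 1 has the smallest magnitude

# Sign-preserving absolute minimum per row, ignoring NaN
vals = df.to_numpy(dtype=float)
idx = np.nanargmin(np.abs(vals), axis=1)   # column position of the smallest |value|
abs_min = vals[np.arange(len(vals)), idx]  # signed value at that position
print(abs_min.tolist())  # [-3.0, 0.0, 0.0, 0.0]
```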

@AlonGouldman my answer below should be more efficient if you're having performance issues. – Nighthawk 11/11, 2020 at 20:16

@AndrewHamel replace max() with min() in my answer below and it should work – Nighthawk 11/11, 2020 at 20:17

@Nighthawk I (and also the OP) wanted to keep the negative values. Your way converts them into positive ones – Fultz 12/11, 2020 at 12:49

@AlonGouldman Good clarification! In that case I'd recommend something like df.idxmax() to get the indices of the maxima, then use those indices to select the original values in the original df. This approach should still outperform any apply operations. – Nighthawk 12/11, 2020 at 18:27
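A minimal sketch of the idxmax suggestion in the comment above, assuming the sample frame from the question: idxmax(axis=1) on the absolute values gives the column label of the largest magnitude in each row, and Index.get_indexer turns those labels into positions for indexing the original, signed values. Variable names are illustrative, and rows that are entirely NaN are not handled here.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [5, 0, 1, np.nan], 'b': [np.nan, 1, 4, 3], 'c': [-3, -2, 0, 0]})

# Column label of the largest |value| in each row (NaN skipped by default)
cols = df.abs().idxmax(axis=1)

# Convert the labels to integer positions and pull the signed originals
positions = df.columns.get_indexer(cols)
result = pd.Series(df.to_numpy()[np.arange(len(df)), positions], index=df.index)
print(result.tolist())  # [5.0, -2.0, 4.0, 3.0]
```

Ties go to the first column with that magnitude, since idxmax returns the first occurrence.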

The most straightforward and efficient way is to convert to absolute values and then find the max. Pandas supports this with simple syntax (abs and max) and does not require expensive apply operations:

df.abs().max()

max() accepts an axis argument that sets the direction of the reduction: axis=0 (the default) takes the max of each column, and axis=1 takes the max across each row.
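To illustrate the axis argument on the question's sample frame (a sketch; per_column and per_row are illustrative names): the default reduces each column, while axis=1 gives one value per row, which is the shape the question asks for. Both results are magnitudes only, so the -2 in row 1 comes back as 2.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [5, 0, 1, np.nan], 'b': [np.nan, 1, 4, 3], 'c': [-3, -2, 0, 0]})

# axis=0 (the default): one maximum per column -> a: 5, b: 4, c: 3
per_column = df.abs().max()

# axis=1: one maximum per row -> 5, 2, 4, 3 (the sign of -2 is lost)
per_row = df.abs().max(axis=1)

print(per_column.tolist())
print(per_row.tolist())
```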

Nighthawk answered 11/11, 2020 at 20:12 Comment(1)

This doesn't answer the question as asked, because it discards the sign of negative values. – Chipboard 14/4, 2021 at 16:27

You can use np.nanargmax on the squared data:

>>> df.values[range(df.shape[0]),np.nanargmax(df**2,axis=1)]
array([ 5., -2.,  4.,  3.])

Gruesome answered 7/12, 2015 at 10:32 Comment(0)

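df = df.fillna(0)                   # treat NaN as 0 so it never wins the argmax
l = df.abs().values.argmax(axis=1)  # column position of the largest |value| in each row
pd.Series([df.values[i][l[i]] for i in range(len(df.values))])  # signed value at that position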

In [532]: pd.Series([df.values[i][l[i]] for i in range(len(df.values))])
Out[532]:
0    5
1   -2
2    4
3    3
dtype: float64

One-liner:

pd.Series([df.values[i][df.fillna(0).abs().values.argmax(axis=1)[i]] for i in range(len(df.values))])

Prissie answered 7/12, 2015 at 10:57 Comment(0)


Because of my low reputation score, I am adding this here, in reply to gis20's answer and Andrew Hamel's question about the absolute minimum value:

minCol = lambda x: min(x, key=abs)                   # signed value with the smallest magnitude
minCol = lambda x: min([abs(value) for value in x])  # smallest magnitude only (sign is lost)

This works for my data; however, it cannot cope with NaN values.
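One way to make the first minCol NaN-safe, sketched under the assumption that rows are processed with df.apply(..., axis=1) as in the answers above: drop NaN before calling min. abs(nan) never compares as smaller or larger than anything, so leaving NaN in makes the result depend on where it sits in the row (row 3 of the sample frame would return NaN). The default=np.nan argument is an added assumption so that an all-NaN row yields NaN instead of raising ValueError on an empty sequence.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [5, 0, 1, np.nan], 'b': [np.nan, 1, 4, 3], 'c': [-3, -2, 0, 0]})

# Drop NaN first, then keep the signed value with the smallest magnitude;
# default=np.nan covers rows that are entirely NaN.
minCol = lambda x: min(x.dropna(), key=abs, default=np.nan)

abs_min = df.apply(minCol, axis=1)
print(abs_min.tolist())  # [-3.0, 0.0, 0.0, 0.0]
```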

Cimino answered 20/8, 2020 at 10:1 Comment(0)
