This question is motivated by an answer I gave a while ago.
Let's say I have a dataframe like this
import numpy as np
import pandas as pd
df = pd.DataFrame({'a': [1, 2, np.nan], 'b': [3, np.nan, 10], 'c':[np.nan, 5, 34]})
     a     b     c
0  1.0   3.0   NaN
1  2.0   NaN   5.0
2  NaN  10.0  34.0
and I want to replace each NaN with the maximum of its row. I can do
df.apply(lambda row: row.fillna(row.max()), axis=1)
which gives me the desired output
      a     b     c
0   1.0   3.0   3.0
1   2.0   5.0   5.0
2  34.0  10.0  34.0
However, when I use
df.apply(lambda row: row.fillna(max(row)), axis=1)
for some reason the NaN values are replaced correctly in only two of the three rows:
     a     b     c
0  1.0   3.0   3.0
1  2.0   5.0   5.0
2  NaN  10.0  34.0
Indeed, if I check by hand
max(df.iloc[0, :])
max(df.iloc[1, :])
max(df.iloc[2, :])
then it prints
3.0
5.0
nan
When doing
df.iloc[0, :].max()
df.iloc[1, :].max()
df.iloc[2, :].max()
it prints the expected
3.0
5.0
34.0
My question is why the built-in max() fails in one of the three cases but not in all three. Why are the NaN values sometimes ignored and sometimes not?
nan is the first entry, while in the other rows it comes later. So maybe it depends on the order in which max handles these values... – Destroy
max([1,2,np.nan]) and max([np.nan,2,3]). – Destroy
np.nanmax() exists too and it ignores np.nan altogether. – Shirley
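For what it's worth, the order dependence suggested in the comments can be checked with a small sketch. It assumes that the built-in max() keeps its current candidate unless a later item compares greater, which is how CPython behaves as far as I know, and it relies on the fact that any ordering comparison involving NaN evaluates to False. The variable names (nan_last, nan_first, nan_ignored) are made up for illustration.

```python
import math
import numpy as np

nan = float("nan")

# Every ordering comparison with NaN is False, in either direction.
print(nan > 1.0, 1.0 > nan)  # False False

# NaN comes last: 2.0 is already the candidate and "nan > 2.0" is False,
# so the NaN is effectively skipped.
nan_last = max([1.0, 2.0, nan])
print(nan_last)  # 2.0

# NaN comes first: it becomes the initial candidate, and no later value
# ever compares greater than it, so it is returned.
nan_first = max([nan, 2.0, 3.0])
print(math.isnan(nan_first))  # True

# np.nanmax() ignores NaN regardless of its position.
nan_ignored = np.nanmax([nan, 2.0, 3.0])
print(nan_ignored)  # 3.0
```

If that assumption holds, it would match the rows in the question: only the row that starts with NaN returns nan from max().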