From reading the pandas documentation and a good question and answer on the topic (What does axis in pandas mean?), I had expected axis=0 to always mean "with respect to columns". That matches what I see with sum(), but dropna() seems to work the other way around.
When I have a DataFrame like this:
import numpy as np
import pandas as pd

raw_data = {'column1': [42, 13, np.nan, np.nan],
            'column2': [4, 12, np.nan, np.nan],
            'column3': [25, 61, np.nan, np.nan]}
As a DataFrame, it looks like this:
   column1  column2  column3
0     42.0      4.0     25.0
1     13.0     12.0     61.0
2      NaN      NaN      NaN
3      NaN      NaN      NaN
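For reference, pandas lists a DataFrame's axes in positional order through df.axes, so it is easy to check which labels axis=0 and axis=1 refer to for this particular frame. A small self-contained sketch (the names row_labels and col_labels are just illustrative):

```python
import numpy as np
import pandas as pd

raw_data = {'column1': [42, 13, np.nan, np.nan],
            'column2': [4, 12, np.nan, np.nan],
            'column3': [25, 61, np.nan, np.nan]}
df = pd.DataFrame(raw_data)

# df.axes is a list in positional order: [index (axis 0), columns (axis 1)]
row_labels, col_labels = df.axes
print(row_labels.tolist())  # [0, 1, 2, 3]
print(col_labels.tolist())  # ['column1', 'column2', 'column3']

# the same label objects are reachable by name
print(row_labels.equals(df.index), col_labels.equals(df.columns))  # True True
```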
I can print the sum of each column with axis=0. This code:
df = pd.DataFrame(raw_data)
print(df.sum(axis=0))
Gives the output:
column1 55.0
column2 16.0
column3 86.0
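A self-contained version of the sum() step, for anyone who wants to reproduce it. It also passes the string alias axis='index', which the pandas documentation lists as equivalent to axis=0; the variable names are only for illustration:

```python
import numpy as np
import pandas as pd

raw_data = {'column1': [42, 13, np.nan, np.nan],
            'column2': [4, 12, np.nan, np.nan],
            'column3': [25, 61, np.nan, np.nan]}
df = pd.DataFrame(raw_data)

# sum() skips NaN by default (skipna=True); axis=0 yields one total per column
sums = df.sum(axis=0)
print(sums.index.tolist())  # ['column1', 'column2', 'column3']
print(sums.tolist())        # [55.0, 16.0, 86.0]

# 'index' is the named alias for axis=0, so this gives the same Series
sums_named = df.sum(axis='index')
print(sums.equals(sums_named))  # True
```

The result is labelled by the column names, with the four row values of each column added together.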
When I try to drop values from the DataFrame with axis=0, I expect this to again be with respect to columns*. But when I do:
dfclear = df.dropna(axis=0, how='all')
print(dfclear)
I get the output:
   column1  column2  column3
0     42.0      4.0     25.0
1     13.0     12.0     61.0
Whereas I had expected the following (which is what I get with axis=1):
   column1  column2  column3
0     42.0      4.0     25.0
1     13.0     12.0     61.0
2      NaN      NaN      NaN
3      NaN      NaN      NaN
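Both dropna() calls can be reproduced in one self-contained snippet. As a sketch, it builds boolean masks of all-NaN rows and all-NaN columns with isna().all() to show which labels each call is able to drop; the mask names are illustrative only, and the string aliases 'index' and 'columns' are used alongside 0 and 1:

```python
import numpy as np
import pandas as pd

raw_data = {'column1': [42, 13, np.nan, np.nan],
            'column2': [4, 12, np.nan, np.nan],
            'column3': [25, 61, np.nan, np.nan]}
df = pd.DataFrame(raw_data)

# axis=0 (alias 'index'): rows whose values are all NaN are removed
drop_rows = df.dropna(axis=0, how='all')
all_nan_rows = df.isna().all(axis=1)  # one flag per row label
print(all_nan_rows.tolist())  # [False, False, True, True]
print(df.index.difference(drop_rows.index).tolist())  # [2, 3]

# axis=1 (alias 'columns'): columns whose values are all NaN are removed
drop_cols = df.dropna(axis='columns', how='all')
all_nan_cols = df.isna().all(axis=0)  # one flag per column label
print(all_nan_cols.tolist())  # [False, False, False]
print(drop_cols.equals(df))  # True: no column is entirely NaN, so nothing is dropped

# the positional and named forms agree
print(drop_rows.equals(df.dropna(axis='index', how='all')))  # True
```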
So it seems to me that axis behaves differently in sum() and dropna().
Is there something I'm missing here?
*https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.dropna.html