Why doesn't pandas.sum() work across both axes when using axis=None parameter?

Could anyone explain why pandas doesn't sum across both axes when axis=None is passed? The API reference says:

pandas.DataFrame.sum

DataFrame.sum(axis=None, skipna=True, numeric_only=False, min_count=0, **kwargs)
This is equivalent to the method numpy.sum

Parameters: axis: {index (0), columns (1)}
Axis for the function to be applied on. For Series this parameter is unused and defaults to 0.
For DataFrames, specifying axis=None will apply the aggregation across both axes.

But when I pass axis=None, it behaves the same as axis=0:

import pandas as pd
df = pd.DataFrame({'a':[1,2,3], 'b':[4,6,8]})
df
Output:
    a   b
0   1   4
1   2   6
2   3   8
df.sum(axis=None)
Output:
a     6
b    18
dtype: int64

The same as:

df.sum(axis=0)
Output:
a     6
b    18
dtype: int64

Shouldn't it work as numpy.sum() works?

import numpy as np
df.to_numpy().sum()
Output:
24
Lemieux answered 20/5, 2023 at 12:3 Comment(2)
You need pandas version 2.0.0+: pandas.pydata.org/docs/reference/api/pandas.DataFrame.sum.html – Nesta
@Panda Kim Pandas 2 still gives the same result as axis=0, although the docs suggest otherwise, as you indicate. – Valera

When behavior like this turns up, one option is to read the code. Let's look at the source. The code of pandas.DataFrame.sum is

def sum(
    self,
    axis: Axis | None = None,
    skipna: bool_t = True,
    numeric_only: bool_t = False,
    min_count: int = 0,
    **kwargs,
):
    return NDFrame.sum(self, axis, skipna, numeric_only, min_count, **kwargs)

which raises the question of what NDFrame.sum actually is

def sum(
    self,
    axis: Axis | None = None,
    skipna: bool_t = True,
    numeric_only: bool_t = False,
    min_count: int = 0,
    **kwargs,
):
    return self._min_count_stat_function(
        "sum", nanops.nansum, axis, skipna, numeric_only, min_count, **kwargs
    )

which in turn raises the question of what _min_count_stat_function is

@final
def _min_count_stat_function(
    self,
    name: str,
    func,
    axis: Axis | None = None,
    skipna: bool_t = True,
    numeric_only: bool_t = False,
    min_count: int = 0,
    **kwargs,
):
    if name == "sum":
        nv.validate_sum((), kwargs)
    elif name == "prod":
        nv.validate_prod((), kwargs)
    else:
        nv.validate_stat_func((), kwargs, fname=name)

    validate_bool_kwarg(skipna, "skipna", none_allowed=False)

    if axis is None:
        axis = self._stat_axis_number

    return self._reduce(
        func,
        name=name,
        axis=axis,
        skipna=skipna,
        numeric_only=numeric_only,
        min_count=min_count,
    )

Note how the case of axis being None is handled: axis is set to _stat_axis_number, which is itself set earlier to zero

_stat_axis_number = 0

and is never changed anywhere else in the available source code (it seems to be treated as read-only). Passing None as the axis value is therefore the same as passing 0.
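
To make the fallback easier to see in isolation, here is a minimal sketch in plain Python. ToyFrame is a hypothetical stand-in written for this answer, not a pandas class. It copies only the `if axis is None` branch quoted above and sums lists of rows by hand.

```python
class ToyFrame:
    """Hypothetical stand-in that mimics only the axis fallback shown above."""

    # Mirrors the class attribute quoted from the pandas source.
    _stat_axis_number = 0

    def __init__(self, rows):
        self.rows = rows

    def sum(self, axis=None):
        # Same fallback as in _min_count_stat_function: None becomes 0.
        if axis is None:
            axis = self._stat_axis_number
        if axis == 0:
            # Reduce down the rows: one total per column.
            return [sum(col) for col in zip(*self.rows)]
        if axis == 1:
            # Reduce across the columns: one total per row.
            return [sum(row) for row in self.rows]
        raise ValueError(f"unsupported axis: {axis}")


# Same data as the question: columns a = [1, 2, 3] and b = [4, 6, 8].
frame = ToyFrame([[1, 4], [2, 6], [3, 8]])
result_none = frame.sum(axis=None)
result_zero = frame.sum(axis=0)
result_one = frame.sum(axis=1)
print(result_none, result_zero, result_one)
```

Because None is replaced before any reduction happens, the axis=None call cannot be told apart from the axis=0 call, which matches the output in the question.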

The documentation does not match what the code actually does.
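
If what you want is the grand total, a sketch like the following avoids axis=None altogether, so it should not depend on how a given pandas version treats that argument. It reuses the DataFrame from the question. The variable names are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 6, 8]})

# Sum each column first, then sum the resulting Series of column totals.
total_chained = df.sum(axis=0).sum()

# Or convert to a NumPy array, where sum() without an axis
# reduces over every element.
total_numpy = df.to_numpy().sum()

print(total_chained, total_numpy)
```

Both expressions give 24 for this data, the same value as df.to_numpy().sum() in the question.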

Hailstorm answered 20/5, 2023 at 12:25 Comment(0)

The axis=None parameter in the pandas.sum() function does not work across both axes when there are non-numeric values present in the DataFrame or Series. It only works for numeric data.

Blomquist answered 20/5, 2023 at 13:46 Comment(0)
