A string method name can refer to any method (or attribute) of the object being operated on. Additionally, if the object has an __array__ attribute (as far as I can tell, this is the case when you call agg or transform directly, rather than through groupby, resample, rolling, etc.), the string can refer to anything in numpy's module-level namespace (e.g. anything in np.__all__). That's not to say that everything that can be referenced will work, but you can reference anything in either of these namespaces.
Examples
Here's an example dataframe:
In [9]: df = pd.DataFrame({'abc': list('aaaabbcccc'), 'data': np.random.random(size=10)})
In [10]: df
Out[10]:
abc data
0 a 0.800357
1 a 0.619654
2 a 0.448895
3 a 0.610645
4 b 0.985249
5 b 0.179411
6 c 0.173734
7 c 0.420767
8 c 0.789766
9 c 0.525486
DataFrame & Series methods with .agg and .transform
This can be aggregated or transformed using any DataFrame method (as long as the shape rules that apply to agg and transform are followed).
Of course, there are the aggregation methods we're all familiar with:
In [93]: df.agg("sum")
Out[93]:
abc aaaabbcccc
data 5.553964
dtype: object
But you could really give anything in the DataFrame/Series API a whirl:
In [95]: df.transform("shift")
Out[95]:
abc data
0 NaN NaN
1 a 0.800357
2 a 0.619654
3 a 0.448895
4 a 0.610645
5 b 0.985249
6 b 0.179411
7 c 0.173734
8 c 0.420767
9 c 0.789766
In [102]: df.agg("dtypes")
Out[102]:
abc object
data float64
dtype: object
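As a quick check of the idea above, a string name should give the same result as calling the method of that name on the object. This is a minimal sketch, assuming a reasonably recent pandas; the seeded generator is my addition so the numbers are reproducible, and the variable names are only illustrative:

```python
import numpy as np
import pandas as pd

# Seeded data so the example is reproducible (the seed is an arbitrary choice).
rng = np.random.default_rng(0)
df = pd.DataFrame({"abc": list("aaaabbcccc"), "data": rng.random(10)})

# A string name is looked up as an attribute on the object itself,
# so each pair below should hold the same result.
sum_by_name = df["data"].agg("sum")
sum_by_call = df["data"].sum()

shift_by_name = df.transform("shift")
shift_by_call = df.shift()
```

If the two members of a pair ever differ, that would suggest the string went through a different code path than the plain method call.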
Numpy methods with .agg and .transform
Additionally, when working directly with pandas objects, we can use numpy's module-level functions as well. Many of these don't work the way you might expect, so user beware:
In [101]: df.data.transform("expm1")
Out[101]:
0 1.226334
1 0.858285
2 0.566580
3 0.841620
4 1.678479
5 0.196512
6 0.189739
7 0.523129
8 1.202882
9 0.691281
Name: data, dtype: float64
In [103]: df.agg("rot90")
Out[103]:
array([[0.8003565068959021, 0.619653790821421, 0.44889504260755986,
0.6106454343417287, 0.9852492020323964, 0.17941064387786554,
0.17373389351532997, 0.42076690363942437, 0.7897663627044728,
0.5254860156343195],
['a', 'a', 'a', 'a', 'b', 'b', 'c', 'c', 'c', 'c']], dtype=object)
In [107]: df.agg("meshgrid")
Out[107]:
[array(['a', 0.8003565068959021, 'a', 0.619653790821421, 'a',
0.44889504260755986, 'a', 0.6106454343417287, 'b',
0.9852492020323964, 'b', 0.17941064387786554, 'c',
0.17373389351532997, 'c', 0.42076690363942437, 'c',
0.7897663627044728, 'c', 0.5254860156343195], dtype=object)]
In [109]: df.agg("diag")
Out[109]: array(['a', 0.619653790821421], dtype=object)
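For functions that map element-wise, the numpy fallback gives what you would get by calling the numpy function on the object yourself. Here is a small sketch, assuming your pandas version still falls back to numpy for names that aren't Series methods (this is internal behaviour and may change between versions):

```python
import numpy as np
import pandas as pd

s = pd.Series([0.0, 0.5, 1.0], name="data")

# "expm1" is not a Series method, so the name is looked up on numpy
# and called with the Series as its first argument.
by_name = s.transform("expm1")
direct = np.expm1(s)
```

Functions like rot90 or meshgrid go through the same path, which is why they return raw arrays or lists rather than pandas objects.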
Methods available to GroupBy, Window, and Resample operations
These numpy functions aren't available directly to GroupBy, Rolling, Expanding, Resample, etc. objects. But you can still call anything in the pandas API available to these objects:
In [117]: df.groupby('abc').agg("dtypes")
Out[117]:
data
abc
a float64
b float64
c float64
In [129]: df.groupby("abc").agg("ohlc")
Out[129]:
data
open high low close
abc
a 0.800357 0.800357 0.448895 0.610645
b 0.985249 0.985249 0.179411 0.179411
c 0.173734 0.789766 0.173734 0.525486
In [137]: df.rolling(3).data.agg("quantile", 0.9)
Out[137]:
0 NaN
1 NaN
2 0.764216
3 0.617852
4 0.910328
5 0.910328
6 0.824081
7 0.372496
8 0.715966
9 0.736910
Name: data, dtype: float64
Note that the relevant section of the pandas API is that of the GroupBy, Window, or Resampling object itself, not the DataFrame or Series. So check the documentation of these objects for the full API reference.
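To make the scoping concrete, here is a hedged sketch comparing string names against the GroupBy and Rolling methods they resolve to, and showing a numpy-only name being rejected. The seeded data and variable names are mine, and the exact error message may vary between pandas versions:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame({"abc": list("aaaabbcccc"), "data": rng.random(10)})

# The string is resolved against the SeriesGroupBy object.
grouped = df.groupby("abc")["data"]
ohlc_by_name = grouped.agg("ohlc")
ohlc_by_call = grouped.ohlc()

# Extra positional arguments are forwarded to the Rolling method.
rolling = df["data"].rolling(3)
q_by_name = rolling.agg("quantile", 0.9)
q_by_call = rolling.quantile(0.9)

# "expm1" exists only in numpy, not on the GroupBy object,
# so the lookup is expected to fail.
try:
    grouped.agg("expm1")
    numpy_name_error = None
except AttributeError as exc:
    numpy_name_error = exc
```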
Implementation
Buried deep in the pandas internals, you can trace the handling of string aggregation operations to a couple of variations on this function, currently pandas.core.apply._try_aggregate_string_function:
def _try_aggregate_string_function(self, obj, arg: str, *args, **kwargs):
    """
    if arg is a string, then try to operate on it:
    - try to find a function (or attribute) on ourselves
    - try to find a numpy function
    - raise
    """
    assert isinstance(arg, str)

    f = getattr(obj, arg, None)
    if f is not None:
        if callable(f):
            return f(*args, **kwargs)

        # people may try to aggregate on a non-callable attribute
        # but don't let them think they can pass args to it
        assert len(args) == 0
        assert len([kwarg for kwarg in kwargs if kwarg not in ["axis"]]) == 0
        return f

    f = getattr(np, arg, None)
    if f is not None and hasattr(obj, "__array__"):
        # in particular exclude Window
        return f(obj, *args, **kwargs)

    raise AttributeError(
        f"'{arg}' is not a valid function for '{type(obj).__name__}' object"
    )
Similarly, in many places in the test suite and internals, the logic getattr(obj, f) is used, where obj is the data structure and f is the string function name.
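If you want to poke at the lookup order without going through pandas, it can be mimicked in a few lines. This is only a rough sketch: call_by_name is a hypothetical helper of mine, not a pandas function, and it skips the argument checks the real code performs:

```python
import numpy as np


def call_by_name(obj, name, *args, **kwargs):
    """Hypothetical, simplified stand-in for the pandas string lookup."""
    attr = getattr(obj, name, None)
    if attr is not None:
        # Methods are called; plain attributes are returned unchanged.
        return attr(*args, **kwargs) if callable(attr) else attr

    func = getattr(np, name, None)
    if func is not None and hasattr(obj, "__array__"):
        # Array-like objects fall back to numpy's module-level namespace.
        return func(obj, *args, **kwargs)

    raise AttributeError(
        f"'{name}' is not a valid function for '{type(obj).__name__}' object"
    )


arr = np.array([0.0, 1.0, 2.0])

total = call_by_name(arr, "sum")      # found on the array itself
shape = call_by_name(arr, "shape")    # non-callable attribute, returned as-is
shifted = call_by_name(arr, "expm1")  # not an array method, found on numpy

# A plain list has no __array__, so the numpy fallback is skipped.
try:
    call_by_name([0.0, 1.0], "expm1")
    list_error = None
except AttributeError as exc:
    list_error = exc
```

The list case fails for the same reason Window objects are excluded in the pandas code: without __array__, the numpy branch is never taken.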