I have an input array that is split up into bins, and I want to calculate the mean of each bin. Consider the following example:
>>> import numpy as np
>>> a = np.array([1.4, 2.6, 0.7, 1.1])
The array is split up into bins by np.digitize:
>>> bins = np.arange(0, 2 + 1)
>>> indices = np.digitize(a, bins)
>>> indices
array([2, 3, 1, 2])
This does exactly what I expect, as the following shows more explicitly:
>>> for i in range(len(bins)):
... f"bin where {i} <= x < {i + 1} contains {a[indices == i + 1]}"
...
'bin where 0 <= x < 1 contains [0.7]'
'bin where 1 <= x < 2 contains [1.4 1.1]'
'bin where 2 <= x < 3 contains [2.6]'
However, now I want to get the mean for each bin. Doing it the non-NumPy way with a for loop would be like this:
>>> b = np.array([a[indices == i + 1].mean() for i in range(len(bins))])
>>> b
array([0.7 , 1.25, 2.6 ])
But using a for loop for this appears neither elegant (Pythonic) nor efficient, as the list has to be converted into a NumPy array with np.array afterwards.
What's the NumPy way to do this?
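For reference, one vectorized approach that might fit here is np.bincount with its weights argument. This is only a minimal sketch, not necessarily the canonical answer. It assumes the indices returned by np.digitize are used directly as bin labels, and the names sums, counts, and means are chosen purely for illustration:

```python
import numpy as np

a = np.array([1.4, 2.6, 0.7, 1.1])
bins = np.arange(0, 2 + 1)
indices = np.digitize(a, bins)

# Sum of the values per bin index; index 0 collects values below bins[0].
sums = np.bincount(indices, weights=a, minlength=len(bins) + 1)
# Number of values per bin index.
counts = np.bincount(indices, minlength=len(bins) + 1)

# Skip index 0 and divide sums by counts; an empty bin yields nan
# instead of raising, with the floating-point warning suppressed.
with np.errstate(invalid="ignore", divide="ignore"):
    means = sums[1:] / counts[1:]

print(means)
```

For the example array, means should agree with b from the loop version. Unlike the loop version, no intermediate Python list is built, though empty bins show up as nan and may need separate handling.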