I notice that
In [30]: np.mean([1, 2, 3])
Out[30]: 2.0
In [31]: np.average([1, 2, 3])
Out[31]: 2.0
However, there should be some differences, since after all they are two different functions.
What are the differences between them?
I notice that
In [30]: np.mean([1, 2, 3])
Out[30]: 2.0
In [31]: np.average([1, 2, 3])
Out[31]: 2.0
However, there should be some differences, since after all they are two different functions.
What are the differences between them?
np.average takes an optional weight parameter. If it is not supplied they are equivalent. Take a look at the source code: Mean, Average
np.mean:
try:
mean = a.mean
except AttributeError:
return _wrapit(a, 'mean', axis, dtype, out)
return mean(axis, dtype, out)
np.average:
...
if weights is None :
avg = a.mean(axis)
scl = avg.dtype.type(a.size/avg.size)
else:
#code that does weighted mean here
if returned: #returned is another optional argument
scl = np.multiply(avg, 0) + scl
return avg, scl
else:
return avg
...
np.average
since weights
is already optional. Seems unnecessary and only serves to confuse users. –
Precentor np.mean
always computes an arithmetic mean, and has some additional options for input and output (e.g. what datatypes to use, where to place the result).
np.average
can compute a weighted average if the weights
parameter is supplied.
In some version of numpy there is another imporant difference that you must be aware:
average
do not take in account masks, so compute the average over the whole set of data.
mean
takes in account masks, so compute the mean only over unmasked values.
g = [1,2,3,55,66,77]
f = np.ma.masked_greater(g,5)
np.average(f)
Out: 34.0
np.mean(f)
Out: 2.0
np.ma.average
works. Also, there is a bug report. –
Monophyletic np.average
and np.mean
both takes into account masks. I've tried and got the value of "Out: 2.0
" –
Lanni In addition to the differences already noted, there's another extremely important difference that I just now discovered the hard way: unlike np.mean
, np.average
doesn't allow the dtype
keyword, which is essential for getting correct results in some cases. I have a very large single-precision array that is accessed from an h5
file. If I take the mean along axes 0 and 1, I get wildly incorrect results unless I specify dtype='float64'
:
>T.shape
(4096, 4096, 720)
>T.dtype
dtype('<f4')
m1 = np.average(T, axis=(0,1)) # garbage
m2 = np.mean(T, axis=(0,1)) # the same garbage
m3 = np.mean(T, axis=(0,1), dtype='float64') # correct results
Unfortunately, unless you know what to look for, you can't necessarily tell your results are wrong. I will never use np.average
again for this reason but will always use np.mean(.., dtype='float64')
on any large array. If I want a weighted average, I'll compute it explicitly using the product of the weight vector and the target array and then either np.sum
or np.mean
, as appropriate (with appropriate precision as well).
In your invocation, the two functions are the same.
average
can compute a weighted average though.
© 2022 - 2024 — McMap. All rights reserved.