I am wondering what the fastest way to compute a mean in NumPy is. I used the following code to experiment:
import time
import numpy as np

n = 10000
p = np.array([1] * 1000000)

# Variant 1: sum the array, then divide by its size
t1 = time.time()
for x in range(n):
    np.divide(np.sum(p), p.size)
t2 = time.time()
print(t2 - t1)  # 3.9222593307495117

# Variant 2: np.mean directly
t3 = time.time()
for x in range(n):
    np.mean(p)
t4 = time.time()
print(t4 - t3)  # 5.271147012710571
I assumed that np.mean would be faster, or at least equally fast. However, the combination of np.sum and np.divide turns out to be faster than np.mean. Why is that?
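A single time.time() pass can be noisy, so a sketch like the following, using the standard-library timeit module, may give steadier numbers. The helper name best_time, the smaller array, and the repeat counts are assumptions chosen for illustration; they are not part of the original measurement.

```python
import timeit

import numpy as np

# Smaller array and fewer repetitions than in the question, so the sketch runs quickly.
p = np.array([1] * 100_000)


def best_time(func, repeat=5, number=100):
    """Return the best per-call time in seconds over several timing runs."""
    runs = timeit.repeat(func, repeat=repeat, number=number)
    return min(runs) / number


t_sum_divide = best_time(lambda: np.divide(np.sum(p), p.size))
t_mean = best_time(lambda: np.mean(p))

print(f"sum/divide: {t_sum_divide:.2e} s per call")
print(f"np.mean:    {t_mean:.2e} s per call")
```

Taking the minimum over several runs reduces the influence of other processes on the machine; the absolute numbers will still differ between systems and NumPy versions.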
np.sum() being faster than np.mean(). (The np.divide call is trivial since you're just giving it two numbers as inputs, not arrays.) I'd have to inspect the underlying implementation of the two functions to understand why. I find it surprising that under the hood, np.mean() doesn't basically run your alternative code... – Decameter
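One possible factor, offered as a hypothesis rather than a confirmed explanation of the timings above: for integer input, np.mean is documented to compute in float64, whereas np.sum keeps an integer dtype. The sketch below only checks the result dtypes and shows that np.sum with dtype=np.float64, divided by the size, gives the same value as np.mean. Whether that conversion accounts for the gap would have to be measured separately.

```python
import numpy as np

p = np.array([1] * 1_000_000)  # integer array, as in the question

total = np.sum(p)   # sum of integer input stays an integer
mean = np.mean(p)   # mean of integer input is computed in float64

print(p.dtype, total.dtype, mean.dtype)

# Summing with a float64 accumulator and dividing gives the same value as np.mean.
float_total = np.sum(p, dtype=np.float64)
print(float_total / p.size == mean)
```

If the dtype conversion matters, repeating the benchmark with a float array (for example p.astype(np.float64)) should narrow the difference between the two variants.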