How do I find the arithmetic mean of a list in Python? For example:
[1, 2, 3, 4] ⟶ 2.5
For Python 3.8+, use statistics.fmean
for numerical stability with floats. (Fast.)
For Python 3.4+, use statistics.mean
for numerical stability with floats. (Slower.)
xs = [15, 18, 2, 36, 12, 78, 5, 6, 9]
import statistics
statistics.mean(xs) # = 20.11111111111111
For older versions of Python 3, use
sum(xs) / len(xs)
For Python 2, convert len
to a float to get float division:
sum(xs) / float(len(xs))
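As a quick sanity check that the approaches above agree (a minimal sketch; statistics.fmean needs Python 3.8+):
import statistics

xs = [15, 18, 2, 36, 12, 78, 5, 6, 9]

print(statistics.fmean(xs))  # 20.11111111111111 (Python 3.8+, fast, always a float)
print(statistics.mean(xs))   # 20.11111111111111 (Python 3.4+, slower)
print(sum(xs) / len(xs))     # 20.11111111111111 (any Python 3)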
lambda... --> 2.59us, numpy.mean(l) --> 27.5us, sum(l)/len(l) --> 650ns – Alban
In Python 3, sum(xs) / len(xs) returns a float regardless. You can use from __future__ import division to ensure the same behavior in Python 2.2 and up (so basically any version that's suitable for production today). – Emeldaemelen
math.fsum(l) / len(l) is faster than fmean, see: https://mcmap.net/q/63574/-finding-the-average-of-a-list – Ruffo
xs = [15, 18, 2, 36, 12, 78, 5, 6, 9]
sum(xs) / len(xs)
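A minimal sketch of the from __future__ import division behaviour mentioned in the comments above (the import only changes anything on Python 2; on Python 3, / is already true division):
from __future__ import division  # makes / true division on Python 2.2+; a no-op on Python 3

xs = [1, 2, 3, 4]
print(sum(xs) / len(xs))  # 2.5 on both Python 2 and Python 3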
float('%.2f' % float(sum(l) / len(l))) – Wing
round(result, 2). – Joanajoane
Use numpy.mean:
xs = [15, 18, 2, 36, 12, 78, 5, 6, 9]
import numpy as np
print(np.mean(xs))
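If the data already lives in a NumPy array (assumed here for illustration), calling .mean() on the array directly avoids the list-to-array conversion that np.mean(list) performs first:
import numpy as np

arr = np.array([15, 18, 2, 36, 12, 78, 5, 6, 9])  # data already stored as an ndarray
print(arr.mean())    # 20.11111111111111
print(np.mean(arr))  # same result, no list conversion needed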
sum(l)/len(l) – Cassel
np.array(l).mean() is much faster. – Cassel
np.mean(l) and np.array(l).mean() are about the same speed, and sum(l)/len(l) is about twice as fast. I used l = list(np.random.rand(1000)); of course both numpy methods become much faster if l is a numpy.array. – Allerus
Use np.nanmean(l) in order to avoid issues with NaN values and zero divisions. –
Ser For Python 3.4+, use mean()
from the new statistics
module to calculate the average:
from statistics import mean
xs = [15, 18, 2, 36, 12, 78, 5, 6, 9]
mean(xs)
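One practical difference, expanded on in the comment below: on an empty list, mean() raises a descriptive StatisticsError that you can catch explicitly. A minimal sketch (the empty values list is just for illustration):
from statistics import StatisticsError, mean

values = []  # hypothetical empty input
try:
    print(mean(values))
except StatisticsError:
    print("mean requires at least one data point")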
With an empty list you get statistics.StatisticsError: mean requires at least one data point instead of the more cryptic ZeroDivisionError: division by zero that the sum(x) / len(x) solution raises. –
Daggna Why would you use reduce()
for this when Python has a perfectly cromulent sum()
function?
print sum(l) / float(len(l))
(The float()
is necessary in Python 2 to force Python to do a floating-point division.)
float()
is not necessary on Python 3. –
Daggna
There is a statistics library if you are using Python >= 3.4:
https://docs.python.org/3/library/statistics.html
You may use its mean method like this. Let's say you have a list of numbers of which you want to find the mean:
numbers = [11, 13, 12, 15, 17]
import statistics as s
s.mean(numbers)
It has other methods too, like stdev, variance, mode, harmonic mean, median, etc., which are very useful.
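A quick sketch of a few of those other functions (all from the standard statistics module, using the same example list):
import statistics as s

numbers = [11, 13, 12, 15, 17]
print(s.mean(numbers))           # 13.6
print(s.median(numbers))         # 13
print(s.stdev(numbers))          # sample standard deviation
print(s.variance(numbers))       # sample variance
print(s.harmonic_mean(numbers))  # harmonic mean (Python 3.6+)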
Instead of casting to float, you can add 0.0 to the sum:
def avg(l):
    return sum(l, 0.0) / len(l)
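A brief usage example; the 0.0 start value makes the running sum a float, so the division is a float division even under Python 2's integer-division rules:
print(avg([1, 2, 3, 4]))  # 2.5
# On Python 2, a plain sum(l) / len(l) would give 2 here; starting the sum at 0.0
# avoids that without any float() casts.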
EDIT:
I added two other ways to get the average of a list (which are relevant only for Python 3.8+). Here is the comparison that I made:
import timeit
import statistics
import numpy as np
from functools import reduce
import pandas as pd
import math
LIST_RANGE = 10
NUMBERS_OF_TIMES_TO_TEST = 10000
l = list(range(LIST_RANGE))
def mean1():
    return statistics.mean(l)

def mean2():
    return sum(l) / len(l)

def mean3():
    return np.mean(l)

def mean4():
    return np.array(l).mean()

def mean5():
    return reduce(lambda x, y: x + y / float(len(l)), l, 0)

def mean6():
    return pd.Series(l).mean()

def mean7():
    return statistics.fmean(l)

def mean8():
    return math.fsum(l) / len(l)
for func in [mean1, mean2, mean3, mean4, mean5, mean6, mean7, mean8]:
    print(f"{func.__name__} took: ", timeit.timeit(stmt=func, number=NUMBERS_OF_TIMES_TO_TEST))
These are the results I got:
mean1 took: 0.09751558300000002
mean2 took: 0.005496791999999973
mean3 took: 0.07754683299999998
mean4 took: 0.055743208000000044
mean5 took: 0.018134082999999968
mean6 took: 0.6663848750000001
mean7 took: 0.004305374999999945
mean8 took: 0.003203333000000086
Interesting! It looks like math.fsum(l) / len(l)
is the fastest way, then statistics.fmean(l)
, and only then sum(l) / len(l)
. Nice!
Thank you @Asclepius for showing me these two other ways!
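A variation that can reduce timing noise is to take the best of several timeit.repeat runs (a sketch, reusing the mean1 ... mean8 functions and NUMBERS_OF_TIMES_TO_TEST defined above):
import timeit

for func in [mean1, mean2, mean3, mean4, mean5, mean6, mean7, mean8]:
    best = min(timeit.repeat(stmt=func, repeat=5, number=NUMBERS_OF_TIMES_TO_TEST))
    print(f"{func.__name__} best of 5: {best:.6f}")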
OLD ANSWER:
In terms of efficiency and speed, these are the results that I got testing the other answers:
# test mean calculation
import timeit
import statistics
import numpy as np
from functools import reduce
import pandas as pd
LIST_RANGE = 10
NUMBERS_OF_TIMES_TO_TEST = 10000
l = list(range(LIST_RANGE))
def mean1():
    return statistics.mean(l)

def mean2():
    return sum(l) / len(l)

def mean3():
    return np.mean(l)

def mean4():
    return np.array(l).mean()

def mean5():
    return reduce(lambda x, y: x + y / float(len(l)), l, 0)

def mean6():
    return pd.Series(l).mean()
for func in [mean1, mean2, mean3, mean4, mean5, mean6]:
    print(f"{func.__name__} took: ", timeit.timeit(stmt=func, number=NUMBERS_OF_TIMES_TO_TEST))
and the results:
mean1 took: 0.17030245899968577
mean2 took: 0.002183011999932205
mean3 took: 0.09744236000005913
mean4 took: 0.07070840100004716
mean5 took: 0.022754742999950395
mean6 took: 1.6689282460001778
so clearly the winner is:
sum(l) / len(l)
If you convert to np.array first, np.mean takes ~.16s, so about 6x faster than sum(l)/len(l). Conclusion: if you're doing lots of calculations, best do everything in numpy. – Collinear
mean4, this is what I do there... I guess that if it is already a np.array then it makes sense to use np.mean, but in case you have a list then you should use sum(l) / len(l) –
Ruffo sum(l) / float(len(l))
is the right answer, but just for completeness you can compute an average with a single reduce:
>>> reduce(lambda x, y: x + y / float(len(l)), l, 0)
20.111111111111114
Note that this can result in a slight rounding error:
>>> sum(l) / float(len(l))
20.111111111111111
You can guard the reduce() call so that it gives you False for an empty list, otherwise the average as before. – Hypoxia
Why float on len? – Jimmy
I tried using the options above, but they didn't work. Try this:
from statistics import mean
n = [11, 13, 15, 17, 19]
print(n)
print(mean(n))
Worked on Python 3.5.
Or use pandas's Series.mean method:
pd.Series(sequence).mean()
Demo:
>>> import pandas as pd
>>> l = [15, 18, 2, 36, 12, 78, 5, 6, 9]
>>> pd.Series(l).mean()
20.11111111111111
>>>
From the docs:
Series.mean(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)
And here are the docs for this:
https://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.mean.html
I had a similar question to solve in one of Udacity's problems. Instead of a built-in function, I coded:
def list_mean(n):
    summing = float(sum(n))
    count = float(len(n))
    if n == []:
        return False
    return float(summing / count)
Much longer than usual, but for a beginner it's quite challenging.
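The comments below suggest raising a clearer error instead of returning False for an empty list; a minimal sketch of that idea (safe_mean is a hypothetical name, not part of the answer above):
def safe_mean(n):  # hypothetical helper illustrating the suggestion in the comments
    try:
        return sum(n) / len(n)
    except ZeroDivisionError:
        raise ValueError("cannot compute the mean of an empty list")

print(safe_mean([15, 18, 2]))  # 11.666666666666666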
False (equivalent to the integer 0) is just about the worst possible way to handle this error. Better to catch the ZeroDivisionError and raise something better (perhaps ValueError). – Chill
Why is a ValueError any better than a ZeroDivisionError? The latter is more specific, plus it seems a bit unnecessary to catch an arithmetic error only to re-throw a different one. – Toilet
ZeroDivisionError is only useful if you know how the calculation is being done (i.e., that a division by the length of the list is involved). If you don't know that, it doesn't tell you what the problem is with the value you passed in. Whereas your new exception can include that more specific information. – Chill
as a beginner, I just coded this:
L = [15, 18, 2, 36, 12, 78, 5, 6, 9]
total = 0

def average(numbers):
    total = sum(numbers)
    total = float(total)
    return total / len(numbers)

print(average(L))
sum(l)/len(l)
is by far the most elegant answer (no need to make type conversions in Python 3). –
Torrlow If you wanted to get more than just the mean (aka average) you might check out scipy stats:
from scipy import stats
l = [15, 18, 2, 36, 12, 78, 5, 6, 9]
print(stats.describe(l))
# DescribeResult(nobs=9, minmax=(2, 78), mean=20.11111111111111,
# variance=572.3611111111111, skewness=1.7791785448425341,
# kurtosis=1.9422716419666397)
In order to use reduce
for taking a running average, you'll need to track not only the total but also the total number of elements seen so far. Since that's not a trivial element in the list, you'll also have to pass reduce
an extra argument to fold into.
>>> l = [15, 18, 2, 36, 12, 78, 5, 6, 9]
>>> running_average = reduce(lambda aggr, elem: (aggr[0] + elem, aggr[1]+1), l, (0.0,0))
>>> running_average
(181.0, 9)
>>> running_average[0]/running_average[1]
20.111111111111111
Both approaches can give you similar values for integers, or agree to at least 10 decimal places. But if you are really working with long floating-point values, the two can differ. Which approach to use depends on what you want to achieve. (The examples below are Python 2, where / between integers is integer division.)
>>> l = [15, 18, 2, 36, 12, 78, 5, 6, 9]
>>> print reduce(lambda x, y: x + y, l) / len(l)
20
>>> sum(l)/len(l)
20
Floating values
>>> print reduce(lambda x, y: x + y, l) / float(len(l))
20.1111111111
>>> print sum(l)/float(len(l))
20.1111111111
@Andrew Clark was correct in his statement.
Suppose that
x = [
[-5.01,-5.43,1.08,0.86,-2.67,4.94,-2.51,-2.25,5.56,1.03],
[-8.12,-3.48,-5.52,-3.78,0.63,3.29,2.09,-2.13,2.86,-3.33],
[-3.68,-3.54,1.66,-4.11,7.39,2.08,-2.59,-6.94,-2.26,4.33]
]
You can see that x has dimension 3*10. If you need to get the mean of each row, you can type this:
theMean = np.mean(x, axis=1)
Don't forget to import numpy as np.
l = [15, 18, 2, 36, 12, 78, 5, 6, 9]
l = map(float,l)
print '%.2f' %(sum(l)/len(l))
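That snippet is Python 2 (print statement, and map() returning a list); a rough Python 3 equivalent might look like this:
l = [15, 18, 2, 36, 12, 78, 5, 6, 9]
l = list(map(float, l))  # in Python 3, map() returns an iterator, so materialise it
print('%.2f' % (sum(l) / len(l)))  # 20.11 (the float cast is no longer needed for true division)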
Find the average of a list by using the following Python code:
l = [15, 18, 2, 36, 12, 78, 5, 6, 9]
print(sum(l) / len(l))
Try this, it's easy.
print reduce(lambda x, y: x + y, l)/(len(l)*1.0)
or, as posted previously,
sum(l)/(len(l)*1.0)
The 1.0 is to make sure you get a floating point division
Combining a couple of the above answers, I've come up with the following which works with reduce and doesn't assume you have L
available inside the reducing function:
from functools import reduce  # needed on Python 3, where reduce is no longer a builtin
from operator import truediv
L = [15, 18, 2, 36, 12, 78, 5, 6, 9]
def sum_and_count(x, y):
    try:
        return (x[0] + y, x[1] + 1)
    except TypeError:
        return (x + y, 2)
truediv(*reduce(sum_and_count, L))
# prints
20.11111111111111
I want to add just another approach
import itertools,operator
list(itertools.accumulate(l,operator.add)).pop(-1) / len(l)
You can make a function for averages, usage:
average(21,343,2983) # You can pass as many arguments as you want.
Here is the code:
def average(*args):
    total = 0
    for num in args:
        total += num
    return total / len(args)
*args
allows for any number of arguments.
For example, average(3, 5, 123), but you can input other numbers. And keep in mind that it returns a value and doesn't print anything. –
Beamy
A simple solution is the avemedi-lib package:
pip install avemedi_lib
Then include it in your script:
from avemedi_lib.functions import average, get_median, get_median_custom
test_even_array = [12, 32, 23, 43, 14, 44, 123, 15]
test_odd_array = [1, 2, 3, 4, 5, 6, 7, 8, 9]
# Getting average value of list items
print(average(test_even_array)) # 38.25
# Getting median value for ordered or unordered numbers list
print(get_median(test_even_array)) # 27.5
print(get_median(test_odd_array)) # 5
# You can use your own sorted and your count functions
a = sorted(test_even_array)
n = len(a)
print(get_median_custom(a, n)) # 27.5
Enjoy.
Unlike statistics.mean()
, statistics.fmean()
works for a list of objects with different numeric types. For example:
from decimal import Decimal
import statistics
data = [1, 4.5, Decimal('3.5')]
statistics.mean(data) # TypeError
statistics.fmean(data) # OK
This is because under the hood, mean()
uses statistics._sum()
which returns a data type to convert the mean into (and Decimal is not on Python's number hierarchy), while fmean()
uses math.fsum()
which just adds the numbers up (which is also much faster than built-in sum()
function).
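A small illustration of that difference in the summation step (the Decimal value is just an example):
import math
from decimal import Decimal

data = [1, 4.5, Decimal('3.5')]
print(math.fsum(data))  # 9.0 -- fsum coerces every value to float and adds them up

try:
    print(sum(data))
except TypeError as exc:
    print(exc)  # the built-in sum() cannot mix float and Decimal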
One consequence of this is that fmean()
always returns a float (because averaging involves division) while mean()
could return a different type depending on the number types in the data. The following example shows that mean()
can return different types while for the same lists, fmean()
returns 3.0
, a float for all of them.
from fractions import Fraction

statistics.mean([2, Fraction(4,1)]) # Fraction(3, 1) <--- fractions.Fraction
statistics.mean([2, 4.0]) # 3.0 <--- float
statistics.mean([2, 4]) # 3 <--- int
Also, unlike sum(data)/len(data)
, fmean()
(and mean()
) works not just on lists but on general iterables such as generators as well. This is useful, if your data is massive and/or you need to perform off-the-cuff filtering before computing the mean.
For example, if a list has NaN values averaging returns NaN. If you want to average the list while ignoring NaN values, you can filter out the NaN values and pass a generator to fmean
:
data = [1, 2, float('nan')]
statistics.fmean(x for x in data if x==x) # 1.5
Note that numpy has a function (numpy.nanmean()
) that does the same job.
import numpy as np
np.nanmean(data) # 1.5
sum(L) / float(len(L)). Handle empty lists in caller code, like if not L: ... – Poisonous
The reason for xs is that it's a Haskell convention, and this being a mathematical question might inspire the homage to that language. – Ettie