Finding median of list in Python
H

29

285

How do you find the median of a list in Python? The list can be of any size and the numbers are not guaranteed to be in any particular order.

If the list contains an even number of elements, the function should return the average of the middle two.

Here are some examples (sorted for display purposes):

median([1]) == 1
median([1, 1]) == 1
median([1, 1, 2, 4]) == 1.5
median([0, 2, 5, 6, 8, 9, 9]) == 6
median([0, 0, 0, 0, 4, 4, 6, 8]) == 2
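The examples above can be turned into a small self-check for any candidate implementation. The sketch below is only illustrative: the helper name `check_median` and the `CASES` table are made up here, and `statistics.median` (Python 3.4+) stands in as the function under test.

```python
import random
import statistics

# Expected results taken from the examples in the question.
CASES = [
    ([1], 1),
    ([1, 1], 1),
    ([1, 1, 2, 4], 1.5),
    ([0, 2, 5, 6, 8, 9, 9], 6),
    ([0, 0, 0, 0, 4, 4, 6, 8], 2),
]

def check_median(median):
    """Return True if `median` gives the expected result for every case,
    even when the input is shuffled first (hypothetical helper)."""
    for values, expected in CASES:
        shuffled = values[:]          # copy, so CASES itself stays sorted
        random.shuffle(shuffled)
        if median(shuffled) != expected:
            return False
    return True

print(check_median(statistics.median))  # True
```

Shuffling each input first catches implementations that forget to sort.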
Hydrosome answered 7/6, 2014 at 21:4 Comment(2)
Selection AlgorithmAtomism
The answers here are good, so I think I want this to be roughly a canonical answer for finding medians, largely so I could close this. Note that that question has 30 thousand views. I'd appreciate it if this question wasn't closed or obliterated in any manner, so that it can stay in the search results and suck up those views instead.Sophisticated
S
331

Python 3.4 has statistics.median:

Return the median (middle value) of numeric data.

When the number of data points is odd, return the middle data point. When the number of data points is even, the median is interpolated by taking the average of the two middle values:

>>> median([1, 3, 5])
3
>>> median([1, 3, 5, 7])
4.0

Usage:

import statistics

items = [6, 1, 8, 2, 3]

statistics.median(items)
#>>> 3

It's pretty careful with types, too:

statistics.median(map(float, items))
#>>> 3.0

from decimal import Decimal
statistics.median(map(Decimal, items))
#>>> Decimal('3')
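As a further illustration of how the module treats types and even-length input, here is a minimal sketch using `statistics.median_low` and `statistics.median_high`, which are part of the same standard-library module. The sample data is made up for the example.

```python
from fractions import Fraction
import statistics

data = [1, 3, 5, 7]

# median() averages the two middle values of an even-length input.
print(statistics.median(data))       # 4.0
# median_low() and median_high() always return an actual data point.
print(statistics.median_low(data))   # 3
print(statistics.median_high(data))  # 5

# Fractions stay exact instead of being converted to float.
halves = [Fraction(1, 2), Fraction(3, 2)]
print(statistics.median(halves))     # 1
```

`median_low` and `median_high` are handy when the result has to be one of the original values, for example with discrete data.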
Sophisticated answered 8/6, 2014 at 0:8 Comment(3)
Perfect, worked for me to add it to pip3 install itunizer to add median data to the query results. CheersLugar
What if you want to find the median of an already sorted array? Then the built-in statistics.median is a poor fit, because it sorts the data again and slows things down.Subphylum
@Subphylum Then look at the middle element, or average the middle two.Sophisticated
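The suggestion in the last comment can be sketched as follows. This assumes Python 3 and a sequence that is already sorted; the name `median_sorted` is hypothetical, not a library function.

```python
def median_sorted(s):
    """Median of a sequence that is ALREADY sorted (hypothetical helper).

    No sorting happens here, so the cost is constant for lists and tuples.
    """
    n = len(s)
    if n == 0:
        raise ValueError("no median for empty data")
    mid = n // 2
    if n % 2:                      # odd length: single middle element
        return s[mid]
    return (s[mid - 1] + s[mid]) / 2  # even length: average the middle two

print(median_sorted([1, 3, 5]))     # 3
print(median_sorted([1, 3, 5, 7]))  # 4.0
```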
C
198

(Works with Python 2 and 3):

def median(lst):
    n = len(lst)
    s = sorted(lst)
    return (s[n//2-1]/2.0+s[n//2]/2.0, s[n//2])[n % 2] if n else None

>>> median([-5, -5, -3, -4, 0, -1])
-3.5

numpy.median():

>>> from numpy import median
>>> median([1, -4, -1, -1, 1, -3])
-1.0

For Python 3.4+, use statistics.median:

>>> from statistics import median
>>> median([5, 2, 3, 8, 9, -2])
4.0
Ceratoid answered 7/6, 2014 at 23:33 Comment(7)
While it is not writing a function, it is still a more "pythonic" solution imhoHousekeeping
@Housekeeping Not really; it's unadvisable to coerce to a Numpy array without good reason. You've coerced types and, worse, lost support for arbitrary types.Sophisticated
Points taken, useful.Housekeeping
The function is much more laborious than it needs to be, though.Precedency
@a-j Martijn is right. You can replace if len(lst) %2 == 0: with else:. And you can replace return lst[((len(lst)+1)/2)-1] with return lst[len(lst)//2] and return float(sum(lst[(len(lst)/2)-1:(len(lst)/2)+1]))/2.0 with i = len(lst)//2; return (lst[i - 1] + lst[i])/2. There is also no need to do len(lst) more than once. Replace all entries with n and do n = len(lst) as the first thing.Marienthal
PEP 450 makes a good argument against not using a library. You will eventually make a mistake.Auten
For those who can't or don't want to install the numpy package, statistics.median works very well.Bicarbonate
M
75

The sorted() function is very helpful for this. Use it to order the list, then simply return the middle value (or average the two middle values if the list contains an even number of elements).

def median(lst):
    sortedLst = sorted(lst)
    lstLen = len(lst)
    index = (lstLen - 1) // 2
   
    if (lstLen % 2):
        return sortedLst[index]
    else:
        return (sortedLst[index] + sortedLst[index + 1])/2.0
Monophony answered 7/6, 2014 at 22:9 Comment(2)
It is highly inefficient though: sorting is much more work in the worst case (Theta(n lg n)) than selecting the median (Theta(n))...Ambulant
(I wrote a function which uses mod to determine whether an even split can occur.)

def median(values):
    """Get the median of a list of values

    Args:
        values (iterable of float): A list of numbers

    Returns:
        float
    """
    # Write the median() function
    values = sorted(values)  # list.sort() returns None, so use sorted()
    n = len(values)
    if n % 2 == 0:
        median1 = values[n // 2]
        median2 = values[n // 2 - 1]
        median = (median1 + median2) / 2
    else:
        median = values[n // 2]
    return median

print(median([1, 2, 4, 3, 5]))
Abbyabbye
C
19

Of course, in Python 3 you can use the built-in functions, but if you are using Python 2 or would just like to write your own, you can do something like this. The trick is the ~ operator, which maps a non-negative number to a negative one: ~2 is -3, for instance. A negative index in a Python list counts from the end, so with mid == 2 the function takes the third element from the beginning and the third element from the end. Note that data.sort() sorts the caller's list in place.

def median(data):
    data.sort()
    mid = len(data) // 2
    return (data[mid] + data[~mid]) / 2.0
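To make the index arithmetic concrete, here is a small sketch of what `~mid` selects for even and odd lengths. The sample lists and the name `median_no_mutation` are made up for illustration; the variant sorts a copy instead of calling `data.sort()`.

```python
values = [10, 20, 30, 40]   # already sorted, even length
mid = len(values) // 2      # 2

print(~mid)                          # -3, because ~x == -x - 1
print(values[mid], values[~mid])     # 30 20  (the two middle items)

odd = [10, 20, 30]
mid = len(odd) // 2         # 1
print(odd[mid], odd[~mid])  # 20 20  (same item reached from both ends)

def median_no_mutation(data):
    """Same trick, but sorting a copy so the caller's list is untouched."""
    ordered = sorted(data)
    mid = len(ordered) // 2
    return (ordered[mid] + ordered[~mid]) / 2.0
```

Because both indices hit the same element when the length is odd, no parity check is needed; the price is that the result is always a float.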
Cantaloupe answered 21/1, 2018 at 17:22 Comment(0)
G
13

Here's a cleaner solution:

def median(lst):
    quotient, remainder = divmod(len(lst), 2)
    if remainder:
        return sorted(lst)[quotient]
    return sum(sorted(lst)[quotient - 1:quotient + 1]) / 2.

Note: Answer changed to incorporate suggestion in comments.

Gasholder answered 25/4, 2015 at 20:18 Comment(2)
float(sum(…) / 2) should be replaced with sum(…) / 2.0; otherwise, if sum(…) is an integer, you'll get a float version of the integer quotient. For example: float(sum([3, 4]) / 2) is 3.0, but sum([3, 4]) / 2.0 is 3.5.Cadal
For completeness, @musiphil: only in python 2, and only if you haven't done from __future__ import division.Bastien
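The difference described in these two comments can be reproduced in Python 3 by using `//`, which behaves the way `/` did for two integers in Python 2. This is only a sketch of the arithmetic, not code from the answer.

```python
total = sum([3, 4])  # 7

# `//` reproduces what `/` did for two ints in Python 2 (floor division).
print(float(total // 2))  # 3.0  (the fraction is lost before float() runs)
print(total / 2.0)        # 3.5  (dividing by a float keeps the fraction)
print(total / 2)          # 3.5  in Python 3, where `/` is true division
```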
P
12

You can use list.sort to sort the list in place and avoid creating a new list with sorted.

Also, you should not use list as a variable name, as it shadows Python's built-in list.

def median(l):
    half = len(l) // 2
    l.sort()
    if not len(l) % 2:
        return (l[half - 1] + l[half]) / 2.0
    return l[half]
Profligate answered 7/6, 2014 at 22:48 Comment(5)
Simple utility functions probably shouldn't mutate any arguments (Especially if the function name is a noun IMO). Also using sorted over .sort() means the argument doesn't have to be a list. It could be any iterator.Idalia
My point was about the function mutating the list. I mentioned supporting any iterable as a nice side effect of sorted, but that's not its main benefit. I for one would expect median(list) to work like almost all other builtins or mathematical functions. next() mutates, but I can't think of any others. Surprise mutation is a pain in the ass for debugging.Idalia
@WillS, how is it a surprise when it is documented? What if you are dealing with large data or you have restricted amounts of memory and you cannot make a copy of the list, what then?Profligate
Make the function expect a sorted list and document that. mylist.sort(); middle(mylist), but then it's undeniably a matter of taste. I just think mutation in general should be reserved for methods as far as is possible. The reason list.sort() returns None instead of the list itself is to make the behaviour as obvious and clear as possible. Hiding everything in documentation is like hiding stuff in small-print.Idalia
Let us continue this discussion in chat.Idalia
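The trade-off debated in these comments can be seen side by side in a short sketch. Both function names are hypothetical and exist only for this comparison.

```python
def median_inplace(lst):
    """Sorts the caller's list as a side effect (hypothetical name)."""
    lst.sort()
    half = len(lst) // 2
    return lst[half] if len(lst) % 2 else (lst[half - 1] + lst[half]) / 2.0

def median_copy(values):
    """Leaves the argument alone and accepts any iterable (hypothetical name)."""
    s = sorted(values)
    half = len(s) // 2
    return s[half] if len(s) % 2 else (s[half - 1] + s[half]) / 2.0

a = [3, 1, 2]
b = [3, 1, 2]
print(median_inplace(a), a)  # 2 [1, 2, 3]   <- caller's list was reordered
print(median_copy(b), b)     # 2 [3, 1, 2]   <- caller's list is unchanged
print(median_copy(x * x for x in range(4)))  # 2.5, works on a generator
```

The in-place version saves the memory of one extra list; the copying version is safer to call from code that still relies on the original order.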
S
11

You can try the quickselect algorithm if faster average-case running times are needed. Quickselect has average (and best) case performance O(n), although it can end up O(n²) on a bad day.

Here's an implementation with a randomly chosen pivot:

import random

def select_nth(n, items):
    pivot = random.choice(items)

    lesser = [item for item in items if item < pivot]
    if len(lesser) > n:
        return select_nth(n, lesser)
    n -= len(lesser)

    numequal = items.count(pivot)
    if numequal > n:
        return pivot
    n -= numequal

    greater = [item for item in items if item > pivot]
    return select_nth(n, greater)

You can trivially turn this into a method to find medians:

def median(items):
    if len(items) % 2:
        return select_nth(len(items)//2, items)

    else:
        left  = select_nth((len(items)-1) // 2, items)
        right = select_nth((len(items)+1) // 2, items)

        return (left + right) / 2

This is very unoptimised, but even an optimised version is unlikely to outperform Timsort (CPython's built-in sort), because that's really fast. I've tried before and I lost.
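For comparison, the same selection idea can be written as a loop instead of recursion, which avoids deep call stacks on unlucky pivots. This is a sketch assuming Python 3; the name `quickselect` and the structure are illustrative, not taken from any library.

```python
import random

def quickselect(items, k):
    """Return the k-th smallest element (0-based) without fully sorting."""
    items = list(items)
    if not 0 <= k < len(items):
        raise IndexError("k out of range")
    while True:
        pivot = random.choice(items)
        lesser = [x for x in items if x < pivot]
        equal_count = sum(1 for x in items if x == pivot)
        if k < len(lesser):
            items = lesser                      # answer lies below the pivot
        elif k < len(lesser) + equal_count:
            return pivot                        # answer is the pivot itself
        else:
            k -= len(lesser) + equal_count      # skip everything <= pivot
            items = [x for x in items if x > pivot]

def median(items):
    items = list(items)
    n = len(items)
    if n % 2:
        return quickselect(items, n // 2)
    return (quickselect(items, n // 2 - 1) + quickselect(items, n // 2)) / 2

print(median([0, 2, 5, 6, 8, 9, 9]))  # 6
print(median([1, 1, 2, 4]))           # 1.5
```

Each pass discards at least the pivot, so the loop always terminates; the average cost is linear, with the same quadratic worst case as the recursive version.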

Sophisticated answered 8/6, 2014 at 0:49 Comment(2)
So why even think about this if sort() is faster?Filial
@Filial If you're using PyPy, or some type you can't sort easily, or willing to write a C extension for speed, etc.Sophisticated
H
9
def median(x):
    x = sorted(x)
    listlength = len(x) 
    num = listlength//2
    if listlength%2==0:
        middlenum = (x[num]+x[num-1])/2
    else:
        middlenum = x[num]
    return middlenum
Humankind answered 25/9, 2018 at 18:22 Comment(0)
M
7
def median(array):
    """Calculate median of the given list.
    """
    # TODO: use statistics.median in Python 3
    array = sorted(array)
    half, odd = divmod(len(array), 2)
    if odd:
        return array[half]
    return (array[half - 1] + array[half]) / 2.0
Moujik answered 4/3, 2016 at 11:50 Comment(0)
M
6

A simple function to return the median of the given list:

def median(lst):
    lst = sorted(lst)  # Sort the list first
    if len(lst) % 2 == 0:  # Checking if the length is even
        # Applying formula which is sum of middle two divided by 2
        return (lst[len(lst) // 2] + lst[(len(lst) - 1) // 2]) / 2
    else:
        # If length is odd then get middle value
        return lst[len(lst) // 2]

Some examples with the median function:

>>> median([9, 12, 20, 21, 34, 80])  # Even
20.5
>>> median([9, 12, 80, 21, 34])  # Odd
21

If you want to use a library, you can simply do:

>>> import statistics
>>> statistics.median([9, 12, 20, 21, 34, 80])  # Even
20.5
>>> statistics.median([9, 12, 80, 21, 34])  # Odd
21
Marquittamarr answered 5/7, 2020 at 23:16 Comment(0)
J
5

I posted my solution at Python implementation of "median of medians" algorithm, which is a little bit faster than using sort(). My solution uses 15 numbers per column, for a speed of ~5N, which is faster than the ~10N of using 5 numbers per column. The optimal speed is ~4N, but I could be wrong about it.

Per Tom's request in his comment, I added my code here, for reference. I believe the critical part for speed is using 15 numbers per column, instead of 5.

#!/bin/pypy
#
# TH @stackoverflow, 2016-01-20, linear time "median of medians" algorithm
#
import sys, random


items_per_column = 15


def find_i_th_smallest( A, i ):
    t = len(A)
    if(t <= items_per_column):
        # if A is a small list with less than items_per_column items, then:
        #
        # 1. do sort on A
        # 2. find i-th smallest item of A
        #
        return sorted(A)[i]
    else:
        # 1. partition A into columns of k items each. k is odd, say 5.
        # 2. find the median of every column
        # 3. put all medians in a new list, say, B
        #
        B = [ find_i_th_smallest(k, (len(k) - 1) // 2) for k in [A[j:(j + items_per_column)] for j in range(0,len(A),items_per_column)]]

        # 4. find M, the median of B
        #
        M = find_i_th_smallest(B, (len(B) - 1) // 2)


        # 5. split A into 3 parts by M, { < M }, { == M }, and { > M }
        # 6. find which above set has A's i-th smallest, recursively.
        #
        P1 = [ j for j in A if j < M ]
        if(i < len(P1)):
            return find_i_th_smallest( P1, i)
        P3 = [ j for j in A if j > M ]
        L3 = len(P3)
        if(i < (t - L3)):
            return M
        return find_i_th_smallest( P3, i - (t - L3))


# How many numbers should be randomly generated for testing?
#
number_of_numbers = int(sys.argv[1])


# create a list of random positive integers
#
L = [ random.randint(0, number_of_numbers) for i in range(0, number_of_numbers) ]


# Show the original list
#
# print(L)


# This is for validation
#
# print(sorted(L)[(len(L) - 1) // 2])


# This is the result of the "median of medians" function.
# Its result should be the same as the above.
#
print(find_i_th_smallest( L, (len(L) - 1) // 2))
Jigging answered 21/1, 2016 at 0:0 Comment(0)
H
4

In case you need additional information on the distribution of your list, the percentile method will probably be useful. And a median value corresponds to the 50th percentile of a list:

import numpy as np
a = np.array([1,2,3,4,5,6,7,8,9])
median_value = np.percentile(a, 50) # return 50th percentile
print(median_value)
Havoc answered 22/4, 2020 at 12:7 Comment(0)
A
3

Here's what I came up with during this exercise on Codecademy:

def median(data):
    new_list = sorted(data)
    half = len(new_list) // 2  # integer division, so the result is a valid index
    if len(new_list) % 2 > 0:
        return new_list[half]
    else:
        return (new_list[half] + new_list[half - 1]) / 2.0

print(median([1,2,3,4,5,9]))
Athelstan answered 27/5, 2016 at 8:52 Comment(0)
L
3

Just two lines are enough.

def get_median(arr):
    '''
    Calculate the median of a sequence.
    :param arr: list
    :return: int or float
    '''
    arr = sorted(arr)
    return arr[len(arr)//2] if len(arr) % 2 else (arr[len(arr)//2] + arr[len(arr)//2-1])/2
Lejeune answered 17/9, 2020 at 2:32 Comment(0)
P
2

median Function

def median(midlist):
    midlist.sort()
    lens = len(midlist)
    if lens % 2 != 0:
        midl = lens // 2  # integer division, so the result is a valid index
        res = midlist[midl]
    else:
        odd = (lens // 2) - 1
        ev = lens // 2
        res = float(midlist[odd] + midlist[ev]) / float(2)
    return res
Prosthesis answered 21/5, 2015 at 13:55 Comment(0)
P
2

I had some problems with lists of float values. I ended up using a code snippet from Python 3's statistics.median, which works perfectly with float values and needs no imports.

def calculateMedian(values):  # renamed from `list` to avoid shadowing the built-in
    data = sorted(values)
    n = len(data)
    if n == 0:
        return None
    if n % 2 == 1:
        return data[n // 2]
    else:
        i = n // 2
        return (data[i - 1] + data[i]) / 2
Polygynist answered 3/5, 2017 at 16:54 Comment(0)
S
2
def midme(list1):
    list1.sort()
    half = len(list1) // 2
    if len(list1) % 2 > 0:
        x = list1[half]
    else:
        x = (list1[half - 1] + list1[half]) / 2
    return x


midme([4,5,1,7,2])
Spoken answered 18/2, 2018 at 18:0 Comment(0)
P
1
def median(array):
    if len(array) < 1:
        return None
    array = sorted(array)  # the middle element is only the median once sorted
    half = len(array) // 2
    if len(array) % 2 == 0:
        middle = array[half - 1:half + 1]
        return sum(middle) / 2.0
    else:
        return array[half]
Placket answered 6/4, 2018 at 21:55 Comment(3)
While this code may answer the question, providing additional context regarding why and/or how this code answers the question improves its long-term value.Ljoka
I'm very sorry! I just started, Stack Overflow, and I don't know how to add a summary....Placket
Click the "Edit" link below your post and add a summary, then save.Domesticate
P
0

I defined a median function for a list of numbers as

def median(numbers):
    return (sorted(numbers)[len(numbers) // 2] + sorted(numbers)[(len(numbers) - 1) // 2]) / 2.0
Plexiglas answered 14/10, 2014 at 14:12 Comment(1)
This has avoidable duplication of function calls which can be very expensive in time.Nebraska
T
0
import numpy as np
def get_median(xs):
        mid = len(xs) // 2  # Take the mid of the list
        if len(xs) % 2 == 1: # check if the len of list is odd
            return sorted(xs)[mid] #if true then mid will be median after sorting
        else:
            #return 0.5 * sum(sorted(xs)[mid - 1:mid + 1])
            return 0.5 * np.sum(sorted(xs)[mid - 1:mid + 1]) #if false take the avg of mid
print(get_median([7, 7, 3, 1, 4, 5]))
print(get_median([1,2,3, 4,5]))
Theisen answered 26/8, 2019 at 7:12 Comment(0)
L
0

A more generalized approach for median (and percentiles) would be:

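def get_percentile(data, percentile):
    # Get the number of observations
    cnt = len(data)
    # Sort the list
    data = sorted(data)
    # Determine the (possibly fractional) split point
    i = (cnt - 1) * percentile
    # Index just below the split point, and the fractional part above it
    lower = int(i)
    diff = i - lower
    # Clamp the upper index so that percentile=1.0 does not run off the end
    upper = min(lower + 1, cnt - 1)
    # Return the weighted average of the values below and above the split point
    return data[lower] * (1 - diff) + data[upper] * diff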

# Data
data=[1,2,3,4,5]
# For the median
print(get_percentile(data=data, percentile=.50))
# > 3.0
print(get_percentile(data=data, percentile=.75))
# > 4.0

# Note the weighted average difference when an int is not returned by the percentile
print(get_percentile(data=data, percentile=.51))
# > 3.04
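If Python 3.8 or later is available, the standard library offers a comparable interpolation through `statistics.quantiles`. The sketch below assumes that version; the sample data is made up, and `method="inclusive"` is chosen because it interpolates between the minimum and maximum in the same way as the split-point formula above.

```python
import statistics

data = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# n=4 asks for quartiles; method="inclusive" treats the data as the whole
# population, so the minimum and maximum map to the 0th and 100th percentiles.
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
print(q1, q2, q3)                     # 3.0 5.0 7.0
print(q2 == statistics.median(data))  # True
```

The middle cut point of the quartiles is the median, so one call yields the median together with the spread around it.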

Lynda answered 7/5, 2020 at 19:46 Comment(0)
A
0

Implement it:

def median(numbers):
    """
    Calculate the median of a list of numbers.
    :param numbers: the numbers to be calculated.
    :return: median value of numbers.

    >>> median([1, 3, 3, 6, 7, 8, 9])
    6
    >>> median([1, 2, 3, 4, 5, 6, 8, 9])
    4.5
    >>> import statistics
    >>> import random
    >>> numbers = random.sample(range(-50, 50), k=100)
    >>> statistics.median(numbers) == median(numbers)
    True
    """
    numbers = sorted(numbers)
    mid_index = len(numbers) // 2
    return (
        (numbers[mid_index] + numbers[mid_index - 1]) / 2 if len(numbers) % 2 == 0
        else numbers[mid_index]
    )


if __name__ == "__main__":
    from doctest import testmod

    testmod()


Addendum answered 4/10, 2020 at 16:36 Comment(0)
D
0

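Try this:

def find_median(arr):
    arr = sorted(arr)  # the input is not guaranteed to be ordered
    mid = len(arr) // 2
    if len(arr) % 2 == 1:
        return arr[mid]
    else:
        # even length: average the two middle values instead of returning -1
        return (arr[mid - 1] + arr[mid]) / 2.0
print(find_median([1,2,3,4,5,6,7,8]))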
Despinadespise answered 20/12, 2021 at 13:32 Comment(1)
Does this require a sorted array?Nebraska
T
0

Using NumPy (usually the fastest option for large numeric arrays):

import numpy as np
m = np.median([0, 2, 5, 6, 8, 9, 9])
print("ans:", m)
# ans: 6.0
Tenebrae answered 13/3, 2024 at 8:37 Comment(0)
F
-1

Function median:

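import numpy as np

def median(d):
    d = np.sort(d)
    n = len(d)
    half = n // 2
    if n % 2:   # odd length: single middle element
        med = d[half]
    else:       # even length: average the two middle elements
        med = (d[half - 1] + d[half]) / 2.0
    return med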
Fanchie answered 15/2, 2020 at 11:3 Comment(1)
The logic needs to be corrected , some of the below answers [upvoted] has the correct logic , "even" check needs to be done on length , else it fails for ex . for [1,2,3] it returns 2.5 expected answer is 2.Nickens
B
-1

What I did was this:

def median(a):
    a = sorted(a)
    half = len(a) // 2
    if len(a) % 2 != 0:  # odd number of items
        return a[half]
    else:
        return (a[half] + a[half - 1]) / 2.0

Explanation: If the number of items in the list is odd, return the middle item. Otherwise, halving the length of an even list gives the index of the higher of the two middle items, and since the list is sorted, the item just before it is the lower one. Add the two and divide by 2 to get the median.

Bouncer answered 6/11, 2020 at 6:31 Comment(1)
Welcome to Stack Overflow! Please, check that your solution was not already proposed as another answer like this one. Also if len(a) / 2 != int is always True because integer or float value cannot be equal to integer class.Subterfuge
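The point made in this comment can be checked directly. The following sketch assumes Python 3 and uses a made-up length `n`; it contrasts the comparison against the type `int` with checks that actually test parity.

```python
n = 5

# Comparing a number with the *type* int is never equal, so `!=` is always True.
print(n / 2 != int)        # True
print((n - 1) / 2 != int)  # True, even though 4 / 2 is a whole number

# Working ways to ask "is the length odd?" or "is this value whole?"
print(n % 2 == 1)               # True
print((n / 2).is_integer())     # False (Python 3: 5 / 2 == 2.5)
print(isinstance(n // 2, int))  # True, but this only says the type is int
```

Testing `n % 2` is the usual way to decide between the odd and even branches of a median function.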
O
-1

Simply create a median function that takes a list of numbers as its argument, then call it.

def median(l):
    l = sorted(l)
    lent = len(l)
    m = lent // 2
    if (lent % 2) == 0:
        # even length: average the two middle values
        result = (l[m - 1] + l[m]) / 2.0
    else:
        result = l[m]
    return result
Overtrick answered 27/4, 2021 at 5:17 Comment(0)
T
-2

Here's the tedious way to find median without using the median function:

def median(*arg):
    ordered = order(arg)  # work on the sorted copy, not the original tuple
    numArg = len(ordered)
    half = numArg // 2
    if numArg % 2 == 0:
        print((ordered[half-1] + ordered[half]) / 2.0)
    else:
        print(ordered[half])

def order(tup):
    ordered = [tup[i] for i in range(len(tup))]
    # keep making passes until one finishes without a swap
    while test(ordered):
        pass
    return ordered


def test(ordered):
    whileloop = 0
    for i in range(len(ordered)-1):
        if (ordered[i]>ordered[i+1]):
            # swap neighbours that are out of order
            original = ordered[i+1]
            ordered[i+1]=ordered[i]
            ordered[i]=original
            whileloop = 1 #run the loop again if you had to switch values
    return whileloop
Thyrsus answered 24/1, 2017 at 19:5 Comment(2)
Is this bubble sort? Why?Shy
why are you swapping values ?Hundredth
L
-3

It is very simple;

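def median(alist):
    # to find the median you have to sort the list first
    sList = sorted(alist)
    n = len(sList)
    midpoint = (n - 1) // 2
    if n % 2:
        # odd length: return the middle value, not its index
        return sList[midpoint]
    # even length: average the two middle values
    return (sList[midpoint] + sList[midpoint + 1]) / 2.0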

And you can use the return value like this: result = median(anyList)

Lynellelynett answered 7/12, 2018 at 16:11 Comment(2)
Median requires you to sort your array before you find the midpoint.Redpoll
sList return the sorted array. Doesn't return the medianLynellelynett
