How should the interquartile range be calculated in Python?
I

3

8

I have a list of numbers [1, 2, 3, 4, 5, 6, 7] and I want a function that returns the interquartile range of this list. The interquartile range is the difference between the upper and lower quartiles. I have attempted to calculate it by hand, with NumPy functions, and with Wolfram Alpha. All three answers are different, and I do not know why.

My attempt in Python is as follows:

>>> import numpy
>>> a = numpy.array([1, 2, 3, 4, 5, 6, 7])
>>> numpy.percentile(a, 25)
2.5
>>> numpy.percentile(a, 75)
5.5
>>> numpy.percentile(a, 75) - numpy.percentile(a, 25) # IQR
3.0

My attempt in Wolfram Alpha is as follows:

So, I find that the values returned by NumPy and Wolfram Alpha for what I think are the first quartile, the third quartile and the interquartile range are not consistent. Why is this? What should I be doing in Python to calculate the interquartile range correctly?

As far as I am aware, the interquartile range of [1, 2, 3, 4, 5, 6, 7] should be the following:

median(5, 6, 7) - median(1, 2, 3) = 4.
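
For reference, the by-hand calculation above can be sketched in plain Python. This is only an illustration of the "median of each half" convention, assuming the overall median is left out of both halves when the list has an odd length; the function name iqr_median_of_halves is made up for this example and is not a library function.

```python
from statistics import median

def iqr_median_of_halves(values):
    """IQR as median(upper half) - median(lower half).

    Assumption: when the number of values is odd, the overall median
    is excluded from both halves.
    """
    s = sorted(values)
    n = len(s)
    lower = s[: n // 2]          # values below the middle position
    upper = s[(n + 1) // 2 :]    # values above the middle position
    return median(upper) - median(lower)

print(iqr_median_of_halves([1, 2, 3, 4, 5, 6, 7]))  # median(5, 6, 7) - median(1, 2, 3) = 4
```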
Inbound answered 14/12, 2014 at 18:5 Comment(0)
W
9

You have 7 numbers which you are attempting to split into quartiles. Because 7 is not divisible by 4, there are a couple of different ways to do this, as mentioned here.

Your way is the first given by that link; Wolfram Alpha seems to be using the third. NumPy is doing basically the same thing as Wolfram Alpha, but it interpolates based on percentiles (as shown here) rather than quartiles, so it gets a different answer. You can choose how NumPy handles this using the interpolation option (I tried to link to the documentation but apparently I'm only allowed two links per post).

You'll have to choose which definition you prefer for your application.
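
To see how much the choice of definition matters, here is a small sketch using the standard library's statistics.quantiles (Python 3.8+) rather than NumPy. It is not what either tool in the question does internally; it just shows two common conventions side by side on the same data. The variable names are my own.

```python
from statistics import quantiles

data = [1, 2, 3, 4, 5, 6, 7]

# "exclusive" treats the data as a sample from a larger population
q1_ex, _, q3_ex = quantiles(data, n=4, method="exclusive")

# "inclusive" treats the smallest and largest values as the 0th and 100th percentiles
q1_in, _, q3_in = quantiles(data, n=4, method="inclusive")

print(q1_ex, q3_ex, q3_ex - q1_ex)  # 2.0 6.0 4.0
print(q1_in, q3_in, q3_in - q1_in)  # 2.5 5.5 3.0
```

On this list the "exclusive" convention agrees with the by-hand answer of 4 and the "inclusive" one agrees with NumPy's default result of 3.0, so neither number is wrong; they answer slightly different questions.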

Wernick answered 14/12, 2014 at 18:28 Comment(0)
D
10

Version 1.9 of NumPy features a handy 'interpolation' argument to help you get to 4.

import numpy

a = numpy.array([1, 2, 3, 4, 5, 6, 7])
# Round Q3 up and Q1 down to actual data points: 6 - 2
numpy.percentile(a, 75, interpolation='higher') - numpy.percentile(a, 25, interpolation='lower')
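
One caveat for newer installs: from NumPy 1.22 on, this keyword is called 'method', and 'interpolation' is deprecated. Below is a rough, version-tolerant sketch of the same calculation. The helper name percentile_with is invented for this example, and the fallback assumes an older NumPy rejects the unknown keyword with a TypeError.

```python
import numpy

def percentile_with(a, q, how):
    """Call numpy.percentile with the given rounding rule on old or new NumPy."""
    try:
        # NumPy 1.22 and later
        return numpy.percentile(a, q, method=how)
    except TypeError:
        # Older NumPy (1.9 to 1.21) only knows the 'interpolation' keyword
        return numpy.percentile(a, q, interpolation=how)

a = numpy.array([1, 2, 3, 4, 5, 6, 7])
q1 = percentile_with(a, 25, 'lower')    # nearest data point at or below the 25% position
q3 = percentile_with(a, 75, 'higher')   # nearest data point at or above the 75% position
iqr = q3 - q1
print(q1, q3, iqr)
```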
Deflocculate answered 14/12, 2014 at 18:31 Comment(1)
Thank you very much for your code assistance. I'll check out your approach. If I could accept your answer too, I would -- or1426 provided some more detail that helped clarify what was happening.Inbound
N
1

Not perfect but these functions should approximate it:

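def quartile_1(l):
    # Element roughly a quarter of the way through the sorted list
    return sorted(l)[int(len(l) * .25)]

def median(l):
    # Integer division so the index is an int in Python 3
    # (for even-length lists this picks the upper of the two middle values)
    return sorted(l)[len(l) // 2]

def quartile_3(l):
    # Element roughly three quarters of the way through the sorted list
    return sorted(l)[int(len(l) * .75)]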
Nankeen answered 19/8, 2015 at 17:8 Comment(0)
