How to compute the median and 68% confidence interval around the median of a non-Gaussian distribution in Python?

I have a data set stored as a NumPy array, say a = [a1, a2, ...], along with the weights of the data, w = [w1, w2, w3, ...]. I have computed the histogram using numpy.histogram, which gives me the hist array. Now I want to compute the median of this probability distribution and also the 68% contour around the median. Note that my data set is not Gaussian.

Can anyone help? I am using Python.

Mcbee answered 6/10, 2016 at 11:0 Comment(9)
Look at this question. - Trimurti
Just to confirm, for your data set, w1 gives how likely the value a1 is, etc.? - Scutcheon
I noticed that you have not upvoted or accepted answers to your questions, even in cases where the answers "looked good to me." You will find people more willing to help if you recognize such answers! After all, you got free help, often from experts! - Scutcheon
@Trimurti Thanks, but I wanted to avoid bootstrapping. There is a package, weightedstats, which computes the weighted mean of a distribution; calculating the CL is then easy with numpy.percentile. - Mcbee
@UlrichStern I am new to this website. - Mcbee
If the weights represent how likely the values are, you do not have a sample but a description of the population (a discrete random variable in this case), and bootstrapping would not be the right thing to do. Re: confidence interval, the calculation is not difficult, but "just numpy.percentile" does not sound right. Will post an answer a little later. - Scutcheon
@UlrichStern What I did was compute the weighted median first and then split my data array into two parts around the median, since that is how the median is defined. Then I computed the 34% on each side of the median using numpy.percentile. I will compare the result with the answer you post. - Mcbee
A manual computation of the 68% confidence interval should work as follows: create (ai, wi) pairs and sort the pairs by ai. Then, starting from the median, go 34% in sum(wi) in each direction. This would be an extension of the weighted median calculation on Wikipedia to confidence intervals. Edge cases/values need to be thought through (note that Wikipedia, e.g., has two cases for the median), so I was glad scipy.stats had a confidence interval calculation. :) - Scutcheon
@ArpanDas: Ganesh Gaitonde - Wrand

Here is a solution using scipy.stats.rv_discrete:

from __future__ import division, print_function
import numpy as np, scipy.stats as st

# example data set
a = np.arange(20)
w = a + 1

# create custom discrete random variable from data set
rv = st.rv_discrete(values=(a, w/w.sum()))

# scipy.stats.rv_discrete has methods for median, confidence interval, etc.
print("median:", rv.median())
print("68% CI:", rv.interval(0.68))

Output reflects the uneven weights in the example data set:

median: 13.0
68% CI: (7.0, 18.0)
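
As a cross-check, the manual approach described in the comments (sort by value, accumulate the weights, read off the 16%, 50% and 84% points) can be sketched with plain NumPy. This is only a sketch: the helper name weighted_quantile is hypothetical, and it assumes the convention of returning the smallest value whose cumulative weight reaches the requested fraction, so it does not average the two middle values in the tie case that Wikipedia's weighted median treats separately.

```python
import numpy as np

def weighted_quantile(values, weights, q):
    """Smallest value whose cumulative weight reaches fraction q (hypothetical helper)."""
    values = np.asarray(values, dtype=float)
    weights = np.asarray(weights, dtype=float)
    # sort the (value, weight) pairs by value
    order = np.argsort(values)
    values, weights = values[order], weights[order]
    # running sum of weights; the last entry is the total weight
    cum = np.cumsum(weights)
    targets = np.atleast_1d(q) * cum[-1]
    # first index where the cumulative weight is >= the target
    idx = np.searchsorted(cum, targets, side="left")
    idx = np.clip(idx, 0, len(values) - 1)
    return values[idx]

# same example data set as above
a = np.arange(20)
w = a + 1

lower, median, upper = weighted_quantile(a, w, [0.16, 0.50, 0.84])
print("median:", median)
print("68% interval:", (lower, upper))
```

For this example data set the sketch gives the same three numbers as rv_discrete; for other conventions (interpolation, tie handling) the results can differ slightly.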
Scutcheon answered 14/10, 2016 at 20:48 Comment(1)
Thank you very much. This is really useful and a much cleaner way of doing exactly what I needed. - Mcbee

A simple option is NumPy's quantile function. Note that this answers the main title question, but it does not handle the more specific case of weighted values described later by the question asker.

quantile gives the value at a given quantile of the data. So to get the lower and upper bounds of the central 68% interval, along with the median, you would use:

import numpy as np

lower, median, upper = np.quantile(the_array, [0.16, 0.50, 0.84])

NumPy's percentile function is equivalent, but with the percentile range as 0 - 100 instead of 0 - 1.
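
To make that concrete, here is a small sketch on synthetic data. The log-normal sample, the seed and the variable names are assumptions chosen purely for illustration (a skewed, non-Gaussian distribution), not anything taken from the question.

```python
import numpy as np

# synthetic, skewed (non-Gaussian) sample; assumed for illustration only
rng = np.random.default_rng(0)
the_array = rng.lognormal(mean=0.0, sigma=1.0, size=10_000)

# fractions in [0, 1] for quantile, percentages in [0, 100] for percentile
q_lower, q_median, q_upper = np.quantile(the_array, [0.16, 0.50, 0.84])
p_lower, p_median, p_upper = np.percentile(the_array, [16, 50, 84])

print("quantile:  ", q_lower, q_median, q_upper)
print("percentile:", p_lower, p_median, p_upper)

# for a skewed distribution the interval is not symmetric around the median
print("below median:", q_median - q_lower, "above median:", q_upper - q_median)
```

Both calls should agree, and the unequal distances below and above the median show why a mean ± standard deviation summary would be misleading for data like this.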

Kawai answered 3/4 at 3:56 Comment(0)
