is seaborn confidence interval computed correctly?

Asked 8/9, 2017 at 22:15 Answered 14/6, 2018 at 14:14

python matplotlib seaborn boxplot confidence-interval

First, I must admit that my statistics knowledge is rusty at best: even when it was shining new, it's not a discipline I particularly liked, which means I had a hard time making sense of it.

Nevertheless, I took a look at how the barplot graphs were calculating error bars, and was surprised to find a "confidence interval" (CI) used instead of (the more common) standard deviation. Researching more CI led me to this wikipedia article which seems to say that, basically, a CI is computed as:

mean minus 1.96 times stdev over sqrt(n)

mean plus 1.96 times stdev over sqrt(n)

Or, in pseudocode:

def ci_wp(a):
    """calculate confidence interval using Wikipedia's formula"""
    m = np.mean(a)
    s = 1.96*np.std(a)/np.sqrt(len(a))
    return m - s, m + s

But what we find in seaborn/utils.py is:

def ci(a, which=95, axis=None):
    """Return a percentile range from an array of values."""
    p = 50 - which / 2, 50 + which / 2
    return percentiles(a, p, axis)

Now maybe I'm missing this completely, but this seems just like a completely different calculation than the one proposed by Wikipedia. Can anyone explain this discrepancy?

To give another example, from comments, why do we get so different results between:

 >>> sb.utils.ci(np.arange(100))
 array([ 2.475, 96.525])

 >>> ci_wp(np.arange(100))
 [43.842250270646467,55.157749729353533]

And to compare with other statistical tools:

 def ci_std(a):
     """calculate margin of error using standard deviation"""
     m = np.mean(a)
     s = np.std(a)
     return m-s, m+s

 def ci_sem(a):
     """calculate margin of error using standard error of the mean"""
     m = np.mean(a)
     s = sp.stats.sem(a)
     return m-s, m+s

Which gives us:

>>> ci_sem(np.arange(100))
(46.598850802411796, 52.401149197588204)

>>> ci_std(np.arange(100))
(20.633929952277882, 78.366070047722118)

Or with a random sample:

rng = np.random.RandomState(10)
a = rng.normal(size=100)
print sb.utils.ci(a)
print ci_wp(a)
print ci_sem(a)
print ci_std(a)

... which yields:

[-1.9667006   2.19502303]
(-0.1101230745774124, 0.26895640045116026)
(-0.017774461397903049, 0.17660778727165088)
(-0.88762281417683186, 1.0464561400505796)

Why are Seaborn's numbers so radically different from the other results?

Naphthol answered 8/9, 2017 at 22:15 Comment(2)

I confirm: sb.ci(np.arange(100)) gives array([ 2.475, 96.525]), the direct computation np.mean(np.arange(100))-np.arange(100).std()*1.96/10 gives [43.842250270646467,55.157749729353533]. – Abner 8/9, 2017 at 22:28

thanks! that sounds about right, although I must say: seaborn's result does seem to make more sense here... and for comparison, using plain mean +/- std gives: (20.633929952277882, 78.366070047722118) – Naphthol 8/9, 2017 at 23:11

Your calculation using this Wikipedia formula is completely right. Seaborn just uses another method: https://en.wikipedia.org/wiki/Bootstrapping_(statistics). It's well described by Dragicevic [1]:

[It] consists of generating many alternative datasets from the experimental data by randomly drawing observations with replacement. The variability across these datasets is assumed to approximate sampling error and is used to compute so-called bootstrap confidence intervals. [...] It is very versatile and works for many kinds of distributions.

In the Seaborn's source code, a barplot uses estimate_statistic which bootstraps the data then computes the confidence interval on it:

>>> sb.utils.ci(sb.algorithms.bootstrap(np.arange(100)))
array([43.91, 55.21025])

The result is consistent with your calculation.

[1] Dragicevic, P. (2016). Fair statistical communication in HCI. In Modern Statistical Methods for HCI (pp. 291-330). Springer, Cham.

Selfexecuting answered 14/6, 2018 at 14:14 Comment(1)

Does this work for small sample size though? I just dug into the code, it looks like it's sampling with replacement from your (small) sample 1000 times, computes the means for each sample and then computes the 2.5, 97.5 percentiles of the means. But what if my original sample size is small, like 10? Computing the 95% confidence intervals with t=2.262 gives me a wider range. – Autism 11/12, 2023 at 21:30

You need to check the code of percentiles. The seaborn ci code you posted simply computes the percentile limits. This interval has a defined mean of 50 (median) and a default range of 95% confidence interval. The actual mean, the standard deviation, etc. will appear in the percentiles routine.

Brucite answered 8/9, 2017 at 22:23 Comment(1)

percentiles eventually calls scipy.stats.scoreatpercentile - this still looks different than the wikipedia calculations... – Naphthol 8/9, 2017 at 22:54

Recommended topics

Hot tags