Plotting error bars from dataframe using Seaborn FacetGrid

I want to plot error bars from a column in a pandas dataframe on a Seaborn FacetGrid:

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
df = pd.DataFrame({'A' : ['foo', 'bar', 'foo', 'bar']*2,
                   'B' : ['one', 'one', 'two', 'three',
                         'two', 'two', 'one', 'three'],
                  'C' : np.random.randn(8),
                  'D' : np.random.randn(8)})
df

Example dataframe

    A       B        C           D
0   foo     one      0.445827   -0.311863
1   bar     one      0.862154   -0.229065
2   foo     two      0.290981   -0.835301
3   bar     three    0.995732    0.356807
4   foo     two      0.029311    0.631812
5   bar     two      0.023164   -0.468248
6   foo     one     -1.568248    2.508461
7   bar     three   -0.407807    0.319404

This code works for fixed size error bars:

# `size` was renamed `height` in seaborn 0.9
g = sns.FacetGrid(df, col="A", hue="B", height=5)
g.map(plt.errorbar, "C", "D", yerr=0.5, fmt='o');

But I can't get it to work using values from the dataframe:

df['E'] = abs(df['D']*0.5)
g = sns.FacetGrid(df, col="A", hue="B", height=5)
g.map(plt.errorbar, "C", "D", yerr=df['E']);

or

g = sns.FacetGrid(df, col="A", hue="B", height=5)
g.map(plt.errorbar, "C", "D", yerr='E');

Both produce screeds of errors.

EDIT:

After a lot of reading of the matplotlib docs and assorted Stack Overflow answers, here is a pure matplotlib solution:

# define a color palette index based on column 'B'
# (`.codes` replaces the `.labels` attribute removed from newer pandas)
df['cind'] = pd.Categorical(df['B']).codes

#how many categories in column 'A'
cats = df['A'].unique()
cats.sort()

#get the seaborn colour palette and convert to array
cp = sns.color_palette()
cpa = np.array(cp)

#draw a subplot for each category in column "A"
fig, axs = plt.subplots(nrows=1, ncols=len(cats), sharey=True)
for i,ax in enumerate(axs):
    df_sub = df[df['A'] == cats[i]]
    col = cpa[df_sub['cind']]
    ax.scatter(df_sub['C'], df_sub['D'], c=col)
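    # fmt='none' draws only the error bars, without data markers
    eb = ax.errorbar(df_sub['C'], df_sub['D'], yerr=df_sub['E'], fmt='none')
    # eb.lines is (data_line, caplines, barlinecols); recolor the bars
    for bars in eb.lines[2]:
        bars.set_color(col)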

Other than the labels and axis limits, it's OK. It plots a separate subplot for each category in column 'A', colored by the category in column 'B'. (Note that the random data differ from those above.)

I'd still like a pandas/seaborn solution if anyone has any ideas?

Linotype answered 22/7, 2014 at 2:48 Comment(0)

When using FacetGrid.map, anything that refers to a column of the data DataFrame must be passed as a positional argument. That works in your case because yerr is the third positional argument of plt.errorbar. To demonstrate, I'm going to use the tips dataset:

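from scipy import stats
tips_all = sns.load_dataset("tips")
# observed=True skips smoker/size combinations that have no rows
tips_grouped = tips_all.groupby(["smoker", "size"], observed=True)
# numeric_only=True leaves out the categorical columns (sex, day, time)
tips = tips_grouped.mean(numeric_only=True)
# approximate 95% interval half-width: 1.96 * standard error of the mean
tips["CI"] = tips_grouped.total_bill.apply(stats.sem) * 1.96
tips.reset_index(inplace=True)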

I can then plot using FacetGrid and errorbar:

# `size` was renamed `height` in seaborn 0.9
g = sns.FacetGrid(tips, col="smoker", height=5)
g.map(plt.errorbar, "size", "total_bill", "CI", marker="o")

However, keep in mind that there are seaborn plotting functions for going from a full dataset to plots with error bars (using bootstrapping), so for a lot of applications this may not be necessary. For example, you could use factorplot (renamed catplot in seaborn 0.9):

sns.catplot(x="size", y="total_bill", col="smoker",
            data=tips_all, kind="point")

Or lmplot:

sns.lmplot(x="size", y="total_bill", col="smoker",
           data=tips_all, fit_reg=False, x_estimator=np.mean)

Filibeg answered 22/7, 2014 at 14:47 Comment(4)
The positional argument bit was the key. In a measurement context, the type A (statistical) uncertainties are easy to evaluate in factorplot and lmplot, although one has to dive into the API documentation to check exactly what measure of the data spread is being plotted and how it's calculated (68% confidence limits via bootstrap?). It would be good if this were more upfront in the docs. I need to plot the type B uncertainties, which I can do as shown here. Thanks. - Linotype
The default CI is 95% (you can see it in the function signature), but they all take a ci keyword argument that you can set to 68 if you want the standard error. - Filibeg
@Filibeg Is there a solution for asymmetric error bars? Imagine I have two columns of a dataframe giving CI min / max. Is there a way to pass that to plt.errorbar via g.map? - Wimble
You should be able to write a wrapper function that takes vectors (x, y, err_lower, err_upper) and calls plt.errorbar correctly. - Filibeg

You aren't showing what df['E'] actually is, or whether it is a list of the same length as df['C'] and df['D'].

The yerr keyword argument (kwarg) takes either a single value that will be applied for every element in the lists for keys C and D from the dataframe, or it needs a list of values the same length as those lists.

So, C, D, and E must all be associated with lists of the same length, or C and D must be lists of the same length and E must be associated with a single float or int. If that single float or int is inside a list, you must extract it, like df['E'][0].
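
As a minimal illustration of those two accepted forms in plain matplotlib (the data values here are invented, not taken from the question):

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.arange(4)
y = np.array([1.0, 2.0, 1.5, 3.0])
per_point = np.array([0.1, 0.2, 0.3, 0.4])  # same length as x and y

fig, (ax1, ax2) = plt.subplots(1, 2, sharey=True)
ax1.errorbar(x, y, yerr=0.5, fmt="o")        # one value applied to every point
ax2.errorbar(x, y, yerr=per_point, fmt="o")  # one value per point
```

A yerr sequence whose length does not match x and y is rejected by matplotlib. That is presumably what happens in the question: yerr=df['E'] hands the full column to every facet, while each facet only plots its own subset of C and D.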

Example matplotlib code with yerr: http://matplotlib.org/1.2.1/examples/pylab_examples/errorbar_demo.html

Bar plot API documentation describing yerr: http://matplotlib.org/api/pyplot_api.html#matplotlib.pyplot.bar

Wearable answered 22/7, 2014 at 3:13 Comment(1)
df['E'] = abs(df['D']*0.5), in the first line of the 4th code block. I think the problem is that the map function from seaborn is passing the whole df['E'] list to matplotlib's errorbar function, not just the part that applies to that subplot. - Linotype