Display regression equation in seaborn regplot [duplicate]

Does anyone know how to display the regression equation in seaborn using sns.regplot or sns.jointplot? regplot doesn't seem to have any parameter you can pass to display regression diagnostics, and jointplot only displays the Pearson r and the p-value. I'm looking for a way to see the slope coefficient, its standard error, and the intercept as well.

Thanks

Tragicomedy answered 3/11, 2015 at 3:20 Comment(3)
I think you'll need to do the regression yourself to get that info. - Procrustes
Yes, and add it with ax.text. - Cloverleaf
Clearly the information should be somewhere, since a line is being drawn. - Quietism
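Following the suggestion in these comments, here is a minimal sketch: fit the line yourself with scipy.stats.linregress and write the result onto the axes with ax.text. The sample data, label format and text position are illustrative assumptions, and the intercept_stderr attribute needs SciPy 1.6 or later.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
from scipy import stats

# Hypothetical sample data: a noisy line y = 2x + 1
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 2 * x + 1 + rng.normal(scale=1.0, size=x.size)

# Fit the regression ourselves, since seaborn does not expose its fit
res = stats.linregress(x, y)

fig, ax = plt.subplots()
sns.regplot(x=x, y=y, ax=ax)

# Equation plus standard errors, placed in axes-fraction coordinates (top left)
label = (f"y = {res.slope:.2f}x + {res.intercept:.2f}\n"
         f"SE(slope) = {res.stderr:.3f}, SE(intercept) = {res.intercept_stderr:.3f}")
ax.text(0.05, 0.95, label, transform=ax.transAxes, va="top")
```

With regplot's defaults (order=1, no robust or lowess option) the drawn line is an ordinary least-squares fit, so this equation describes the same line seaborn plots.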

In 2015, the lead developer for seaborn replied to a feature request asking for access to the statistical values used to generate plots by saying, "It is not available, and it will not be made available."

So, unfortunately, this feature does not exist in seaborn, and seems unlikely to exist in the future.

Update: in March 2018, seaborn's lead developer reiterated his opposition to this feature. He seems... uninterested in further discussion.

Midshipmite answered 1/11, 2017 at 19:18 Comment(8)
Thanks for finding this. I added the slope and intercept to the title of the plot to get around this. - Heideheidegger
Wow, the devs are so utterly wrong on this point in my professional opinion (and quite rude about it too). Not having access to the underlying statistical models makes seaborn unsuitable for serious scientific visualization. The answer to any question about using seaborn for publication-quality figures is now "just don't use it". Sad, since it is pretty nice otherwise. - Panjabi
"it is out of scope because seaborn is a library for visualization, not for statistics (statsmodels) or data munging (pandas)" ... It is data we are visualising, correct? And lines created by statistics... something especially the case when seaborn calls statsmodels for its own estimates? - Deckle
I see the maintainer's point. If the regression is only used for visualisation, he can get away with various shortcuts to, e.g., make the plots draw faster. But if users start relying on his code for generating numbers for published papers, suddenly the responsibility on his shoulders increases. It's OK not to want such responsibility (though he could have been nicer in the latest reply). Also, I suspect he doesn't want to encourage the slapdash kind of statistical testing: "draw a graph, read off the R^2, claim a result if it's higher than X". - Livelihood
m, b = np.polyfit(x, y, 1) may help. - Tachograph
seaborn is an API for matplotlib, not a stats package. However, you should see "seaborn: Visualizing regression models", which states: "To obtain quantitative measures related to the fit of regression models, you should use statsmodels." - Wavellite
It makes total sense not to supply this kind of info, since it can be calculated by the more suitable statsmodels package, whose sole purpose in life is to do such things and do them right. seaborn is, as rightly put by others, a visualization library and therefore should not be relied upon when trying to obtain a statistical model. One can use both seaborn and statsmodels and make them work together to one's advantage. That makes a lot of sense to me. - Revisal
Completely agree with @travc. Saying it's out of scope "because visualisation" seems myopic. - Checkoff
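The comments point to np.polyfit and to statsmodels for the numbers themselves. Below is a minimal NumPy-only sketch on made-up data: the slope and intercept come from np.polyfit, and the standard errors from the textbook formulas for simple linear regression (residual variance estimated with n - 2 degrees of freedom).

```python
import numpy as np

# Hypothetical sample data
rng = np.random.default_rng(1)
x = np.linspace(0, 10, 40)
y = 0.5 * x - 3 + rng.normal(scale=0.8, size=x.size)

# Slope and intercept, as suggested in the comments
m, b = np.polyfit(x, y, 1)

# Classic OLS standard errors for a single predictor
n = x.size
resid = y - (m * x + b)
s2 = resid @ resid / (n - 2)           # estimated residual variance
sxx = ((x - x.mean()) ** 2).sum()      # spread of x around its mean
se_slope = np.sqrt(s2 / sxx)
se_intercept = np.sqrt(s2 * (1 / n + x.mean() ** 2 / sxx))

print(f"y = {m:.3f}x + {b:.3f}  (SE slope {se_slope:.3f}, SE intercept {se_intercept:.3f})")
```

If statsmodels is installed, sm.OLS(y, sm.add_constant(x)).fit() reports the same coefficients in .params and their standard errors in .bse, along with a full summary().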

A late and partial answer. I only wanted the data of the regression line, and I found this:

When you have this plot:

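import matplotlib.pyplot as plt
import seaborn as sns

f = plt.figure()
ax = f.add_subplot(1, 1, 1)
p = sns.regplot(x="x", y="y", data=dat, ax=ax)  # dat: a DataFrame with columns "x" and "y"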

sns.regplot returns the matplotlib Axes it drew on, so p has a method get_lines(), which returns a list of Line2D objects. A Line2D object in turn has methods to get the desired data.

So to get the linear regression data in this example, you just need to do this:

p.get_lines()[0].get_xdata()
p.get_lines()[0].get_ydata()

Each of those calls returns a NumPy array of the regression line's data points, which you can use freely.
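A self-contained version of these steps, assuming a made-up DataFrame dat with columns "x" and "y". ci=None is passed only so that no confidence band is drawn; with the scatter points drawn as a collection, the first Line2D on the Axes is the fitted line.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical DataFrame with columns "x" and "y"
rng = np.random.default_rng(2)
dat = pd.DataFrame({"x": rng.uniform(0, 5, 30)})
dat["y"] = 1.5 * dat["x"] + rng.normal(size=30)

fig, ax = plt.subplots()
p = sns.regplot(x="x", y="y", data=dat, ax=ax, ci=None)  # ci=None: no bootstrap band

# The first Line2D on the Axes is the fitted regression line
line = p.get_lines()[0]
xs, ys = line.get_xdata(), line.get_ydata()
print(xs[:3], ys[:3])
```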

Using p.get_children() you get a list of the individual elements of the plot.

The path of the confidence interval band can be found with the following (the child index depends on drawing order and may differ between matplotlib versions):

p.get_children()[1].get_paths()

This returns a list of matplotlib Path objects; each one's vertices attribute holds the (x, y) points outlining the band.
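Because positions in get_children() depend on drawing order, a less fragile sketch picks the band by type: seaborn draws the confidence interval with fill_between, which produces a PolyCollection, while the scatter points are a PathCollection. The data here are made up.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
from matplotlib.collections import PolyCollection

# Hypothetical sample data
rng = np.random.default_rng(3)
x = rng.uniform(0, 5, 30)
y = 1.5 * x + rng.normal(size=30)

fig, ax = plt.subplots()
sns.regplot(x=x, y=y, ax=ax)  # default ci=95 draws a bootstrapped band

# Select the band by its type rather than by its index in get_children()
bands = [c for c in ax.collections if isinstance(c, PolyCollection)]
outline = bands[0].get_paths()[0].vertices  # (N, 2) array of (x, y) points
print(outline.shape)
```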

Generally, a lot can be found by calling the built-in dir() on any Python object, which lists all of its attributes and methods.
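For instance, filtering dir() down to the get_ methods of a Line2D quickly reveals get_xdata and get_ydata. The plotted points below are arbitrary.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
(line,) = ax.plot([0, 1, 2], [1, 3, 5])

# Keep only the public getter methods from everything dir() reports
getters = [name for name in dir(line) if name.startswith("get_")]
print(getters)
```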

Jilli answered 8/9, 2016 at 10:11 Comment(4)
This doesn't directly yield the desired equation; what's wanted is the slope and intercept of the regression line, i.e., a and b in y = ax + b. However, to get these one could use SciPy's stats.linregress: slope, intercept, r_value, p_value, std_err = scipy.stats.linregress(x=p.get_lines()[0].get_xdata(), y=p.get_lines()[0].get_ydata()) - Petal
The equation can easily be calculated from the (x, y) coordinates of two of the points. With two points you can calculate a, and from that b. - Jilli
@Jilli OK, but how weird is it that a piece of software computes a regression, gives you the correlation coefficient and p-value of the resulting model, and does not provide the model itself? It would be great if the seaborn authors could add this feature. - Uproar
How do I get the coefficients of the equation a*x + b? - Dominique
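Following the two-point approach, a sketch that recovers a and b from the endpoints of the drawn line, on made-up data. Running scipy.stats.linregress on the line's own points also returns a and b, but its r_value, p_value and std_err then describe a perfect fit to noise-free points, so diagnostics should come from fitting the raw x and y instead. The cross-check against np.polyfit relies on regplot's default fit being ordinary least squares.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns

# Hypothetical sample data
rng = np.random.default_rng(4)
x = rng.uniform(0, 10, 40)
y = -0.7 * x + 4 + rng.normal(size=40)

fig, ax = plt.subplots()
p = sns.regplot(x=x, y=y, ax=ax, ci=None)

# Two points on the drawn line determine y = a*x + b
line = p.get_lines()[0]
xs, ys = line.get_xdata(), line.get_ydata()
a = (ys[-1] - ys[0]) / (xs[-1] - xs[0])
b = ys[0] - a * xs[0]
print(f"y = {a:.3f}x + {b:.3f}")
```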
