How to plot multiple linear regressions in the same figure

Asked 16/3, 2016 at 3:12 Answered 9/5, 2021 at 23:33

python pandas matplotlib seaborn linear-regression

Given the following:

import numpy as np
import pandas as pd
import seaborn as sns

np.random.seed(365)
x1 = np.random.randn(50)
y1 = np.random.randn(50) * 100
x2 = np.random.randn(50)
y2 = np.random.randn(50) * 100

df1 = pd.DataFrame({'x1':x1, 'y1': y1})
df2 = pd.DataFrame({'x2':x2, 'y2': y2})

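# x, y and data are passed as keywords (positional x/y is an error in seaborn >= 0.12)
sns.lmplot(data=df1, x='x1', y='y1', fit_reg=True, ci=None)
sns.lmplot(data=df2, x='x2', y='y2', fit_reg=True, ci=None)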

This creates two separate plots. How can I add the data from df2 onto the SAME graph? All the seaborn examples I have found online seem to focus on creating adjacent graphs (say, via the 'hue' and 'col_wrap' options). Also, I would prefer not to follow the dataset examples that rely on an additional column, because such a column has no natural meaning in the project I am working on.

If there is a mixture of matplotlib/seaborn functions that are required to achieve this, I would be grateful if someone could help illustrate.

Immutable asked 16/3, 2016 at 3:12

You could use seaborn's FacetGrid class to get the desired result. You would need to replace your plotting calls with these lines:

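import matplotlib.pyplot as plt

# sns.lmplot(data=df1, x='x1', y='y1', fit_reg=True, ci=None)
# sns.lmplot(data=df2, x='x2', y='y2', fit_reg=True, ci=None)

# rename to common column names, tag each row with its source, and stack the rows
df = pd.concat([df1.rename(columns={'x1': 'x', 'y1': 'y'})
                .join(pd.Series(['df1'] * len(df1), name='df')),
                df2.rename(columns={'x2': 'x', 'y2': 'y'})
                .join(pd.Series(['df2'] * len(df2), name='df'))],
               ignore_index=True)

pal = dict(df1="red", df2="blue")
# `height` replaced the old `size` parameter in seaborn 0.9
g = sns.FacetGrid(df, hue='df', palette=pal, height=5);
g.map(plt.scatter, "x", "y", s=50, alpha=.7, linewidth=.5, edgecolor="white")
# robust=True requires statsmodels; scatter=False avoids drawing every point twice
g.map(sns.regplot, "x", "y", ci=None, robust=True, scatter=False)
g.add_legend();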

This yields a single plot containing both datasets and their regression lines (df1 in red, df2 in blue), which, if I understand correctly, is what you need.

Note that you should pay attention to the regplot parameters; you may want to change the values I have used as an example.

The ; at the end of a line suppresses the command's output (I use an IPython notebook, where it would otherwise be displayed).
The docs explain the .map() method in more detail. In essence, it does just what the name says: it maps a plotting command onto the data for each facet and hue level. However, it works with axes-level plotting commands like regplot, not with lmplot, which is a figure-level function that builds its own FacetGrid and calls regplot behind the scenes.
Normally, passing c='none', edgecolor='r' to plt.scatter would produce unfilled markers. But seaborn interferes with this by passing its own color to the markers, so I don't see an easy or straightforward fix other than manipulating the ax elements after seaborn has produced the plot, which is best addressed in a separate question.
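As a rough sketch of that post-processing idea (not part of the original answer), the example below assumes plain sns.regplot calls on a single Axes rather than the FacetGrid version. Once seaborn has finished drawing, it walks the Axes' PathCollection objects (the scatter layers), copies each fill color to the edge, and then removes the fill.

```python
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from matplotlib.collections import PathCollection

# same synthetic data as in the question
np.random.seed(365)
df1 = pd.DataFrame({'x1': np.random.randn(50), 'y1': np.random.randn(50) * 100})
df2 = pd.DataFrame({'x2': np.random.randn(50), 'y2': np.random.randn(50) * 100})

fig, ax = plt.subplots(figsize=(6, 6))
sns.regplot(x='x1', y='y1', data=df1, ci=None, ax=ax, color='red', label='df1')
sns.regplot(x='x2', y='y2', data=df2, ci=None, ax=ax, color='blue', label='df2')

# after seaborn has drawn, make each scatter layer hollow:
# move the fill color to the marker edge, then remove the fill
for coll in ax.collections:
    if isinstance(coll, PathCollection):
        coll.set_edgecolor(coll.get_facecolor())
        coll.set_facecolor('none')

ax.set(xlabel='x', ylabel='y')
# build the legend after the change so its handles are hollow too
ax.legend()
```

Call plt.show() to display the figure. With ci=None, regplot adds no confidence band, so the only PathCollections on the Axes should be the two scatter layers. With ci enabled, a band is also added, but that is a PolyCollection, which the isinstance check skips.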

Eogene answered 16/3, 2016 at 9:33

Option 1: `sns.regplot`

In this case, the easiest solution to implement is sns.regplot, an axes-level function, because it does not require combining df1 and df2.

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# create the figure and axes
fig, ax = plt.subplots(figsize=(6, 6))

# add the plots for each dataframe
sns.regplot(x='x1', y='y1', data=df1, fit_reg=True, ci=None, ax=ax, label='df1')
sns.regplot(x='x2', y='y2', data=df2, fit_reg=True, ci=None, ax=ax, label='df2')
ax.set(ylabel='y', xlabel='x')
ax.legend()
plt.show()
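seaborn does not return the fitted coefficients, so the sketch below (an addition, not part of the original answer) computes them with np.polyfit and puts the fitted equation in each legend label. It also loops over a list of datasets, so the same Axes can hold any number of regressions. It assumes a degree-1 fit, which gives the same ordinary least-squares line that regplot draws with its defaults (order=1, robust=False). The datasets list is a hypothetical structure introduced here for illustration.

```python
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# same synthetic data as in the question
np.random.seed(365)
df1 = pd.DataFrame({'x1': np.random.randn(50), 'y1': np.random.randn(50) * 100})
df2 = pd.DataFrame({'x2': np.random.randn(50), 'y2': np.random.randn(50) * 100})

# hypothetical list of (DataFrame, x column, y column, legend name);
# append more tuples to overlay more regressions on the same Axes
datasets = [(df1, 'x1', 'y1', 'df1'), (df2, 'x2', 'y2', 'df2')]

fig, ax = plt.subplots(figsize=(6, 6))
coefs = {}
for data, xcol, ycol, name in datasets:
    # degree-1 least-squares fit; np.polyfit returns [slope, intercept]
    slope, intercept = np.polyfit(data[xcol], data[ycol], 1)
    coefs[name] = (slope, intercept)
    sns.regplot(x=xcol, y=ycol, data=data, ci=None, ax=ax,
                label=f'{name}: y = {slope:.1f}x {intercept:+.1f}')

# regplot labels the axes with the last column names, so set common ones
ax.set(xlabel='x', ylabel='y')
ax.legend()
```

Because each regplot call draws on the same ax, the color cycle advances automatically, so every dataset gets its own color without an explicit palette.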

Option 2: `sns.lmplot`

As the sns.FacetGrid documentation advises, it is generally better to use a figure-level function such as sns.lmplot than to use FacetGrid directly.
Combine df1 and df2 into a long format, and then use sns.lmplot with the hue parameter.
When working with seaborn, the data almost always needs to be in long format, particularly when a variable such as hue is used to separate groups.
- It's customary to use pandas.DataFrame.stack or pandas.melt to convert DataFrames from wide to long.
- For this reason, df1 and df2 must have their columns renamed to common names and gain an additional identifying column. This allows them to be concatenated along axis=0 (the default, producing long format) instead of axis=1 (producing wide format).
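To illustrate the wide-to-long conversion mentioned above, here is a minimal sketch using pandas.melt. It assumes a hypothetical wide DataFrame in which both series share a single x column. That is not the case in the question, where each DataFrame has its own x values, and that difference is why the rename-and-concatenate approach is used instead.

```python
import numpy as np
import pandas as pd
import seaborn as sns

# hypothetical wide DataFrame: one shared x column and one y column per series
np.random.seed(365)
wide = pd.DataFrame({'x': np.random.randn(50),
                     'y1': np.random.randn(50) * 100,
                     'y2': np.random.randn(50) * 100})

# wide -> long: one row per (x, series) pair; the old column name goes into 'df'
long_df = wide.melt(id_vars='x', value_vars=['y1', 'y2'],
                    var_name='df', value_name='y')

# the identifying 'df' column can now drive hue, giving both fits in one plot
g = sns.lmplot(data=long_df, x='x', y='y', hue='df', ci=None)
```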
There are a number of ways to combine the DataFrames:
1. The combination method in the answer from Primer is fine if combining a few DataFrames.
2. However, a function, as shown below, is better for combining many DataFrames.

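def fix_df(data: pd.DataFrame, name: str) -> pd.DataFrame:
    """Return a copy with renamed columns and an identifying column."""
    # work on a copy so the original DataFrame (e.g. df1) is not modified
    data = data.copy()
    # rename columns to a common name
    data.columns = ['x', 'y']
    # add an identifying value to use with hue
    data['df'] = name
    return data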


# create a list of the dataframes
df_list = [df1, df2]

# update the dataframes by calling the function in a list comprehension
df_update_list = [fix_df(v, f'df{i}') for i, v in enumerate(df_list, 1)]

# combine the dataframes
df = pd.concat(df_update_list).reset_index(drop=True)

# plot the dataframe
sns.lmplot(data=df, x='x', y='y', hue='df', ci=None)
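An alternative sketch for combining many DataFrames, not from the original answer, passes a dict to pd.concat. The dict keys become an outer index level, which is then moved into the identifying column. DataFrame.set_axis returns a relabelled copy, so df1 and df2 keep their original columns. The frames dict is a hypothetical name introduced for illustration.

```python
import numpy as np
import pandas as pd
import seaborn as sns

# same synthetic data as in the question
np.random.seed(365)
df1 = pd.DataFrame({'x1': np.random.randn(50), 'y1': np.random.randn(50) * 100})
df2 = pd.DataFrame({'x2': np.random.randn(50), 'y2': np.random.randn(50) * 100})

# hypothetical mapping of hue label -> DataFrame; add entries for more fits
frames = {'df1': df1, 'df2': df2}

# relabel each frame's columns on a copy; the dict keys become index level 'df'
df = (pd.concat({name: d.set_axis(['x', 'y'], axis=1) for name, d in frames.items()},
                names=['df'])
        .reset_index(level='df')   # move the 'df' index level into a column
        .reset_index(drop=True))   # replace the leftover per-frame index with 0..n-1

g = sns.lmplot(data=df, x='x', y='y', hue='df', ci=None)
```

Compared with a function that assigns to data.columns, this leaves the inputs untouched, so the Option 1 code still works on df1 and df2 afterwards.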

Notes

Package versions used for this answer:
- pandas v1.2.4
- seaborn v0.11.1
- matplotlib v3.3.4

Sigmoid answered 9/5, 2021 at 23:33
