How to overlay a Seaborn jointplot with a "marginal" (distribution histogram) from a different dataset
P

4

8

I have plotted a Seaborn jointplot of "observed counts vs. concentration", with the data stored in a pandas DataFrame. I would like to overlay, on the same set of axes, a marginal (i.e., a univariate distribution) of the "expected counts" for each concentration on top of the existing marginal, so that the two can be compared easily.

This graph is very similar to what I want, although it will have different axes and only two datasets:

Here is an example of how my data is laid out and related:

df_observed

x axis--> log2(concentration): 1,1,1,2,3,3,3 (zero-counts have been omitted)

y axis--> log2(count): 4.5, 5.7, 5.0, 9.3, 16.0, 16.5, 15.4 (zero-counts have been omitted)

df_expected

x axis--> log2(concentration): 1,1,1,2,2,2,3,3,3

Overlaying the distribution of df_expected on top of that of df_observed would therefore show where counts are missing at each concentration.
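As a minimal sketch of the layout described above, the two datasets could be held in DataFrames like this. The column names `log2_concentration` and `log2_count` are placeholders chosen for illustration, not names from my real data. Counting rows per concentration shows the gap that the overlaid marginal is meant to make visible.

```python
import pandas as pd

# Observed data: zero-counts are already omitted, so some rows are "missing".
df_observed = pd.DataFrame({
    "log2_concentration": [1, 1, 1, 2, 3, 3, 3],
    "log2_count": [4.5, 5.7, 5.0, 9.3, 16.0, 16.5, 15.4],
})

# Expected data: one row per measurement that should exist.
df_expected = pd.DataFrame({
    "log2_concentration": [1, 1, 1, 2, 2, 2, 3, 3, 3],
})

# Number of rows per concentration in each dataset.
expected_n = df_expected["log2_concentration"].value_counts().sort_index()
observed_n = df_observed["log2_concentration"].value_counts().sort_index()

# Expected minus observed = measurements missing at each concentration.
missing = expected_n.sub(observed_n, fill_value=0).astype(int)
print(missing)
```

With these numbers, the only shortfall is at log2(concentration) = 2, where two of the three expected measurements are absent.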

What I currently have

Jointplot with the observed counts at each concentration.

Separate jointplot of the expected counts at each concentration. I want the marginal from this plot to be overlaid on top of the marginal from the above jointplot.

PS: I am new to Stack Overflow so any suggestions about how to better ask questions will be met with gratitude. Also, I have searched extensively for an answer to my question but to no avail. In addition, a Plotly solution would be equally helpful. Thank you

Panada answered 10/3, 2016 at 15:37 Comment(3)
Can you put a picture of what you already have? Just to give me a better idea for your case.Maggot
Thank you for your reply. I added pictures of the two separate plots I wish to combine into one. Sorry for the poor explanation, I realise it is not as clear as one would like.Panada
Do you have sample data? How is your data structured? A sample dataframe might be good.Maggot
M
8

Whenever I need to modify a jointplot beyond what it was designed for, I turn to a JointGrid instead. It lets you change the parameters of the plots in the marginals.

Below is an example of a working JointGrid where I add another histogram for each marginal. These histograms represent the expected value that you wanted to add. Keep in mind that I generated random data so it probably doesn't look like yours.


Take a look at the code, where I altered the range of each second histogram to match the range from the observed data.

import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

df = pd.DataFrame(np.random.randn(100,4), columns = ['x', 'y', 'z', 'w'])

plt.ion()
plt.show()
plt.pause(0.001)

p = sns.JointGrid(
    x = df['x'],
    y = df['y']
    )

p = p.plot_joint(
    plt.scatter
    )

p.ax_marg_x.hist(
    df['x'],
    alpha = 0.5
    )

p.ax_marg_y.hist(
    df['y'],
    orientation = 'horizontal',
    alpha = 0.5
    )

p.ax_marg_x.hist(
    df['z'],
    alpha = 0.5,
    range = (np.min(df['x']), np.max(df['x']))
    )

p.ax_marg_y.hist(
    df['w'],
    orientation = 'horizontal',
    alpha = 0.5,
    range = (np.min(df['y']), np.max(df['y'])),
    )

The calls to plt.ion(), plt.show(), and plt.pause() are what I use to display the figure. Without them, no figure appears on my computer. You might not need this part.
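For the data in the question, the same JointGrid approach might look roughly like the sketch below. The column names and the explicit bin edges are my own assumptions. Giving both histograms the same bin edges keeps the bars aligned, so the gap between expected and observed is visible bin by bin.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; remove to display interactively
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Values taken from the question; column names are placeholders.
observed = pd.DataFrame({
    "log2_concentration": [1, 1, 1, 2, 3, 3, 3],
    "log2_count": [4.5, 5.7, 5.0, 9.3, 16.0, 16.5, 15.4],
})
expected_x = pd.Series([1, 1, 1, 2, 2, 2, 3, 3, 3], name="log2_concentration")

g = sns.JointGrid(x=observed["log2_concentration"], y=observed["log2_count"])
g.plot_joint(plt.scatter)

# One bin per concentration level, shared by both histograms.
bins = [0.5, 1.5, 2.5, 3.5]
exp_counts, _, _ = g.ax_marg_x.hist(expected_x, bins=bins, alpha=0.5, label="expected")
obs_counts, _, _ = g.ax_marg_x.hist(
    observed["log2_concentration"], bins=bins, alpha=0.5, label="observed"
)
g.ax_marg_x.legend()

# The y marginal only has observed data in this example.
g.ax_marg_y.hist(observed["log2_count"], orientation="horizontal", alpha=0.5)
```

Plotting the expected histogram first means the shorter observed bars are drawn on top of it, which makes the shortfall at concentration 2 easy to spot.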

Welcome to Stack Overflow!

Maggot answered 15/3, 2016 at 2:47 Comment(2)
Thank you blue_chip! That was exactly what I was looking for... I am extremely grateful for the help, especially considering my poor explanation. Also, thanks to @mwaskom for the input.Panada
For those having too many points, you can change the size s of the circles with p = p.plot_joint(lambda x, y: plt.scatter(x=x, y=y, s=1.5))Kame
S
15

I wrote a function to plot it, very loosely based on @blue_chip's idea. You might still need to tweak it a bit for your specific needs.

Here is an example usage:


Example data:

import seaborn as sns, numpy as np, matplotlib.pyplot as plt, pandas as pd
n=1000
m1=-3
m2=3

df1 = pd.DataFrame((np.random.randn(n)+m1).reshape(-1,2), columns=['x','y'])
df2 = pd.DataFrame((np.random.randn(n)+m2).reshape(-1,2), columns=['x','y'])
df3 = pd.DataFrame(df1.values+df2.values, columns=['x','y'])
df1['kind'] = 'dist1'
df2['kind'] = 'dist2'
df3['kind'] = 'dist1+dist2'
df=pd.concat([df1,df2,df3])

Function definition:

def multivariateGrid(col_x, col_y, col_k, df, k_is_color=False, scatter_alpha=.5):
    def colored_scatter(x, y, c=None):
        def scatter(*args, **kwargs):
            args = (x, y)
            if c is not None:
                kwargs['c'] = c
            kwargs['alpha'] = scatter_alpha
            plt.scatter(*args, **kwargs)

        return scatter

    g = sns.JointGrid(
        x=col_x,
        y=col_y,
        data=df
    )
    color = None
    legends=[]
    for name, df_group in df.groupby(col_k):
        legends.append(name)
        if k_is_color:
            color=name
        g.plot_joint(
            colored_scatter(df_group[col_x],df_group[col_y],color),
        )
        sns.distplot(
            df_group[col_x].values,
            ax=g.ax_marg_x,
            color=color,
        )
        sns.distplot(
            df_group[col_y].values,
            ax=g.ax_marg_y,
            color=color,
            vertical=True
        )
    # Do also global Hist:
    sns.distplot(
        df[col_x].values,
        ax=g.ax_marg_x,
        color='grey'
    )
    sns.distplot(
        df[col_y].values.ravel(),
        ax=g.ax_marg_y,
        color='grey',
        vertical=True
    )
    plt.legend(legends)
    

Usage:

multivariateGrid('x', 'y', 'kind', df=df)
Standstill answered 14/3, 2019 at 14:56 Comment(2)
This is great. But why does replacing the inner scatter function with just plt.scatter or sns.scatterplot result in the colours all being the same?Acoustics
Wait, figured it out, it's a closure over the grouped x and y, discarding the *args passed to the function that plot_joint calls inside itself.Acoustics
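To illustrate that closure behaviour without any plotting, here is a small pure-Python sketch. `fake_plot_joint` is a hypothetical stand-in that mimics how plot_joint calls the function it is given with the grid's full x and y; it is not seaborn code.

```python
def make_plotter(x, y, record):
    """Return a function that remembers x and y and ignores what it is called with."""
    def plotter(*args, **kwargs):
        # args holds the full dataset passed by the caller; it is discarded here
        # in favour of the per-group x and y captured by the closure.
        record.append((list(x), list(y)))
    return plotter


def fake_plot_joint(func):
    """Hypothetical stand-in for JointGrid.plot_joint: always passes the full data."""
    full_x, full_y = [1, 2, 3, 4], [10, 20, 30, 40]
    func(full_x, full_y)


calls = []
fake_plot_joint(make_plotter([1, 2], [10, 20], calls))   # group 1
fake_plot_joint(make_plotter([3, 4], [30, 40], calls))   # group 2
print(calls)
```

Each call records only its own group's data. Passing plt.scatter directly would instead receive the full dataset on every call, so every group would draw the same points over each other.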
S
0

You can plot directly onto the JointGrid.ax_marg_x and JointGrid.ax_marg_y attributes, which are the underlying matplotlib axes.

Surfbird answered 12/3, 2016 at 16:0 Comment(1)
Thank you for your reply, I see how that might be the solution I am looking for. I am having trouble implementing that, however. Could you please explain how to go about that? I have tried the following: `g = sns.jointplot(x=result['log2(Concentration[attomoles/ul])'], y=result['log2(tpm)'], kind="reg", xlim=(-7,15), marginal_kws=dict(bins=22, kde=True), stat_func=r2, color="g", size=8) g.ax_marg_x=concentrations_expected['log2(Concentration[attomoles/ul])']`Panada
C
0

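Since seaborn 0.11, jointplot supports this directly through the hue parameter (penguins below is seaborn's example dataset, loaded with sns.load_dataset("penguins")):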

sns.jointplot(data=penguins, x="bill_length_mm", y="bill_depth_mm", hue="species")


Charie answered 30/3, 2021 at 9:40 Comment(0)
