Process pandas dataframe into a violinplot

Asked 11/4, 2017 at 12:13 Answered 11/4, 2017 at 13:6

Solved python pandas matplotlib seaborn violin-plot

I have data I'm reading from an Excel spreadsheet. The data has a number of observations for each of six scenarios, S1 to S6. When I read in the data to my dataframe df, it looks as follows:

      Scenario        LMP
0           S1 -21.454544
1           S1 -20.778094
2           S1 -20.027689
3           S1 -19.747170
4           S1 -20.814405
5           S1 -21.955406
6           S1 -23.018960
...
12258       S6 -34.089906
12259       S6 -34.222814
12260       S6 -26.712010
12261       S6 -24.555973
12262       S6 -23.062616
12263       S6 -20.488411

I want to create a violinplot that has a different violin for each of the six scenarios. I'm new to Pandas and dataframes, and despite much research/testing over the last day, I can't quite figure out an elegant way to pass some reference(s) to my dataframe (to split it into different series for each scenario) that will work in the axes.violinplot() statement. For instance, I've tried the following, which doesn't work. I get a "ValueError: cannot copy sequence with size 1752 to array axis with dimension 2" on my axes.violinplot statement.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# load data into a dataframe
df = pd.read_excel('Modeling analysis charts.xlsx',
                   sheet_name='lmps',
                   usecols=[7,12],
                   skiprows=0,
                   header=1)

fontsize = 10

fig, axes = plt.subplots()

axes.violinplot(dataset = [[df.loc[df.Scenario == 'S1']],
                           [df.loc[df.Scenario == 'S2']],
                           [df.loc[df.Scenario == 'S3']],
                           [df.loc[df.Scenario == 'S4']],
                           [df.loc[df.Scenario == 'S5']],
                           [df.loc[df.Scenario == 'S6']]
                          ]
                )
axes.set_title('Day Ahead Market')

axes.yaxis.grid(True)
axes.set_xlabel('Scenario')
axes.set_ylabel('LMP ($/MWh)')

plt.show()

Seawards answered 11/4, 2017 at 12:13 Comment(0)

You need to be careful about how you build the dataset to plot. The code in the question passes a list of lists, each of which holds one whole dataframe. What you need instead is a flat list with one column of values per scenario.

You would therefore also need to take only the "LMP" column from the filtered dataframes, otherwise the violinplot wouldn't know which column to plot.
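To make the difference concrete, here is a minimal sketch that compares the shapes of the two constructs. The small frame is made-up illustration data, not the real spreadsheet, and `to_numpy()` is used here as the equivalent of `.values`.

```python
import numpy as np
import pandas as pd

# Small made-up frame with the same two columns as in the question
df = pd.DataFrame({
    "Scenario": ["S1", "S1", "S1", "S2", "S2"],
    "LMP": [-21.5, -20.8, -20.0, -34.1, -26.7],
})

# What the question passes per scenario: a list wrapping a two-column dataframe
wrapped = [df.loc[df.Scenario == "S1"]]
bad_shape = wrapped[0].shape    # (rows, 2): both columns are still there

# What violinplot can use: a flat array holding only the LMP values
s1 = df.loc[df.Scenario == "S1", "LMP"].to_numpy()
good_shape = s1.shape           # (rows,)

print(bad_shape, good_shape)    # (3, 2) (3,)
```

The trailing dimension of 2 in the first shape is the pair of columns, which is most likely the "dimension 2" the error message complains about.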

Here is a working example which stays close to the original code:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt


x = np.random.poisson(lam =3, size=100)
y = np.random.choice(["S{}".format(i+1) for i in range(6)], size=len(x))
df = pd.DataFrame({"Scenario":y, "LMP":x})

fig, axes = plt.subplots()

axes.violinplot(dataset = [df[df.Scenario == 'S1']["LMP"].values,
                           df[df.Scenario == 'S2']["LMP"].values,
                           df[df.Scenario == 'S3']["LMP"].values,
                           df[df.Scenario == 'S4']["LMP"].values,
                           df[df.Scenario == 'S5']["LMP"].values,
                           df[df.Scenario == 'S6']["LMP"].values ] )

axes.set_title('Day Ahead Market')
axes.yaxis.grid(True)
axes.set_xlabel('Scenario')
axes.set_ylabel('LMP ($/MWh)')

plt.show()

Peroration answered 11/4, 2017 at 13:6 Comment(1)

Thanks for the explanation. I think dataset = [group['LMP'].values for name, group in df.groupby('Scenario')] might be more concise. – Helgoland 8/8 at 6:59
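The groupby idea from the comment could look roughly like this. It is only a sketch: the dataframe is synthetic stand-in data (an assumption, not the real LMPs), and the tick labelling is an addition that is not part of the original answer.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Synthetic stand-in for the spreadsheet data (assumed values, not real LMPs)
rng = np.random.default_rng(0)
scenarios = ["S{}".format(i + 1) for i in range(6)]
df = pd.DataFrame({
    "Scenario": rng.choice(scenarios, size=600),
    "LMP": rng.normal(loc=-22.0, scale=3.0, size=600),
})

# One 1-D array per scenario; groupby sorts the group keys by default
labels = []
dataset = []
for name, group in df.groupby("Scenario"):
    labels.append(name)
    dataset.append(group["LMP"].to_numpy())

fig, axes = plt.subplots()
axes.violinplot(dataset=dataset)

# violinplot places the violins at x = 1, 2, ..., so label those positions
axes.set_xticks(range(1, len(labels) + 1))
axes.set_xticklabels(labels)
axes.set_xlabel("Scenario")
axes.set_ylabel("LMP ($/MWh)")
```

Call plt.show() afterwards to display the figure. Compared with writing out six filters by hand, this keeps working if a scenario is added or renamed.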

You can also use seaborn. Import it and call its violinplot function, which takes the dataframe and the column names directly and draws one violin per scenario.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# load data into a dataframe
df = pd.read_excel('Modeling analysis charts.xlsx',
                   sheet_name='lmps',
                   usecols=[7,12],
                   skiprows=0,
                   header=1)
fontsize = 10

fig, axes = plt.subplots()
# Draw one violin per scenario: 'Scenario' goes on the x axis, 'LMP' on the
# y axis, data is the dataframe, and ax is the axes instance to draw on
sns.violinplot(x='Scenario', y='LMP', data=df, ax=axes)
axes.set_title('Day Ahead Market')

axes.yaxis.grid(True)
axes.set_xlabel('Scenario')
axes.set_ylabel('LMP ($/MWh)')

plt.show()

Afra answered 11/4, 2017 at 12:33 Comment(0)
