How to apply custom column order (on Categorical) to pandas boxplot?

Asked 21/3, 2013 at 7:9 Answered 12/5, 2020 at 13:58

Solved python pandas boxplot categorical-data

EDIT: this question arose back in 2013 with pandas ~0.13 and was obsoleted by direct support for boxplot somewhere between versions 0.15 and 0.18 (as per @Cireo's late answer; pandas has also greatly improved its support for categoricals since this was asked).

I can get a boxplot of a salary column in a pandas DataFrame...

train.boxplot(column='Salary', by='Category', sym='')

...however I can't figure out how to define the index-order used on column 'Category' - I want to supply my own custom order, according to another criterion:

category_order_by_mean_salary = train.groupby('Category')['Salary'].mean().order().keys()

How can I apply my custom column order to the boxplot columns? (other than the ugly kludge of prefixing the column names to force ordering)

'Category' is a string (really, should be a categorical, but this was back in 0.13, where categorical was a third-class citizen) column taking 27 distinct values: ['Accounting & Finance Jobs','Admin Jobs',...,'Travel Jobs']. So it can be easily factorized with pd.Categorical.from_array()

On inspection, the limitation is inside pandas.tools.plotting.py:boxplot(), which converts the column object without allowing ordering:

pandas.core.frame.py.boxplot() is a passthrough to
pandas.tools.plotting.py:boxplot() which instantiates ...
matplotlib.pyplot.py:boxplot() which instantiates ...
matplotlib.axes.py:boxplot()

I suppose I could either hack up a custom version of pandas boxplot(), or reach into the internals of the object. And also file an enhancement request.

Birth answered 21/3, 2013 at 7:9 Comment(0)

Hard to say how to do this without a working example. My first guess would be to just add an integer column with the orders that you want.

A simple, brute-force way would be to add each boxplot one at a time.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame(np.random.rand(37,4), columns=list('ABCD'))
columns_my_order = ['C', 'A', 'D', 'B']
fig, ax = plt.subplots()
for position, column in enumerate(columns_my_order):
    ax.boxplot(df[column], positions=[position])

ax.set_xticks(range(position+1))
ax.set_xticklabels(columns_my_order)
ax.set_xlim(xmin=-0.5)
plt.show()
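The same idea carries over to the grouped case in the question. A minimal sketch, assuming a made-up stand-in for `train` (the category names and salary figures below are invented for illustration): compute the category order from the group means, build one array per category in that order, and hand the list to matplotlib directly.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical stand-in for the question's `train` frame.
rng = np.random.default_rng(0)
names = ["Admin Jobs", "Travel Jobs", "IT Jobs"]
centres = [30000, 20000, 40000]
train = pd.DataFrame({
    "Category": np.repeat(names, 30),
    "Salary": np.concatenate([rng.normal(c, 1000, 30) for c in centres]),
})

# Categories ordered by ascending mean salary.
order = train.groupby("Category")["Salary"].mean().sort_values().index.tolist()

# One array of salaries per category, in the custom order.
groups = [train.loc[train["Category"] == cat, "Salary"].to_numpy() for cat in order]

fig, ax = plt.subplots()
ax.boxplot(groups)            # boxes are drawn in list order at x = 1..n
ax.set_xticklabels(order)     # so the labels can be applied in the same order
```

Call plt.show() afterwards to display the figure. Because the data and the labels are built from the same `order` list, they cannot drift apart.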

Operatic answered 21/3, 2013 at 15:34 Comment(4)

Added details for you, and ideas on workarounds. Adding a separate standalone integer column doesn't give a decent graph because now your column labels are (illegible) integers, not text. (Kludging a text prefix into Category names to force custom sort-order is maybe the fastest hack. But still ugly) – Birth 21/3, 2013 at 20:33

pandas DataFrame cannot handle a Categorical column, unlike R. – Birth 21/3, 2013 at 20:40

not where I was headed. I typically just use apply with a hard-coded lookup table. see my edited response for a different approach, though. – Operatic 21/3, 2013 at 21:0

Duh! Why didn't I think of that! Good idea. – Birth 21/3, 2013 at 21:34

EDIT: this is the right answer after direct support was added somewhere between versions 0.15 and 0.18

tl;dr: for recent pandas - use positions argument to boxplot.

Adding a separate answer, which perhaps could be another question - feedback appreciated.

I wanted to add a custom column order within a groupby, which posed many problems for me. In the end, I had to avoid trying to use boxplot from a groupby object, and instead go through each subplot myself to provide explicit positions.

import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame()
df['GroupBy'] = ['g1', 'g2', 'g3', 'g4'] * 6
df['PlotBy'] = [chr(ord('A') + i) for i in range(24)]
df['SortBy'] = list(reversed(range(24)))
df['Data'] = [i * 10 for i in range(24)]

# Note that this has no effect on the boxplot
df = df.sort_values(['GroupBy', 'SortBy'])
for group, info in df.groupby('GroupBy'):
    print('Group: %r\n%s\n' % (group, info))

# With the below, cannot use
#  - sort data beforehand (not preserved, can't access in groupby)
#  - categorical (not all present in every chart)
#  - positional (different lengths and sort orders per group)
# df.groupby('GroupBy').boxplot(layout=(1, 5), column=['Data'], by=['PlotBy'])

fig, axes = plt.subplots(1, df.GroupBy.nunique(), sharey=True)
for ax, (g, d) in zip(axes, df.groupby('GroupBy')):
    d.boxplot(column=['Data'], by=['PlotBy'], ax=ax, positions=d.index.values)
plt.show()

Within my final code, it was even slightly more involved to determine positions because I had multiple data points for each sortby value, and I ended up having to do the below:

to_plot = data.sort_values([sort_col]).groupby(group_col)
for ax, (group, group_data) in zip(axes, to_plot):
    # Use existing sorting
    ordering = enumerate(group_data[sort_col].unique())
    positions = [ind for val, ind in sorted((v, i) for (i, v) in ordering)]
    ax = group_data.boxplot(column=[col], by=[plot_by], ax=ax, positions=positions)

Sweven answered 18/5, 2017 at 22:48 Comment(3)

Well the original question's been closed for years, why not add a new question for this answer? Specify pandas 0.20+ – Birth 29/5, 2017 at 4:52

Wasn't sure as to the etiquette of posting a question then answering it yourself =/ – Sweven 30/5, 2017 at 1:11

that's perfectly ok. Also in this case desirable - this question has become obsolete at some point by pandas 0.19 – Birth 30/5, 2017 at 5:13

I got stuck on the same question, and solved it by mapping each category to a sortable key and then resetting the xticklabels, with code as follows:

import numpy as np
import pandas as pd

df = pd.DataFrame({"A":["d","c","d","c",'d','c','a','c','a','c','a','c']})
df['val']=(np.random.rand(12))
df['B']=df['A'].replace({'d':'0','c':'1','a':'2'})
ax=df.boxplot(column='val',by='B')
ax.set_xticklabels(list('dca'))
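One way to keep the labels from drifting out of step with the boxes is to derive both from a single lookup table. This is only a sketch: the `rank` dictionary and the variable names are my own, and it assumes the boxes come out in ascending order of the mapped key, which is how a groupby on an integer column behaves.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame({"A": ["d", "c", "d", "c", "d", "c", "a", "c", "a", "c", "a", "c"]})
df["val"] = rng.random(len(df))

# Hypothetical rank table: smaller rank = further left.
rank = {"d": 0, "c": 1, "a": 2}
df["B"] = df["A"].map(rank)

ax = df.boxplot(column="val", by="B")

# Build the labels from the same table, keeping only categories that occur,
# so label i always belongs to the box with the i-th smallest rank.
present = set(df["A"])
labels = [name for name in sorted(rank, key=rank.get) if name in present]
ax.set_xticklabels(labels)
```

Since the labels are read back from `rank` rather than typed by hand, changing the desired order only means editing the dictionary.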

Avenge answered 18/4, 2018 at 6:12 Comment(1)

Please note that set_xticklabels() will give a wrong result, as it just overwrites the existing labels. set_xticklabels(list('dca')) does not move the values for label 'd' to first place, as you and OP intended; instead it re-labels whatever the first label was as 'd'. – Munt 31/5, 2019 at 18:32

Note that pandas can now create categorical columns. If you don't mind having all the columns present in your graph, or trimming them appropriately, you can do something like the below:

http://pandas.pydata.org/pandas-docs/stable/categorical.html

df['Category'] = pd.Categorical(df['Category'], ordered=True)

Recent pandas also appears to allow positions to pass all the way through from frame to axes.
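To get a custom order rather than the default one, the categories can be listed explicitly. A small sketch with toy data (the values and the `custom_order` list are made up): grouping on a categorical column yields the groups in category order, and since boxplot(by=...) groups internally, the boxes should come out in that order too.

```python
import pandas as pd

# Hypothetical toy data; the column names mirror the question.
df = pd.DataFrame({
    "Category": ["b", "a", "c", "a", "b", "c"],
    "Salary": [3.0, 1.0, 5.0, 2.0, 4.0, 6.0],
})

custom_order = ["c", "a", "b"]
df["Category"] = pd.Categorical(df["Category"], categories=custom_order, ordered=True)

# Grouping on a categorical column follows the category order,
# not alphabetical order.
group_order = df.groupby("Category", observed=False)["Salary"].mean().index.tolist()

# boxplot(by=...) groups internally, so the boxes should follow the same order.
ax = df.boxplot(column="Salary", by="Category")
```

Note that every listed category gets a slot, which is the "all the columns present" caveat mentioned above.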

Sweven answered 18/5, 2017 at 20:48 Comment(2)

The link said 0.15, but I was suspicious of that. I'm not sure if the feature was fully integrated at that point. I was able to do all of this in 0.19.2 – Sweven 30/5, 2017 at 1:12

Thanks for checking. – Birth 1/6, 2017 at 22:42

It might sound kind of silly, but many plotting functions let you set the order directly. For example:

Library & dataset

import matplotlib.pyplot as plt
import seaborn as sns
df = sns.load_dataset('iris')

Specific order

p1 = sns.boxplot(x='species', y='sepal_length', data=df, order=["virginica", "versicolor", "setosa"])
plt.show()

Cheroot answered 23/7, 2019 at 18:24 Comment(1)

Not silly at all, this is the perfect solution for people using seaborn. – Value 4/10, 2022 at 16:51

If you're not happy with the default column order in your boxplot, you can change it to a specific order by setting the column parameter in the boxplot function.

Compare the two examples below:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

np.random.seed(0)
df = pd.DataFrame(np.random.rand(37,4), columns=list('ABCD'))

##
plt.figure()
df.boxplot()
plt.title("default column order")

##
plt.figure()
df.boxplot(column=['C','A', 'D', 'B'])
plt.title("Specified column order")

Ichnite answered 6/12, 2019 at 15:12 Comment(0)

Use the new positions= argument:

df.boxplot(column=['Data'], by=['PlotBy'], positions=df.index.values)
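For an arbitrary order, the positions can be computed as ranks. A hedged sketch, assuming positions is forwarded unchanged to matplotlib and that the boxes are produced in sorted key order; the data and `desired_order` are invented for illustration.

```python
import pandas as pd

# Hypothetical toy data with three groups.
df = pd.DataFrame({
    "PlotBy": ["a", "b", "c"] * 4,
    "Data": [1, 20, 300, 2, 21, 310, 3, 22, 320, 4, 23, 330],
})

desired_order = ["c", "a", "b"]   # hypothetical custom order, left to right

# boxplot(by=...) produces one box per group key in sorted key order,
# so give each key the x slot it should occupy.
keys = sorted(df["PlotBy"].unique())
positions = [desired_order.index(k) for k in keys]

ax = df.boxplot(column="Data", by="PlotBy", positions=positions)
```

Here `positions` works out to [1, 2, 0]: box 'a' goes to slot 1, 'b' to slot 2 and 'c' to slot 0, so the plot reads c, a, b from left to right.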

Interlink answered 9/3, 2020 at 10:27 Comment(1)

cc: @Sweven you might like to edit your answer for clarity – Birth 9/3, 2020 at 10:32

This can be resolved by applying a categorical order. You can decide on the ranking yourself. I'll give an example with days of week.

Provide categorical order to weekday

# List the categories in the desired order
weekday = ['Monday','Tuesday','Wednesday','Thursday','Friday','Saturday','Sunday']
# Build an ordered categorical dtype from that list
wDays = pd.api.types.CategoricalDtype(ordered=True, categories=weekday)
# Apply it to the specific column in the DataFrame
df['Weekday'] = df['Weekday'].astype(wDays)
# Then generate your plot (colour is any matplotlib colour you have defined)
plt.figure(figsize=[15, 10])
sns.boxplot(data=df, x='Weekday', y='Y Axis Variable', color=colour)

Kingfisher answered 12/5, 2020 at 13:58 Comment(0)
