EDIT: This question arose in 2013 with pandas ~0.13 and was made obsolete by direct support in boxplot somewhere between versions 0.15 and 0.18 (as per @Cireo's late answer; pandas has also greatly improved its support for categoricals since this was asked).
I can get a boxplot of a salary column in a pandas DataFrame...
train.boxplot(column='Salary', by='Category', sym='')
...however, I can't figure out how to define the order in which the 'Category' groups are plotted. I want to supply my own custom order, based on another criterion:
category_order_by_mean_salary = train.groupby('Category')['Salary'].mean().order().keys()
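For readers on a current pandas: `Series.order()` was later replaced by `Series.sort_values()`, so the line above no longer runs as written. A minimal sketch of the same computation, assuming a small hypothetical `train` frame (the category names and salaries are made up for illustration):

```python
import pandas as pd

# Hypothetical toy data standing in for the question's `train` DataFrame
train = pd.DataFrame({
    "Category": ["Admin Jobs", "Travel Jobs", "Accounting & Finance Jobs",
                 "Admin Jobs", "Travel Jobs", "Accounting & Finance Jobs"],
    "Salary": [20000, 30000, 40000, 22000, 28000, 44000],
})

# Mean salary per category, sorted ascending; the index holds the category labels.
# sort_values() replaces the old Series.order().
category_order_by_mean_salary = (
    train.groupby("Category")["Salary"].mean().sort_values().index.tolist()
)
print(category_order_by_mean_salary)
```

With this toy data the means are 21000, 29000 and 42000, so the order is Admin, Travel, Accounting & Finance.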
How can I apply my custom order to the boxplot columns (other than kludging the category names with a prefix to force the ordering)?
'Category' is a string column (really, it should be a categorical, but this was back in 0.13, where categorical was a third-class citizen) taking 27 distinct values: ['Accounting & Finance Jobs','Admin Jobs',...,'Travel Jobs']. So it can easily be factorized with pd.Categorical.from_array().
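Later pandas versions deprecated `Categorical.from_array()` in favour of the plain `pd.Categorical` constructor, which also accepts an explicit `categories` order. A sketch under that assumption, reusing hypothetical toy data; it shows that both the integer codes and `groupby` then follow the custom order instead of alphabetical order:

```python
import pandas as pd

# Hypothetical toy data standing in for the question's `train` DataFrame
train = pd.DataFrame({
    "Category": ["Admin Jobs", "Travel Jobs", "Accounting & Finance Jobs",
                 "Admin Jobs", "Travel Jobs", "Accounting & Finance Jobs"],
    "Salary": [20000, 30000, 40000, 22000, 28000, 44000],
})
order = train.groupby("Category")["Salary"].mean().sort_values().index.tolist()

# Convert the string column to an ordered categorical whose categories follow `order`
train["Category"] = pd.Categorical(train["Category"], categories=order, ordered=True)

# Factorized integer codes now reflect the custom order (0 = lowest mean salary)
codes = train["Category"].cat.codes.tolist()

# Grouping on a categorical yields groups in category order, not alphabetically
means = train.groupby("Category", observed=True)["Salary"].mean()
print(codes, means.index.tolist())
```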
On inspection, the limitation is inside pandas.tools.plotting.py:boxplot(), which converts the column object without allowing ordering:
- pandas.core.frame.py:boxplot() is a passthrough to
- pandas.tools.plotting.py:boxplot(), which calls ...
- matplotlib.pyplot.py:boxplot(), which calls ...
- matplotlib.axes.py:boxplot()
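Since the chain ends in matplotlib's `Axes.boxplot()`, which simply takes a list of arrays and draws them left to right, one way around the pandas layer is to build that list yourself in the desired order. A minimal sketch, assuming hypothetical toy data and the non-interactive Agg backend; `showfliers=False` is used here as the matplotlib equivalent of hiding outliers with `sym=''`:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical toy data standing in for the question's `train` DataFrame
train = pd.DataFrame({
    "Category": ["Admin Jobs", "Travel Jobs", "Accounting & Finance Jobs",
                 "Admin Jobs", "Travel Jobs", "Accounting & Finance Jobs"],
    "Salary": [20000, 30000, 40000, 22000, 28000, 44000],
})
order = train.groupby("Category")["Salary"].mean().sort_values().index.tolist()

# One array of salaries per category, listed in the custom order
data = [train.loc[train["Category"] == cat, "Salary"].to_numpy() for cat in order]

fig, ax = plt.subplots()
ax.boxplot(data, showfliers=False)   # boxes are drawn at x = 1, 2, ..., len(data)
ax.set_xticks(list(range(1, len(order) + 1)))
ax.set_xticklabels(order, rotation=90)
ax.set_ylabel("Salary")
fig.tight_layout()
# fig.savefig("salary_by_category.png") or plt.show() to view the result
```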
I suppose I could either hack up a custom version of the pandas boxplot(), or reach into the internals of the object, and also file an enhancement request.
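A workaround that needs neither a patched boxplot() nor access to internals is to reshape the data so each category becomes its own column: `DataFrame.boxplot()` without `by` draws one box per column, in column order. A sketch assuming hypothetical toy data; it relies on pandas dropping the NaN padding that `pivot` introduces before handing each column to matplotlib:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import pandas as pd

# Hypothetical toy data standing in for the question's `train` DataFrame
train = pd.DataFrame({
    "Category": ["Admin Jobs", "Travel Jobs", "Accounting & Finance Jobs",
                 "Admin Jobs", "Travel Jobs", "Accounting & Finance Jobs"],
    "Salary": [20000, 30000, 40000, 22000, 28000, 44000],
})
order = train.groupby("Category")["Salary"].mean().sort_values().index.tolist()

# Wide frame: one column per category; cells from other categories are NaN
wide = train.pivot(columns="Category", values="Salary")

# Selecting the columns in the custom order fixes the left-to-right box order
ax = wide[order].boxplot(rot=90, showfliers=False)
```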