- To get the boxplot data, use `matplotlib.cbook.boxplot_stats`, which returns a list of dictionaries of statistics used to draw a series of box and whisker plots using `matplotlib.axes.Axes.bxp`.
- To get the boxplot statistics, pass an array to `boxplot_stats`.
- This is not specific to pandas.
- The default plotting backend for pandas is matplotlib, so `boxplot_stats` returns the correct metrics for `pandas.DataFrame.plot.box`.
- Pass the numeric columns of interest to `boxplot_stats` as an array, using `df.values`.
- There can be no `NaN` values in the columns.
- Tested in python 3.11.4, pandas 2.1.0, matplotlib 3.7.2
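If the columns do contain `NaN`, one possible workaround is to drop the missing values per column before calling `boxplot_stats`. The sketch below is an illustration only: `df_nan`, `clean`, and `stats_nan` are hypothetical names, and the small frame is made-up data, not the dataframe used in this answer. It relies on `boxplot_stats` accepting a list of 1-D arrays of different lengths.

```python
import numpy as np
import pandas as pd
from matplotlib.cbook import boxplot_stats

# Hypothetical frame with missing values (made-up data for illustration)
df_nan = pd.DataFrame({
    'A': [1.0, 2.0, np.nan, 4.0, 5.0],
    'B': [2.0, np.nan, np.nan, 8.0, 10.0],
})

# Drop NaN per column; the columns may end up with different lengths,
# so pass a list of 1-D arrays instead of a single 2-D array
clean = [df_nan[col].dropna().to_numpy() for col in df_nan.columns]

# Same dict-of-dicts layout, keyed by column name
stats_nan = dict(zip(df_nan.columns, boxplot_stats(clean)))
print(stats_nan['A']['med'], stats_nan['B']['med'])
```

Dropping per column rather than with `df.dropna()` keeps every valid observation, at the cost of the columns no longer having equal length.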
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.cbook import boxplot_stats
import numpy as np
# test dataframe
np.random.seed(346)
df = pd.DataFrame(np.random.rand(100, 5), columns=['A', 'B', 'C', 'D', 'E'])
# plot the dataframe as needed
ax = df.plot.box(figsize=(8, 6), showmeans=True, grid=True)
- Extract the boxplot metrics by passing an array to `boxplot_stats`; either `boxplot_stats(df)` or `boxplot_stats(df.values)` will work.
- The dicts are in the same order as the column arrays from `df`.
- This data has no outliers (`fliers`), because `np.random.rand` draws uniform values in [0, 1), which all fall inside the whisker range here.
# create a dict of dicts with the column names as the key for each dict of statistics
stats = dict(zip(df.columns, boxplot_stats(df)))
print(stats)
[out]:
{'A': {'cihi': 0.6008396701195271,
'cilo': 0.45316512285356997,
'fliers': array([], dtype=float64),
'iqr': 0.47030110594253877,
'mean': 0.49412631128104645,
'med': 0.5270023964865486,
'q1': 0.2603486498337239,
'q3': 0.7306497557762627,
'whishi': 0.9941975539538199,
'whislo': 0.00892072823759571},
'B': {'cihi': 0.5460977498205477,
'cilo': 0.39283808760835964,
'fliers': array([], dtype=float64),
'iqr': 0.4880880962171596,
'mean': 0.47578540593013985,
'med': 0.4694679187144537,
'q1': 0.2466015651284032,
'q3': 0.7346896613455628,
'whishi': 0.9906905357196321,
'whislo': 0.002613905425137064},
'C': {'cihi': 0.6327876179340386,
'cilo': 0.47317829117336885,
'fliers': array([], dtype=float64),
'iqr': 0.5083099578365278,
'mean': 0.5202481643792808,
'med': 0.5529829545537037,
'q1': 0.24608370844800756,
'q3': 0.7543936662845353,
'whishi': 0.9968264819096214,
'whislo': 0.008450848029956215},
'D': {'cihi': 0.5429786764060252,
'cilo': 0.40089287519667627,
'fliers': array([], dtype=float64),
'iqr': 0.4525025516221303,
'mean': 0.4948030963370377,
'med': 0.4719357758013507,
'q1': 0.279181107815125,
'q3': 0.7316836594372553,
'whishi': 0.9836196084903415,
'whislo': 0.019864664399723786},
'E': {'cihi': 0.5413819754851169,
'cilo': 0.3838462046931251,
'fliers': array([], dtype=float64),
'iqr': 0.5017062764076173,
'mean': 0.4922357500877824,
'med': 0.462614090089121,
'q1': 0.2490034171367362,
'q3': 0.7507096935443536,
'whishi': 0.9984043081918205,
'whislo': 0.0036707224412856343}}
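The nested dict can be awkward to compare across columns, so it may help to put the same statistics in a table. This is a minimal sketch under the same setup as above; `rows`, `stats_df`, and the `n_fliers` column are hypothetical names introduced here, and the `fliers` arrays are replaced by their counts so that every cell is a plain number.

```python
import numpy as np
import pandas as pd
from matplotlib.cbook import boxplot_stats

# Same test dataframe as above
np.random.seed(346)
df = pd.DataFrame(np.random.rand(100, 5), columns=['A', 'B', 'C', 'D', 'E'])

# One row per column of df; keep the scalar statistics and
# store only the number of fliers instead of the array itself
rows = []
for s in boxplot_stats(df.values):
    row = {k: v for k, v in s.items() if k != 'fliers'}
    row['n_fliers'] = len(s['fliers'])
    rows.append(row)

# Index the table by the original column names (same order as df.columns)
stats_df = pd.DataFrame(rows, index=df.columns)
print(stats_df.round(3))
```

Each row of `stats_df` then corresponds to one box in the plot, and a single statistic can be selected across all columns, for example `stats_df['med']`.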