Extract outliers from Seaborn Boxplot

2

11

Is there a way to extract all outliers after plotting a Seaborn boxplot? For example, suppose I am plotting a boxplot for the data below:

      client                total
1      LA                     1
2      Sultan                128
3      ElderCare              1
4      CA                     3
5      More                  900

I want the records below to be returned as outliers after the boxplot is plotted.

2      Sultan                128
5      More                  900
Burmeister answered 12/12, 2018 at 3:24 Comment(0)
16

Seaborn uses matplotlib to handle outlier calculations, meaning the key parameter, whis, is passed on to ax.boxplot. The function that performs the calculation is documented here: https://matplotlib.org/api/cbook_api.html#matplotlib.cbook.boxplot_stats. You can use matplotlib.cbook.boxplot_stats to calculate the outliers yourself rather than extract them from the plot. The following code snippet shows the calculation and that it matches the seaborn plot:

import matplotlib.pyplot as plt
from matplotlib.cbook import boxplot_stats
import pandas as pd
import seaborn as sns

data = [
    ('LA', 1),
    ('Sultan', 128),
    ('ElderCare', 1),
    ('CA', 3),
    ('More', 900),
]
df = pd.DataFrame(data, columns=('client', 'total'))
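# Seaborn draws the box for the numeric 'total' column at x=0
ax = sns.boxplot(data=df)
# Compute the outliers ("fliers") with the same routine matplotlib uses
outliers = [y for stat in boxplot_stats(df['total']) for y in stat['fliers']]
print(outliers)
# Overlay the computed outliers at x=1 to compare with seaborn's own markers
for y in outliers:
    ax.plot(1, y, 'p')
# Widen the x-axis so the overlaid points at x=1 are visible
ax.set_xlim(right=1.5)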
plt.show()
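
To make the calculation itself more concrete, here is a minimal sketch of the default rule (whis=1.5) written out by hand with NumPy and compared against boxplot_stats. The names lower_fence, upper_fence and manual_outliers are illustrative only and do not come from matplotlib or seaborn.

```python
import numpy as np
from matplotlib.cbook import boxplot_stats

# The 'total' column from the question
totals = np.array([1, 128, 1, 3, 900])

# Default rule (whis=1.5): a point is an outlier if it lies more than
# 1.5 * IQR below the first quartile or above the third quartile.
q1, q3 = np.percentile(totals, [25, 75])
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr  # illustrative name
upper_fence = q3 + 1.5 * iqr  # illustrative name
manual_outliers = totals[(totals < lower_fence) | (totals > upper_fence)]

# The same data passed through matplotlib's helper
stats = boxplot_stats(totals, whis=1.5)[0]

print(manual_outliers)   # hand-computed outliers
print(stats['fliers'])   # outliers reported by matplotlib
```

Passing a different whis value to boxplot_stats moves the fences, so it should match whatever whis you pass to sns.boxplot.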


Galer answered 12/12, 2018 at 6:6 Comment(0)
6

The code below gives you an array of outlier values; use it to extract the matching rows from the dataframe.

from matplotlib.cbook import boxplot_stats  
boxplot_stats(df.colname).pop(0)['fliers']
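
As a rough sketch of that last step, the outlier values can be matched back to their rows with Series.isin. The dataframe below is rebuilt from the question's sample data, and outlier_rows is a hypothetical name chosen for illustration.

```python
import pandas as pd
from matplotlib.cbook import boxplot_stats

# Sample data from the question
df = pd.DataFrame({
    'client': ['LA', 'Sultan', 'ElderCare', 'CA', 'More'],
    'total': [1, 128, 1, 3, 900],
})

# Outlier values for the column (default whis=1.5)
fliers = boxplot_stats(df['total'].to_numpy())[0]['fliers']

# Keep only the rows whose 'total' is one of the outlier values
outlier_rows = df[df['total'].isin(fliers)]
print(outlier_rows)
```

Note that with the default whis=1.5 only the row with total 900 is flagged for this sample: 128 is the 75th percentile of the five values, so it sits at the edge of the box rather than beyond the whisker.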
Aec answered 29/7, 2019 at 17:48 Comment(0)
