Time-series boxplot in pandas

How can I create a boxplot for a pandas time-series where I have a box for each day?

Sample dataset of hourly data where one box should consist of 24 values:

import numpy as np
import pandas as pd

n = 480
ts = pd.Series(np.random.randn(n),
               index=pd.date_range(start="2014-02-01",
                                   periods=n,
                                   freq="h"))
ts.plot()

I am aware that I could make an extra column for the day, but I would like to have proper x-axis labeling and x-limit functionality (like in ts.plot()), so being able to work with the datetime index would be great.

There is a similar question for R/ggplot2 here, if it helps to clarify what I want.

Masquer asked 22/10/2014 at 12:25
There is a possibly nicer solution for this here, which uses only pandas (its .boxplot() and .pivot() functions) and does not require Seaborn. – Slunk
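The linked solution is not preserved, so the following is only a minimal sketch of what a pandas-only `.pivot()` + `.boxplot()` approach could look like, assuming regular hourly data. The column names `day`, `hour` and `value` are illustrative choices, not taken from the linked answer.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs headless
import numpy as np
import pandas as pd

n = 480
ts = pd.Series(np.random.randn(n),
               index=pd.date_range(start="2014-02-01", periods=n, freq="h"))

# Long format: one row per observation, with the calendar day and the hour split out
df = pd.DataFrame({
    "day": ts.index.strftime("%Y-%m-%d"),
    "hour": ts.index.hour,
    "value": ts.to_numpy(),
})

# Wide format: rows are hours 0-23, columns are days,
# so each column holds one day's 24 values
wide = df.pivot(index="hour", columns="day", values="value")

# DataFrame.boxplot draws one box per column, i.e. one box per day
ax = wide.boxplot(figsize=(12, 5), rot=90)
```

Note that `pivot` needs exactly one value per (hour, day) slot, so this suits regularly sampled data, and the x-axis is categorical (date strings) rather than a true date axis.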

If it's an option for you, I would recommend using Seaborn, which is a wrapper around Matplotlib. You could do it yourself by looping over the groups of your time series, but that's much more work.

import pandas as pd
import numpy as np
import seaborn
import matplotlib.pyplot as plt

n = 480
ts = pd.Series(np.random.randn(n), index=pd.date_range(start="2014-02-01", periods=n, freq="h"))

fig, ax = plt.subplots(figsize=(12,5))
# One box per day of the year; seaborn 0.12+ expects x and y as keyword arguments
seaborn.boxplot(x=ts.index.dayofyear, y=ts, ax=ax)

Which gives one box per day of the year (figure not preserved).

Note that I'm passing the day of year as the grouper to Seaborn, so if your data spans multiple years this wouldn't work: the same day in different years would end up in one box. In that case you could consider something like:

ts.index.to_series().apply(lambda x: x.strftime('%Y%m%d'))
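A small check of why the day-of-year key breaks across years, on hypothetical data (two readings on 1 February in each of 2014 and 2015). It also shows that `ts.index.strftime(...)` gives the same keys as the `apply`/lambda line above without a Python call per element.

```python
import numpy as np
import pandas as pd

# Hypothetical data: two readings on 1 February in each of two years
idx = pd.DatetimeIndex(["2014-02-01 00:00", "2014-02-01 12:00",
                        "2015-02-01 00:00", "2015-02-01 12:00"])
ts = pd.Series(np.arange(4.0), index=idx)

# Day-of-year keys: both years map to day 32, so they share one group (one box)
doy_counts = ts.groupby(ts.index.dayofyear).size()

# Grouper from the answer: one string key per calendar date
apply_keys = ts.index.to_series().apply(lambda x: x.strftime('%Y%m%d'))

# Vectorised equivalent that skips the per-element lambda
date_keys = ts.index.strftime('%Y%m%d')
date_counts = ts.groupby(date_keys).size()
```

Either key should then work as the `x=` argument to `seaborn.boxplot` alongside `y=ts`.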

Edit: for 3-hourly boxes you could use this as a grouper, but it only works if the timestamps have no minutes (or smaller units) defined:

import datetime
[(dt - datetime.timedelta(hours=int(dt.hour % 3))).strftime('%Y%m%d%H') for dt in ts.index]
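A possibly more readable alternative to the list comprehension above is `DatetimeIndex.floor`, which rounds every timestamp down to the start of its bin. This sketch, assuming the same hourly data, checks that `floor("3h")` reproduces the comprehension's labels while keeping real `Timestamp` keys.

```python
import datetime
import numpy as np
import pandas as pd

n = 480
ts = pd.Series(np.random.randn(n),
               index=pd.date_range(start="2014-02-01", periods=n, freq="h"))

# Grouper from the answer: subtract (hour mod 3) hours, then format to the hour
legacy_keys = [(dt - datetime.timedelta(hours=int(dt.hour % 3))).strftime('%Y%m%d%H')
               for dt in ts.index]

# Vectorised alternative: round each timestamp down to the start of its 3-hour bin
bin_starts = ts.index.floor("3h")

# Same labels as the list comprehension, computed on the whole index at once
floor_keys = bin_starts.strftime("%Y%m%d%H")

# Each 3-hour bin of hourly data holds 3 values
counts = ts.groupby(bin_starts).size()
```

The same pattern should work for other fixed frequencies such as `"6h"` or `"30min"`. For 7-day boxes, a fixed `floor("7D")` is, as I understand it, counted from the Unix epoch rather than from a chosen weekday, so `ts.index.to_period("W")` (calendar weeks) may be the more natural key.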
Anastrophe answered 22/10/2014 at 13:46
I actually do use Seaborn anyway, so that's definitely an option. Thanks! Edit: Is there any way to use this for arbitrary periods, e.g. 3-hour boxplots, 7-day boxplots, etc.? – Masquer
Yes, you can pass anything to Seaborn's grouper. The challenge is to define the groups from the index of the Series. I have added an example for 3-hourly periods; something like that could work for arbitrary periods of time. Unfortunately it's not very readable; perhaps it could be simplified with pandas' TimeGrouper. Opening a specific question on Seaborn's grouper might get you help from people who use Seaborn a lot, which I don't. – Anastrophe
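The "loop over the groups yourself" route mentioned in the accepted answer can be sketched with `pd.Grouper(freq=...)`, the current pandas replacement for `TimeGrouper`, plus plain Matplotlib. The sketch below places each box at the centre of its time bin on a real date axis, which is one way to keep date-based ticks and x-limits as the question asks. The variable names (`freq`, `period`, `box_width`) are illustrative assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs headless
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

n = 480
ts = pd.Series(np.random.randn(n),
               index=pd.date_range(start="2014-02-01", periods=n, freq="h"))

freq = "1D"                    # box period; e.g. "3h" or "7D" also work here
period = pd.Timedelta(freq)

# pd.Grouper bins the DatetimeIndex by freq; skip bins that happen to be empty
groups = [(start, values.to_numpy())
          for start, values in ts.groupby(pd.Grouper(freq=freq))
          if len(values) > 0]
starts = pd.DatetimeIndex([start for start, _ in groups])
data = [values for _, values in groups]

# Put each box at the middle of its bin, in Matplotlib date units (days)
positions = mdates.date2num((starts + period / 2).to_pydatetime())
box_width = 0.8 * period / pd.Timedelta(days=1)

fig, ax = plt.subplots(figsize=(12, 5))
# manage_ticks=False stops boxplot from replacing the ticks with 1..N labels
result = ax.boxplot(data, positions=positions, widths=box_width,
                    manage_ticks=False)

# Real date axis: ticks and labels come from Matplotlib's date machinery
locator = mdates.AutoDateLocator()
ax.xaxis.set_major_locator(locator)
ax.xaxis.set_major_formatter(mdates.ConciseDateFormatter(locator))
```

Because the boxes sit at date positions, x-limits can be given as `mdates.date2num(...)` values of the dates you want to show.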

(Not enough rep to comment on the accepted solution, so adding an answer instead.)

The accepted code has two small errors: (1) it needs the numpy import and (2) the x and y parameters in the boxplot statement need to be swapped. The following produces the plot shown.

import numpy as np
import pandas as pd
import seaborn
import matplotlib.pyplot as plt

n = 480
ts = pd.Series(np.random.randn(n), index=pd.date_range(start="2014-02-01", periods=n, freq="h"))

fig, ax = plt.subplots(figsize=(12,5))
seaborn.boxplot(x=ts.index.dayofyear, y=ts, ax=ax)
Tentation answered 4/6/2016 at 8:35
You're the best! – Amersham

This solution only uses native pandas and allows for hierarchical date-time grouping (i.e. groups that span years). The key is that if you pass a function to groupby(), it is called on each element of the DataFrame's index. If your index is a DatetimeIndex, each element is a Timestamp, so you can use all of its convenience methods (such as strftime) to define the groups, much as you would choose a resampling rule.

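import numpy as np
import pandas as pd
n = 480
ts = pd.DataFrame(np.random.randn(n), index=pd.date_range(start="2014-02-01", periods=n, freq="h"))
# One group (and one box) per calendar date, so different years stay separate
ts.groupby(lambda x: x.strftime("%Y-%m-%d")).boxplot(subplots=False, figsize=(12,9), rot=90)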
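To make the "function called on each index element" point concrete, this sketch (same hourly DataFrame) checks that the lambda grouper yields the same groups as passing precomputed keys from `ts.index.strftime`, which builds all keys in one vectorised call instead of one Python call per row.

```python
import numpy as np
import pandas as pd

n = 480
ts = pd.DataFrame(np.random.randn(n),
                  index=pd.date_range(start="2014-02-01", periods=n, freq="h"))

# groupby calls the function once per index label; each label is a pandas Timestamp
by_function = ts.groupby(lambda x: x.strftime("%Y-%m-%d")).size()

# Vectorised equivalent: build all keys from the DatetimeIndex at once
by_index = ts.groupby(ts.index.strftime("%Y-%m-%d")).size()
```

Both give 20 groups of 24 hourly values, so either key should work with `.boxplot(subplots=False)`.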


Lisabeth answered 28/4/2020 at 23:30
Add plt.show() to the code above to display the figure, and also import matplotlib.pyplot as plt. – Mancunian
