How to split dataframe based on years in python?

I have a dataframe called "dataframe" that contains sales information by date. Each date entry is in the format YYYY-MM-DD, and the data ranges from 2012 to 2017. I would like to split this dataframe into 6 separate dataframes, one for each year. For example, the first split dataframe would have all the entries from 2012.

I think I was able to do this in the code below. I split the dataframe into one for each year and put them in the list "years". However, when I try to run auto_arima on each dataframe I get the error "Found input variables with inconsistent numbers of samples."

I think this is because I'm not splitting my original dataframe correctly. How do I properly split my dataframe based on year?

# Partition data into years
years = [g for n, g in dataframe.set_index('Date').groupby(pd.Grouper(freq='Y'))]

# Create a list that will hold all auto_arima results for every dataframe
stepwise_models = []

# Call auto_arima on every dataframe
for x in range(len(years)-1):
    currentDf = years[x]
    model = auto_arima(currentDf['price'], exogenous=xreg, start_p=1, start_q=1,
        max_p=3, max_q=3, m=12,
        start_P=0, seasonal=True,
        d=1, D=1, trace=True,
        error_action='ignore',
        suppress_warnings=True,
        stepwise=True)
    stepwise_models.append(model)  # Store current auto_arima result in our stepwise_models[] list
Wexford asked 27/6, 2018 at 23:58
Comment from Creath: Without seeing your dataframe or your function definitions, we can't know for sure where your problem lies. I suggest you print years[x].head() in your loop to see whether it's what you expect.
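Following the suggestion in the comment above, here is a minimal sketch of that kind of check. It prints the shape of each yearly slice next to the shape of the exogenous array. The post does not show how xreg is built, so the data below is made up, and the idea that xreg has one row per row of the full dataframe is only an assumption about one possible cause of the "inconsistent numbers of samples" error.

```python
import numpy as np
import pandas as pd

# Made-up stand-ins for the asker's data: daily prices over two years and
# an exogenous array with one row per row of the full dataframe (assumption).
dates = pd.date_range("2012-01-01", "2013-12-31", freq="D")
dataframe = pd.DataFrame({"Date": dates, "price": np.arange(len(dates), dtype=float)})
xreg = np.ones((len(dataframe), 1))

# Group by calendar year and keep each group's row positions, so the
# exogenous rows can be sliced to match the yearly slice.
years = dataframe["Date"].dt.year
for year, positions in dataframe.groupby(years).indices.items():
    current_df = dataframe.iloc[positions]
    current_xreg = xreg[positions]
    # If xreg.shape[0] differs from current_df.shape[0], passing the full
    # xreg alongside a single year's prices would be a length mismatch.
    print(year, current_df.shape, xreg.shape, current_xreg.shape)
```

If the shapes differ in your own data, slicing the exogenous array with the same row positions as each yearly dataframe keeps the two aligned.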

To split a dataframe by every year it contains, find the unique years in the date column, then loop over them and use boolean indexing to select the rows for each year.

This idea could be implemented in a function like:

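def split_years(dt):
    # Assumes dt['Date'] is a datetime64 column; convert it with pd.to_datetime first if it holds strings.
    # Note: this adds a 'year' column to the dataframe that is passed in.
    dt['year'] = dt['Date'].dt.year
    # One boolean-indexed dataframe per distinct year, in order of first appearance.
    return [dt[dt['year'] == y] for y in dt['year'].unique()]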

The result of the function above is a list of dataframes, each containing the rows for a single year.
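A variant of the same idea, sketched under the assumption that the Date column may still hold strings: it parses the dates, groups the rows by year, and returns a dict keyed by year instead of a list, leaving the input dataframe unmodified. The helper name split_by_year and the sample data are hypothetical and not part of the original answer.

```python
import pandas as pd


def split_by_year(df, date_col="Date"):
    """Return {year: sub-dataframe} without modifying df (hypothetical helper)."""
    # Parse the date column so that the .dt accessor is available.
    dates = pd.to_datetime(df[date_col])
    # Group rows by the calendar year of each date.
    return {year: group for year, group in df.groupby(dates.dt.year)}


# Made-up sample data for illustration only.
sample = pd.DataFrame({
    "Date": ["2012-01-15", "2012-06-30", "2013-03-01", "2017-12-31"],
    "price": [10.0, 12.5, 11.0, 14.0],
})
by_year = split_by_year(sample)
print(sorted(by_year))      # years present in the data
print(len(by_year[2012]))   # number of rows from 2012
```

Keying the result by year makes it possible to look up a given year directly, rather than relying on list position.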

Lalo answered 10/6, 2020 at 9:0
Comment from Zeralda: I have been looking for something like this for days. I'm new to Python and pandas and this is letting me build a couple of projects independently. I think they end up more "pythonic" than the kludgy thing I was making. Thanks!

You can use the pandas .dt datetime accessor to filter the rows by year and create a new dataframe for each year:

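import pandas as pd
# The .dt accessor belongs to pandas, not the datetime module; it needs a datetime64 column.
dataframe['Date'] = pd.to_datetime(dataframe['Date'])
dataframe1 = dataframe[dataframe['Date'].dt.year == 2012]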
Humphries answered 28/6, 2018 at 1:55
