I have a rather large data frame with a column of POSIXct datetimes (~10 years of hourly data). I would like to flag all rows whose day falls within a daylight saving period. For example, if the daylight saving shift starts on '2000-04-02 03:00:00' (DOY=93), I would like the two preceding hours of DOY=93 to be flagged as well. Although I am new to dplyr, I would like to use this package as much as possible and avoid for-loops.
For example:
library(lubridate)
sd = ymd('2000-01-01',tz="America/Denver")
ed = ymd('2005-12-31',tz="America/Denver")
span = data.frame(date=seq(from=sd,to=ed, by="hour"))
span$YEAR = year(span$date)
span$DOY = yday(span$date)
span$DLS = dst(span$date)
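As a quick sanity check of the case described above, here is a minimal sketch that looks only at the first hours of 2000-04-02. The names `x` and `transition` are made up for this illustration. The clock should jump from 01:00 to 03:00, so the first two hours of DOY 93 have DLS equal to FALSE even though the day itself contains the shift:

```r
library(lubridate)

# Four consecutive hourly steps starting at midnight on 2000-04-02 (DOY 93)
x <- seq(from = ymd_hms("2000-04-02 00:00:00", tz = "America/Denver"),
         by = "hour", length.out = 4)

# Same three derived columns as in span, restricted to the transition hours
transition <- data.frame(date = x, DOY = yday(x), DLS = dst(x))
transition
```

These two leading hours are exactly the rows that `dst()` alone misses and that the day-based flag is meant to catch.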
To find the first and last day of each year on which daylight saving applies, I use dplyr:
library(dplyr)
limits = span %>% group_by(YEAR) %>% summarise(minDOY=min(DOY[DLS]),maxDOY=max(DOY[DLS]))
That gives:
  YEAR minDOY maxDOY
1 2000     93    303
2 2001     91    301
3 2002     97    300
4 2003     96    299
5 2004     95    305
6 2005     93    303
Now I would like to 'pipe' the above results back into the span data frame without using an inefficient for-loop.
SOLUTION 1
With the help of @aosmith, the problem can be tackled with just two commands (avoiding the inner_join used in 'Solution 2'):
limits = span %>% group_by(YEAR) %>% mutate(minDOY=min(DOY[DLS]),maxDOY=max(DOY[DLS]),CHECK=FALSE)
limits$CHECK[(limits$DOY >= limits$minDOY) & (limits$DOY <= limits$maxDOY) ] = TRUE
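The same idea can probably be written as one pipeline, computing CHECK directly inside `mutate` instead of assigning it afterwards. This is only a sketch: the series is shortened to the year 2000 to keep it quick, and `flagged` is a name chosen for this example.

```r
library(lubridate)
library(dplyr)

# Same hourly series as in the question, shortened to one year
span <- data.frame(date = seq(from = ymd("2000-01-01", tz = "America/Denver"),
                              to   = ymd("2000-12-31", tz = "America/Denver"),
                              by   = "hour"))

flagged <- span %>%
  mutate(YEAR = year(date), DOY = yday(date), DLS = dst(date)) %>%
  group_by(YEAR) %>%
  # TRUE for every hour whose day lies between the first and last DST day
  mutate(CHECK = DOY >= min(DOY[DLS]) & DOY <= max(DOY[DLS])) %>%
  ungroup()

# Hours flagged by CHECK but not by DLS: the non-DST hours of the two transition days
flagged %>% filter(CHECK & !DLS)
```

The final `filter` is just a way to inspect the rows that differ from the plain `dst()` flag, which should all lie on the first and last DST day of the year.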
SOLUTION 2
With the help of @beetroot and @matthew-plourde, the problem has been solved: an inner join between the yearly summary and the span data frame was missing:
limits = span %>% group_by(YEAR) %>% summarise(minDOY=min(DOY[DLS]),maxDOY=max(DOY[DLS])) %>% inner_join(span, by='YEAR')
Then I just added a new column (CHECK) and filled it with the right values for the daylight saving days:
limits$CHECK = FALSE
limits$CHECK[(limits$DOY >= limits$minDOY) & (limits$DOY <= limits$maxDOY) ] = TRUE
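A different route, not taken from the answers above, would be to group by calendar day and flag the whole day as soon as any of its hours is in DST. This needs neither the join nor the min/max limits, and it does not assume that the DST period is one contiguous block within the calendar year. The sketch below uses two years of data; `by_day` and `day_limits` are names invented for this example.

```r
library(lubridate)
library(dplyr)

span <- data.frame(date = seq(from = ymd("2000-01-01", tz = "America/Denver"),
                              to   = ymd("2001-12-31", tz = "America/Denver"),
                              by   = "hour")) %>%
  mutate(YEAR = year(date), DOY = yday(date), DLS = dst(date))

# Flag a whole calendar day as soon as any of its hours is in DST
by_day <- span %>%
  group_by(YEAR, DOY) %>%
  mutate(CHECK = any(DLS)) %>%
  ungroup()

# First and last flagged day per year, for comparison with the limits table
day_limits <- by_day %>%
  filter(CHECK) %>%
  group_by(YEAR) %>%
  summarise(minDOY = min(DOY), maxDOY = max(DOY))
day_limits
```

For 2000 and 2001 this should give the same minDOY and maxDOY as the first two rows of the limits table in the question.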
mutate instead of summarise with joining. – Mastin
mutate(minDOY = min(DOY[DLS]), maxDOY = max(DOY[DLS])) in place of where you use summarise in your original code? That adds columns of group-specific values to the original dataset, which is what it looks like you were trying to do. – Mastin