R: Summarize rows per month

3

5

I have a data frame with one date column and several numeric columns. I want to group it by month and sum the numeric values in the other columns for each month.

Here is my dataframe example:

capture.date  Test1  Test2  Test3
2016-03-18      0      1      1
2016-03-18      1      1      1
2016-03-20      2      1      1
2016-04-12      1      0      1

I already tried some code:

df %>% 
  group_by(capture.date) %>% 
  summarise_each(funs(sum))

and:

aggregate(df[2:4], by=df["capture.date"], sum)

but both return data frames summarized by day rather than by month. How can I summarize by month instead?

desired output:

capture.date  Test1  Test2  Test3
2016-03         3      3      3     
2016-04         1      0      1
Gorgerin answered 3/11, 2019 at 12:49 Comment(0)
4

You can format the dates as %Y-%m inside group_by() and use summarise_if() or summarise_at() to select which variables get summed.

(First confirm that capture.date is of class Date.)
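
One way to run that check, and to convert if needed, is sketched below. It uses only base R and rebuilds the question's example data with capture.date stored as character, which is an assumption about how the real data was read in; the name month_key is made up for illustration.

```r
# Example data from the question; capture.date starts out as character here
# (an assumption: your real data may already hold Date values).
df <- data.frame(
  capture.date = c("2016-03-18", "2016-03-18", "2016-03-20", "2016-04-12"),
  Test1 = c(0L, 1L, 2L, 1L),
  Test2 = c(1L, 1L, 1L, 0L),
  Test3 = c(1L, 1L, 1L, 1L),
  stringsAsFactors = FALSE
)

# Check the class and convert ISO-formatted strings (yyyy-mm-dd) if needed
if (!inherits(df$capture.date, "Date")) {
  df$capture.date <- as.Date(df$capture.date, format = "%Y-%m-%d")
}

# The year-month key that group_by() would then group on
month_key <- strftime(df$capture.date, "%Y-%m")
month_key
```

If the dates are stored in another layout, the format argument of as.Date() has to be adjusted to match.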

df %>%
  group_by(Date = strftime(capture.date, "%Y-%m")) %>%
  summarise_if(is.numeric, sum)

# # A tibble: 2 x 4
#   Date    Test1 Test2 Test3
#   <chr>   <int> <int> <int>
# 1 2016-03     3     3     3
# 2 2016-04     1     0     1

Update!

Scoped verbs (_if, _at, _all) have been superseded by the use of pick() or across() in an existing verb.

df %>%
  group_by(Date = strftime(capture.date, "%Y-%m")) %>%
  summarise(across(where(is.numeric), sum))
Burress answered 3/11, 2019 at 13:19 Comment(0)
3

The following should work:

library(lubridate)
library(tidyverse)

txt <- "capture.date  Test1  Test2  Test3
2016-03-18      0      1      1
2016-03-18      1      1      1
2016-03-20      2      1      1
2016-04-12      1      0      1"

data <- read.table(text = txt, header = TRUE)

data %>% 
  mutate(month = month(capture.date), 
         year = year(capture.date)) %>% 
  group_by(month, year) %>% 
  summarise_if(is.integer, sum) %>%
  ungroup %>%
  mutate("capture.date" = paste(year, str_pad(month, 2, side = "left", pad = "0"), sep = "-")) %>%
  select(capture.date, Test1, Test2, Test3)

This will produce

# A tibble: 2 x 4
  capture.date Test1 Test2 Test3
  <chr>        <int> <int> <int>
1 2016-03          3     3     3
2 2016-04          1     0     1

For your real data, you may need to replace is.integer in summarise_if() with a different predicate, such as is.numeric.
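
To see why the predicate matters, here is a small base R sketch. The two vectors are made up for illustration: a column holding decimals is stored as double, so is.integer() rejects it while is.numeric() accepts both types.

```r
int_col <- c(0L, 1L, 2L)   # stored as integer
dbl_col <- c(0.5, 1, 2)    # stored as double

is.integer(int_col)  # TRUE
is.integer(dbl_col)  # FALSE: summarise_if(is.integer, sum) would not sum this column
is.numeric(int_col)  # TRUE
is.numeric(dbl_col)  # TRUE: is.numeric covers both integer and double
```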

Chromic answered 3/11, 2019 at 13:8 Comment(2)
My bad, I updated my desired output. Can the date output look like 2016-03, 2016-04, etc. instead of a new month and year column? - Gorgerin
See updated text, though Darren Tsai's solution is more concise. - Chromic
3

1) dplyr/zoo Using the data shown reproducibly in the Note at the end, convert each date to the yearmon class, which represents dates without a day component, and then summarize the numeric columns:

library(dplyr)
library(zoo)

df %>% 
  group_by(yearmon = as.yearmon(capture.date)) %>% 
  summarize_if(is.numeric, sum) %>%
  ungroup

giving this tibble:

# A tibble: 2 x 4
  yearmon   Test1 Test2 Test3
  <yearmon> <int> <int> <int>
1 Mar 2016      3     3     3
2 Apr 2016      1     0     1

2) zoo This could alternatively be done with a single read.zoo command. fortify.zoo can be applied to the result if you want a data.frame:

library(zoo)
read.zoo(df, FUN = as.yearmon, aggregate = sum)

giving this zoo series:

         Test1 Test2 Test3
Mar 2016     3     3     3
Apr 2016     1     0     1

2a) zoo with magrittr pipeline The same thing can be written as a magrittr (or dplyr) pipeline:

library(magrittr)
library(zoo)

df %>% read.zoo(FUN = as.yearmon, aggregate = sum)

or, to convert to a data.frame:

library(magrittr)
library(zoo)

df %>% read.zoo(FUN = as.yearmon, aggregate = sum) %>% fortify.zoo

3) Base R Using only base R, extract the first 7 characters of each date and then aggregate on that:

df2 <- transform(df, year.month = substr(capture.date, 1, 7), capture.date = NULL)
aggregate(. ~ year.month, df2, sum)

giving this data.frame:

  year.month Test1 Test2 Test3
1    2016-03     3     3     3
2    2016-04     1     0     1
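
The substr() step relies on capture.date being text in yyyy-mm-dd form. If the dates arrive in another layout, a variant that parses them first may help. The sketch below assumes a hypothetical day-first input (dd/mm/yyyy); the names df_alt and df_alt2 are made up for illustration.

```r
# Hypothetical input with day-first dates, where substr(x, 1, 7) would not
# yield a year-month key
df_alt <- data.frame(
  capture.date = c("18/03/2016", "18/03/2016", "20/03/2016", "12/04/2016"),
  Test1 = c(0L, 1L, 2L, 1L),
  Test2 = c(1L, 1L, 1L, 0L),
  Test3 = c(1L, 1L, 1L, 1L)
)

# Parse the strings as dates, format them as year-month, drop the original column
df_alt2 <- transform(
  df_alt,
  year.month = format(as.Date(capture.date, format = "%d/%m/%Y"), "%Y-%m"),
  capture.date = NULL
)

# Sum every remaining column within each year-month
res <- aggregate(. ~ year.month, df_alt2, sum)
res
```

With this input, res should contain the same monthly sums as the data.frame shown above.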

Note

The input in reproducible form:

Lines <- "
capture.date  Test1  Test2  Test3
2016-03-18      0      1      1
2016-03-18      1      1      1
2016-03-20      2      1      1
2016-04-12      1      0      1"
df <- read.table(text = Lines, header = TRUE, as.is = TRUE)
Allard answered 3/11, 2019 at 13:32 Comment(0)
