Splitting the difference in R
Asked Answered
S

3

5

I have a dataset in R that looks like this:

data = structure(list(quarter = c("Q1 2005", "Q2 2005", "Q3 2005", "Q4 2005", 
"Q1 2006"), value = c(128.76, 178.83, 140.9, 188.3, 194.05)), class = "data.frame", row.names = c(NA, 
-5L))

I want to expand this dataset to have an extra column called "month" (e.g. jan, feb, mar....) ... and I want to split the difference across all months between successive quarters such that the numbers still add up between quarters.

I tried to do it like this:

library(dplyr)

expand_dataset <- function(data) {
quarter_to_months <- list(
    "Q1" = c("Jan", "Feb", "Mar"),
    "Q2" = c("Apr", "May", "Jun"),
    "Q3" = c("Jul", "Aug", "Sep"),
    "Q4" = c("Oct", "Nov", "Dec")
    )

expanded_data <- data.frame()

for (i in 1:(nrow(data) - 1)) {
    current_value <- data$value[i]
    next_value <- data$value[i + 1]
    
    diff <- (next_value - current_value) / 3
    
    quarter <- substr(data$quarter[i], 1, 2)
    year <- substr(data$quarter[i], 4, 7)
    months <- quarter_to_months[[quarter]]
    
    for (j in 1:3) {
        month_value <- current_value + (j - 1) * diff
        expanded_data <- rbind(expanded_data, data.frame(
            quarter = data$quarter[i],
            month = months[j],
            year = year,
            value = month_value
        ))
    }
    }

return(expanded_data)
}

expanded_data <- expand_dataset(data)
print(expanded_data)

Is this the correct way to do this? Is there an easier way to do this?

Stalemate answered 3/10 at 21:46 Comment(0)
S
4

If you meant where the value you split for the months in total equals the value you've assigned in the original dataset, your code doesn't create that solution.

There are many, many, many ways you could accomplish this.

This answer assumes you wanted a value, that in total, equates to the quarter's value. I used 1/3 of the quarter value and assigned it to each month.

This solution uses dplyr and tidyr.

d2 <- data %>% rowwise %>%    # use rowwise so that vals only uses 1 row of data
  mutate(months = case_when(
    str_detect(quarter, 'Q1') ~ list(month.abb[1:3]), # find strings like & assoc months
    str_detect(quarter, 'Q2') ~ list(month.abb[4:6]), 
    str_detect(quarter, 'Q3') ~ list(month.abb[7:9]), 
    TRUE ~ list(month.abb[10:12])),                 # if no other conditions met
    vals = list(rep(1/3 * value, 3))) %>%           # divide quarter vals amongst quarters' months
  unnest_longer(c(months, vals))      # get rid of the nested lists
# # A tibble: 15 × 4
#    quarter value months  vals
#    <chr>   <dbl> <chr>  <dbl>
#  1 Q1 2005  129. Jan     42.9
#  2 Q1 2005  129. Feb     42.9
#  3 Q1 2005  129. Mar     42.9
#  4 Q2 2005  179. Apr     59.6
#  5 Q2 2005  179. May     59.6
#  6 Q2 2005  179. Jun     59.6
#  7 Q3 2005  141. Jul     47.0
#  8 Q3 2005  141. Aug     47.0
#  9 Q3 2005  141. Sep     47.0
# 10 Q4 2005  188. Oct     62.8
# 11 Q4 2005  188. Nov     62.8
# 12 Q4 2005  188. Dec     62.8
# 13 Q1 2006  194. Jan     64.7
# 14 Q1 2006  194. Feb     64.7
# 15 Q1 2006  194. Mar     64.7
Salliesallow answered 3/10 at 22:24 Comment(3)
hello! thank you so much!Stalemate
Seems like the months are just Q4 repeated…Akron
Wow, can't believe I missed that... thanks @AndreWildberg It's all fixed now.Salliesallow
P
4

This is quicker and much easier in base R:

n <- nrow(data)
m <- month.abb[sequence(rep(3, n), as.numeric(substr(data$quarter,2,2))*3-2)]
transform(data[gl(n, 3),],month = m, value2 = value/3, row.names = NULL)

   quarter  value month   value2
1  Q1 2005 128.76   Jan 42.92000
2  Q1 2005 128.76   Feb 42.92000
3  Q1 2005 128.76   Mar 42.92000
4  Q2 2005 178.83   Apr 59.61000
5  Q2 2005 178.83   May 59.61000
6  Q2 2005 178.83   Jun 59.61000
7  Q3 2005 140.90   Jul 46.96667
8  Q3 2005 140.90   Aug 46.96667
9  Q3 2005 140.90   Sep 46.96667
10 Q4 2005 188.30   Oct 62.76667
11 Q4 2005 188.30   Nov 62.76667
12 Q4 2005 188.30   Dec 62.76667
13 Q1 2006 194.05   Jan 64.68333
14 Q1 2006 194.05   Feb 64.68333
15 Q1 2006 194.05   Mar 64.68333

If tidyverse:

library(tidyverse)

data %>%
  mutate(month = map(parse_number(quarter)*3-2, 
                     ~month.abb[seq(.x, len=3)]), value2=value/3) %>%
   unnest_longer(month)
# A tibble: 15 × 4
   quarter value month value2
   <chr>   <dbl> <chr>  <dbl>
 1 Q1 2005  129. Jan     42.9
 2 Q1 2005  129. Feb     42.9
 3 Q1 2005  129. Mar     42.9
 4 Q2 2005  179. Apr     59.6
 5 Q2 2005  179. May     59.6
 6 Q2 2005  179. Jun     59.6
 7 Q3 2005  141. Jul     47.0
 8 Q3 2005  141. Aug     47.0
 9 Q3 2005  141. Sep     47.0
10 Q4 2005  188. Oct     62.8
11 Q4 2005  188. Nov     62.8
12 Q4 2005  188. Dec     62.8
13 Q1 2006  194. Jan     64.7
14 Q1 2006  194. Feb     64.7
15 Q1 2006  194. Mar     64.7
Praseodymium answered 4/10 at 1:43 Comment(0)
A
2

A tidyverse approach based on the calculations made in your for-loops

library(dplyr)
library(tidyr)

data %>% 
  separate_wider_delim(quarter, names=c("quarter", "year"), " ") %>% 
  mutate(Next = lead(value)) %>%
  reframe(value, Next, 
          month = as.Date(paste("01", month.abb, year), "%d %b %Y"), 
          .by = c(quarter, year)) %>% 
  filter(quarter == quarters(month)) %>% 
  mutate(month = format.Date(month, "%b"), 
         new_value = value + ((Next - value) / 3) * (row_number() - 1),
         Next = NULL,
         .by = c(quarter, year))

output

# A tibble: 15 × 5
   quarter year  value month new_value
   <chr>   <chr> <dbl> <chr>     <dbl>
 1 Q1      2005   129. Jan        129.
 2 Q1      2005   129. Feb        145.
 3 Q1      2005   129. Mar        162.
 4 Q2      2005   179. Apr        179.
 5 Q2      2005   179. May        166.
 6 Q2      2005   179. Jun        154.
 7 Q3      2005   141. Jul        141.
 8 Q3      2005   141. Aug        157.
 9 Q3      2005   141. Sep        172.
10 Q4      2005   188. Oct        188.
11 Q4      2005   188. Nov        190.
12 Q4      2005   188. Dec        192.
13 Q1      2006   194. Jan         NA 
14 Q1      2006   194. Feb         NA 
15 Q1      2006   194. Mar         NA

If you just want the split values per quarter

data %>% 
  separate_wider_delim(quarter, names=c("quarter", "year"), " ") %>% 
  reframe(value = value / 3, 
          month = as.Date(paste("01", month.abb, year), "%d %b %Y"), 
          .by = c(quarter, year)) %>% 
  filter(quarter == quarters(month)) %>% 
  mutate(month = format.Date(month, "%b"))
# A tibble: 15 × 4
   quarter year  value month
   <chr>   <chr> <dbl> <chr>
 1 Q1      2005   42.9 Jan  
 2 Q1      2005   42.9 Feb  
 3 Q1      2005   42.9 Mar  
 4 Q2      2005   59.6 Apr  
 5 Q2      2005   59.6 May  
 6 Q2      2005   59.6 Jun  
 7 Q3      2005   47.0 Jul  
 8 Q3      2005   47.0 Aug  
 9 Q3      2005   47.0 Sep  
10 Q4      2005   62.8 Oct  
11 Q4      2005   62.8 Nov  
12 Q4      2005   62.8 Dec  
13 Q1      2006   64.7 Jan  
14 Q1      2006   64.7 Feb  
15 Q1      2006   64.7 Mar
Akron answered 3/10 at 23:53 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.