Run a custom function on a dataframe by group

I want to apply a custom function to each group in a data frame.

Here is some sample data:

set.seed(42)
tm <- as.numeric(c("1", "2", "3", "3", "2", "1", "2", "3", "1", "1"))
d <- as.numeric(sample(0:2, size = 10, replace = TRUE))
t <- as.numeric(sample(0:2, size = 10, replace = TRUE))
h <- as.numeric(sample(0:2, size = 10, replace = TRUE))

df <- as.data.frame(cbind(tm, d, t, h))
df$p <- rowSums(df[2:4])

I created a custom function to calculate the value w:

calc <- function(x) {
  data <- x
  w <- (1.27*sum(data$d) + 1.62*sum(data$t) + 2.10*sum(data$h)) / sum(data$p)
  w
  }

When I run the function on the entire data set, I get the following answer:

calc(df)
[1] 1.664474

Ideally, I want to return results that are grouped by tm, e.g.:

tm     w
1    result of calc
2    result of calc
3    result of calc

So far I have tried using aggregate with my function, but I get the following error:

aggregate(df, by = list(tm), FUN = calc)
Error in data$d : $ operator is invalid for atomic vectors

I feel like I have stared at this too long and there is an obvious answer.

Mediation answered 15/7, 2015 at 13:9 Comment(0)

Using dplyr

library(dplyr)
df %>% 
   group_by(tm) %>%
   do(data.frame(val=calc(.)))
#  tm      val
#1  1 1.665882
#2  2 1.504545
#3  3 1.838000

If we change the function slightly to take each column as a separate argument, it also works with summarise:

 calc1 <- function(d1, t1, h1, p1){
      (1.27*sum(d1) + 1.62*sum(t1) + 2.10*sum(h1) )/sum(p1) }
 df %>%
     group_by(tm) %>% 
     summarise(val=calc1(d, t, h, p))
 #  tm      val
 #1  1 1.665882
 #2  2 1.504545
 #3  3 1.838000
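
As a further sketch, assuming dplyr 1.1.0 or later (where pick() is available), the original one-argument calc can probably be called inside summarise without rewriting it. The toy data below is made up so the snippet runs on its own; it is not the question's df.

```r
library(dplyr)

# Toy data with the same columns as the question's df (values are made up)
df <- data.frame(
  tm = c(1, 1, 2, 2, 3),
  d  = c(1, 0, 2, 1, 0),
  t  = c(0, 2, 1, 1, 2),
  h  = c(2, 1, 0, 1, 1)
)
df$p <- rowSums(df[c("d", "t", "h")])

# Same formula as calc in the question
calc <- function(x) {
  (1.27 * sum(x$d) + 1.62 * sum(x$t) + 2.10 * sum(x$h)) / sum(x$p)
}

# pick(everything()) hands calc the current group's non-grouping columns
# as a data frame, so x$d, x$t, x$h and x$p resolve per group
res <- df %>%
  group_by(tm) %>%
  summarise(w = calc(pick(everything())))
res
```

This keeps calc unchanged, at the cost of requiring a recent dplyr.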

Phosphaturia answered 15/7, 2015 at 13:38 Comment(0)

You can try split:

sapply(split(df, df$tm), calc)

#       1        2        3 
#1.665882 1.504545 1.838000

If you want a list instead, use lapply(split(df, df$tm), calc).

Or with data.table:

library(data.table)

setDT(df)[,calc(.SD),tm]
#   tm       V1
#1:  1 1.665882
#2:  2 1.504545
#3:  3 1.838000
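
If the goal is the two-column tm / w layout shown in the question using base R only, the named vector from sapply can be reshaped into a data frame. This is a minimal sketch on made-up toy data (not the question's df); the name res is just for illustration.

```r
# Toy data with the same columns as the question's df (values are made up)
df <- data.frame(
  tm = c(1, 1, 2, 2, 3),
  d  = c(1, 0, 2, 1, 0),
  t  = c(0, 2, 1, 1, 2),
  h  = c(2, 1, 0, 1, 1)
)
df$p <- rowSums(df[c("d", "t", "h")])

# Same formula as calc in the question
calc <- function(x) {
  (1.27 * sum(x$d) + 1.62 * sum(x$t) + 2.10 * sum(x$h)) / sum(x$p)
}

# One calc() value per tm group, as a named numeric vector
w <- sapply(split(df, df$tm), calc)

# Turn the names into a tm column and the values into a w column
res <- data.frame(tm = as.numeric(names(w)), w = unname(w))
res
```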

Alumnus answered 15/7, 2015 at 13:13 Comment(0)

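Since dplyr 0.8.1 you can use group_modify (in dplyr 0.8.0 this function was named group_map; group_map now returns a list instead):

library(dplyr)
df %>% group_by(tm) %>% group_modify(~ tibble(w = calc(.x)))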
#> # A tibble: 3 x 2
#> # Groups:   tm [3]
#>      tm     w
#>   <dbl> <dbl>
#> 1     1  1.67
#> 2     2  1.50
#> 3     3  1.84

Hem answered 18/2, 2019 at 21:37 Comment(0)

library(plyr)
ddply(df, .(tm), calc)

Delisadelisle answered 15/7, 2015 at 13:15 Comment(4)

This is exactly what I was looking for originally, but I was trying to accomplish it in dplyr. Do you know what the equivalent would be? – Mediation 15/7, 2015 at 13:27

Excellent followup question. I hadn't thought about dplyr replacing ddply (and related functions). I'm pursuing an answer to that now... – Delisadelisle 15/7, 2015 at 15:38

The closest I can come is the following: group_by(df, tm) %>% do(as.data.frame(calc(.))), but the added as.data.frame is not pretty. – Delisadelisle 15/7, 2015 at 16:4

Following up: functions passed to do are required to return a data.frame, not a scalar. As long as calc returns a data.frame, you are safe. – Delisadelisle 16/7, 2015 at 12:19
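
To illustrate the last comment, here is a minimal sketch of a variant of calc that returns a one-row data.frame, so it can be passed to do without wrapping. The name calc_df and the toy data are hypothetical, not from the thread.

```r
library(dplyr)

# Toy data with the same columns as the question's df (values are made up)
df <- data.frame(
  tm = c(1, 1, 2, 2, 3),
  d  = c(1, 0, 2, 1, 0),
  t  = c(0, 2, 1, 1, 2),
  h  = c(2, 1, 0, 1, 1)
)
df$p <- rowSums(df[c("d", "t", "h")])

# Hypothetical variant of calc: same formula, but the result is wrapped
# in a one-row data.frame instead of being returned as a scalar
calc_df <- function(x) {
  data.frame(
    w = (1.27 * sum(x$d) + 1.62 * sum(x$t) + 2.10 * sum(x$h)) / sum(x$p)
  )
}

# do() calls calc_df once per group and binds the rows together
res <- df %>%
  group_by(tm) %>%
  do(calc_df(.))
res
```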

... and the map function solution...

library(purrr)
df %>% 
    split(.$tm) %>% 
    map_dbl(calc)
#        1        2        3 
# 1.665882 1.504545 1.838000

Sexagenarian answered 1/8, 2017 at 21:47 Comment(0)

Here is a neat solution that is fully compatible with the tidy format: nest the data by group, then map a function over the nested data frames. It is illustrated with the palmerpenguins dataset and a linear regression model:

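library(dplyr)  # group_by, mutate, select, filter, ungroup
library(tidyr)  # drop_na, nest, unnest
library(purrr)  # map

palmerpenguins::penguins |>
  drop_na() |>
  group_by(species) |>
  nest() |>
  mutate(
    # fit one model per species and tidy its coefficients
    test_results = map(
      .x = data,
      .f = ~ lm(body_mass_g ~ flipper_length_mm, data = .x) |>
        broom::tidy(conf.int = TRUE)
    )
  ) |>
  unnest(test_results) |>
  select(species, term, estimate, p.value, conf.low, conf.high) |>
  filter(term != "(Intercept)") |>
  ungroup()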

Embree answered 7/11, 2023 at 17:45 Comment(0)
