Add row in each group using dplyr and add_row()

If I add a new row to the iris dataset with:

iris <- as_tibble(iris)

> iris %>% 
    add_row(.before=0)

# A tibble: 151 × 5
    Sepal.Length Sepal.Width Petal.Length Petal.Width Species
          <dbl>       <dbl>        <dbl>       <dbl>   <chr>
1            NA          NA           NA          NA    <NA> <--- Good!
2           5.1         3.5          1.4         0.2  setosa
3           4.9         3.0          1.4         0.2  setosa

It works. So, why can't I add a new row on top of each "subset" with:

iris %>% 
 group_by(Species) %>% 
 add_row(.before=0)

Error: is.data.frame(df) is not TRUE
Imco answered 13/4, 2017 at 23:41 Comment(3)
Upgrade your version of tibble; that error message is at least three months old. (The new error message says "Cannot add rows to grouped data frames", which answers your question of why it is not working.) - Bravery
You can use do to add a row to each group: iris %>% group_by(Species) %>% do(add_row(., .before=0)). - Lavish
Thanks JasonWang and r2evans. I've updated my packages and using do() does the trick. - Imco

A more recent approach is to use group_modify() instead of do().

iris %>%
  as_tibble() %>%
  group_by(Species) %>% 
  group_modify(~ add_row(.x,.before=0))
#> # A tibble: 153 x 5
#> # Groups:   Species [3]
#>    Species Sepal.Length Sepal.Width Petal.Length Petal.Width
#>    <fct>          <dbl>       <dbl>        <dbl>       <dbl>
#>  1 setosa          NA          NA           NA          NA  
#>  2 setosa           5.1         3.5          1.4         0.2
#>  3 setosa           4.9         3            1.4         0.2
Wart answered 13/7, 2021 at 15:39 Comment(4)
This should be used now instead of the do call proposed by @Lavish in the OP's comments. group_modify preserves the group name when creating the new row, whereas do does not and gives the user a value of NA for the grouping variable. - Helmuth
Just adding a comment years after I posted the question: group_modify is still in the experimental phase as of May 2022. Thanks for the answer, Alexlok. - Imco
@Wart, I want the Species value in the NA rows to have '_blank' pasted onto it. I tried group_modify(~ add_row(.x %>% mutate(Species=paste0(Species,'_blank')), .before=0)), but it failed. How can I fix it? Thanks! - Cytoplast
@Cytoplast Run the code as above without modification, then add this mutate line: mutate(Species = if_else(is.na(Sepal.Length), paste0(Species,'_blank'), Species)) - Wart
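
The suggestion in the last comment can be sketched as follows. This is only an illustration under a few assumptions: the data is ungrouped before mutate(), on the assumption that dplyr will not overwrite a grouping column inside a grouped mutate(). Species is converted to character so both branches of if_else() have the same type. The inserted rows are identified by Sepal.Length being NA, which holds for iris because the original data has no missing values. The name labelled is hypothetical.

```r
library(dplyr)
library(tibble)

labelled <- datasets::iris %>%
  as_tibble() %>%
  group_by(Species) %>%
  # add one all-NA row on top of each group; Species is re-attached by group_modify()
  group_modify(~ add_row(.x, .before = 1)) %>%
  # assumption: ungroup first so the former grouping column can be edited
  ungroup() %>%
  mutate(
    Species = if_else(
      is.na(Sepal.Length),        # TRUE only for the inserted rows in iris
      paste0(Species, "_blank"),  # e.g. "setosa_blank"
      as.character(Species)       # keep both branches character
    )
  )
```

The result is an ungroup()ed tibble with a character Species column; convert it back with factor() if a factor is needed.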

If you want to use a grouped operation, you need do, as JasonWang described in his comment, because other functions like mutate or summarise expect a result with the same number of rows as the group (in your case, 50) or with one row (e.g. when summarising).

As you probably know, in general do can be slow and should be a last resort if you cannot achieve your result in another way. Your task is quite simple because it only involves adding extra rows in your data frame, which can be done by simple indexing, e.g. look at the output of iris[NA, ].

What you want is essentially to create a vector

indices <- c(NA, 1:50, NA, 51:100, NA, 101:150)

(since the first group is in rows 1 to 50, the second one in 51 to 100 and the third one in 101 to 150).

The result is then iris[indices, ].

A more general way of building this vector uses group_indices.

indices <- seq(nrow(iris)) %>% 
    split(group_indices(iris, Species)) %>% 
    map(~c(NA, .x)) %>%
    unlist

(map comes from purrr which I assume you have loaded as you have tagged this with tidyverse).
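
Put together, the indexing approach might look like the sketch below. It assumes a recent dplyr, where group_indices() is called on an already grouped data frame, and it works on a plain data.frame copy of the built-in data (the name df is only illustrative) so that it does not depend on the earlier conversion of iris to a tibble.

```r
library(dplyr)
library(purrr)

df <- datasets::iris  # plain data.frame copy of the built-in data

indices <- seq_len(nrow(df)) %>%
  # split the row numbers by group id (one integer id per row)
  split(group_indices(group_by(df, Species))) %>%
  # put an NA index in front of each group's row numbers
  map(~ c(NA, .x)) %>%
  unlist(use.names = FALSE)

# indexing with NA yields a row filled with NA at that position
result <- df[indices, ]
```

Note that the inserted rows are NA in every column, including Species, and that this relies on the rows of each group being contiguous in the original data, as they are in iris.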

Angellaangelle answered 14/4, 2017 at 14:19 Comment(1)
Wow. Thanks for the thorough answer @konvas. FYI, no I didn't know do is slow and was not aware of the alternative with purrr/map. This is what makes SO great. Now I know where to look for answers to this problem. ThanksImco

With a slight variation, this could also be done:

library(dplyr)  # group_split() comes from dplyr
library(purrr)
library(tibble)

iris %>%
  group_split(Species) %>%
  map_dfr(~ .x %>%
            add_row(.before = 1))

# A tibble: 153 x 5
   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
          <dbl>       <dbl>        <dbl>       <dbl> <fct>  
 1         NA          NA           NA          NA   NA     
 2          5.1         3.5          1.4         0.2 setosa 
 3          4.9         3            1.4         0.2 setosa 
 4          4.7         3.2          1.3         0.2 setosa 
 5          4.6         3.1          1.5         0.2 setosa 
 6          5           3.6          1.4         0.2 setosa 
 7          5.4         3.9          1.7         0.4 setosa 
 8          4.6         3.4          1.4         0.3 setosa 
 9          5           3.4          1.5         0.2 setosa 
10          4.4         2.9          1.4         0.2 setosa 
# ... with 143 more rows

The same idea also works on a grouped data frame, although it is a bit verbose:

library(dplyr)

iris %>%
  group_by(Species) %>%
  summarise(Sepal.Length = c(NA, Sepal.Length), 
            Sepal.Width = c(NA, Sepal.Width), 
            Petal.Length = c(NA, Petal.Length),
            Petal.Width = c(NA, Petal.Width), 
            Species = c(NA, Species))
Musicology answered 13/7, 2021 at 16:23 Comment(2)
Interesting approach with summarise. I wonder if it keeps the group names. - Imco
The summarise() approach worked perfectly for what I was trying to do. Basically, replace those NA values with first() or some other value, e.g. c(first(Sepal.Length), Sepal.Length). - Anklet

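This is how to do it using reframe(), which is available in dplyr 1.1.0 and later.
Replace 0 with any value you need. Note that this appends the new row at the end of each group; put the value first, e.g. c(0, Sepal.Length), to add it on top instead.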

iris %>%
  reframe(
    Sepal.Length = c(Sepal.Length, 0),
    Sepal.Width = c(Sepal.Width, 0), 
    Petal.Length = c(Petal.Length, 0),
    Petal.Width = c(Petal.Width, 0),
    .by = c(Species) # reframe() always returns an ungrouped data frame, so the grouping is stated here with .by to keep this call self-contained
  )

I needed this to solve this warning:

Warning message: Returning more (or less) than 1 row per summarise() group was deprecated in dplyr 1.1.0.
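
To put the new row on top of each group without naming every column, one possible variant combines reframe() with across(). This is a minimal sketch, assuming dplyr 1.1.0 or later; the name with_na_top is only illustrative.

```r
library(dplyr)

with_na_top <- datasets::iris %>%
  reframe(
    # prepend NA to every non-grouping column within each group
    across(everything(), ~ c(NA, .x)),
    .by = Species
  )
```

Because Species is supplied through .by, it is not touched by across(), so the inserted rows keep their group name while the measurement columns are NA.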

Lepper answered 15/7, 2024 at 13:27 Comment(0)
