How can I keep pivot_wider() from dropping factor levels in names?

Asked 19/11, 2019 at 15:55 Answered 29/8, 2022 at 9:49

I would like pivot_wider() to create a column of NAs when a factor level exists but never appears in the data and that factor is used as the names_from argument. For example, the first snippet below gives me a two-column tibble, but I'd really like the three-column tibble shown after it.

library(dplyr)
library(tidyr)

tibble(Person = c("Sarah", "Jackson", "Jackson"), Rank = c(1, 1, 2),
       FavoriteAnimal = factor(c("Dog", "Dog", "Cat"))) %>%
    group_by(Person) %>% arrange(Rank) %>% slice(1) %>%
    pivot_wider(names_from = FavoriteAnimal, values_from = Rank)

tibble(Person=c("Jackson", "Sarah"), Dog=c(1,1), Cat=c(NA,NA))

How can I get my column of NAs for levels not appearing in my dataset?

Cockloft asked 19/11, 2019 at 15:55 Comment(1)

Try the names_expand = TRUE parameter, which has been available since tidyr 1.2.0 – Gorse 15/7, 2022 at 13:6

You can use names_expand = TRUE in pivot_wider to include all factor levels in the pivot:

tib %>%
  pivot_wider(names_from = FavoriteAnimal, values_from = Rank, 
              names_expand = TRUE)

  Person    Cat   Dog
  <chr>   <dbl> <dbl>
1 Jackson    NA     1
2 Sarah      NA     1

data

tib <- tibble(Person=c("Sarah", "Jackson", "Jackson"), Rank=c(1,1,2), 
       FavoriteAnimal=factor(c("Dog", "Dog", "Cat")))%>%
  group_by(Person)%>%arrange(Rank)%>%slice(1)
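
As a further illustration, here is a minimal sketch of the same idea on a factor whose declared levels include one that no row uses at all. The `pets` data and the extra "Bird" level are made up for this example; names_expand needs tidyr 1.2.0 or later.

```r
library(dplyr)
library(tidyr)

# Hypothetical data: "Cat" and "Bird" are declared levels that no row uses.
pets <- tibble(
  Person = c("Sarah", "Jackson"),
  Rank = c(1, 1),
  FavoriteAnimal = factor(c("Dog", "Dog"), levels = c("Cat", "Dog", "Bird"))
)

# names_expand = TRUE (tidyr >= 1.2.0) makes every factor level a column,
# including the levels that never occur in the data.
wide <- pets %>%
  pivot_wider(names_from = FavoriteAnimal, values_from = Rank,
              names_expand = TRUE)

print(wide)
```

The unused levels should come out as all-NA columns, and the columns follow the order of the factor levels rather than the order of appearance in the data.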

Sailboat answered 29/8, 2022 at 9:49 Comment(0)

Alternatively, you can first add the missing levels and then do the transformation:

tibble(Person = c("Sarah", "Jackson", "Jackson"), 
       Rank = c(1, 1, 2), 
       FavoriteAnimal = factor(c("Dog", "Dog", "Cat"))) %>%
 group_by(Person) %>%
 arrange(Rank) %>% 
 slice(1) %>%
 complete(FavoriteAnimal = FavoriteAnimal) %>%
 pivot_wider(names_from = FavoriteAnimal, values_from = Rank)

  Person    Cat   Dog
  <chr>   <dbl> <dbl>
1 Jackson    NA     1
2 Sarah      NA     1
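
Note that the call above relies on the tibble still being grouped by Person, so complete() fills in the missing levels within each person. If the data has already been ungrouped, a sketch along these lines should give the same result by naming both columns in complete(); the intermediate name `top` is just for illustration.

```r
library(dplyr)
library(tidyr)

top <- tibble(
  Person = c("Sarah", "Jackson", "Jackson"),
  Rank = c(1, 1, 2),
  FavoriteAnimal = factor(c("Dog", "Dog", "Cat"))
) %>%
  group_by(Person) %>%
  arrange(Rank) %>%
  slice(1) %>%
  ungroup()  # grouping removed on purpose

# Without groups, ask complete() for every Person x FavoriteAnimal combination;
# for a factor this includes levels that no longer appear in the rows.
wide <- top %>%
  complete(Person, FavoriteAnimal) %>%
  pivot_wider(names_from = FavoriteAnimal, values_from = Rank)

print(wide)
```

Calling complete(FavoriteAnimal) alone on the ungrouped data would instead add a row with a missing Person, which then becomes an extra NA row after pivoting.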

Misconstruction answered 19/11, 2019 at 16:24 Comment(1)

Did not know about the complete function, this is great! Just a note--this only works as desired because we're still working on a grouped tibble! – Cockloft 19/11, 2019 at 16:45

You can do it with tidyr::spread: spread(key = FavoriteAnimal, value = Rank, drop = FALSE) gives you what you want.

Unfortunately the drop argument seems to have been lost in the transition from spread to pivot_wider.
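
For reference, a minimal sketch of the spread() call on an already-sliced, ungrouped version of the data (the `top` tibble is written out by hand here as an assumption, with the "Cat" level kept on the factor):

```r
library(dplyr)
library(tidyr)

# Hand-written stand-in for the sliced data: only "Dog" rows remain,
# but the factor still carries the "Cat" level.
top <- tibble(
  Person = c("Jackson", "Sarah"),
  Rank = c(1, 1),
  FavoriteAnimal = factor(c("Dog", "Dog"), levels = c("Cat", "Dog"))
)

# drop = FALSE keeps factor levels that do not appear in the data
wide <- spread(top, key = FavoriteAnimal, value = Rank, drop = FALSE)

print(wide)
```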

Sheetfed answered 19/11, 2019 at 16:12 Comment(2)

Looks like there's a GitHub issue for this – Floreated 19/11, 2019 at 16:26

FYI - tidyr::spread is superseded by pivot_wider in tidyr version 1.1.4 – Photocopier 13/1, 2022 at 14:49

It seems that your slice(1) operation is removing Jackson's Cat ranking. If you remove it and add the pivot_wider() argument values_fill = NA, you get a three-column tibble. The only difference between my answer and your goal is that my answer retains Jackson's Cat ranking value.

tibble(Person=c("Sarah", "Jackson", "Jackson"), Rank=c(1,1,2), 
       FavoriteAnimal=factor(c("Dog", "Dog", "Cat")))%>%
    group_by(Person)%>%arrange(Rank)%>%
    pivot_wider(names_from = FavoriteAnimal, values_from=Rank, values_fill = NA)

Based on the help documentation for dplyr::slice, it appears that you are trying to select only the top ranked animal for each person, so this solution doesn't do that. But depending on where you are going with this, there may be other ways such as dplyr::select or dplyr::filter perhaps combined with dplyr::across to accomplish this.
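
One possible way to do that, sketched here under the assumption that "top ranked" means the smallest Rank per person and that tidyr 1.2.0 or later is installed (for names_expand), is to filter within groups and then pivot. The name `pets` is only for illustration.

```r
library(dplyr)
library(tidyr)

pets <- tibble(
  Person = c("Sarah", "Jackson", "Jackson"),
  Rank = c(1, 1, 2),
  FavoriteAnimal = factor(c("Dog", "Dog", "Cat"))
)

wide <- pets %>%
  group_by(Person) %>%
  filter(Rank == min(Rank)) %>%  # keeps ties, unlike slice(1)
  ungroup() %>%
  pivot_wider(names_from = FavoriteAnimal, values_from = Rank,
              names_expand = TRUE)  # keep the unused "Cat" level as a column

print(wide)
```

Unlike slice(1), the filter() step keeps every row tied for the best rank, so a person with two top-ranked animals would get values in two columns.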

Photocopier answered 13/1, 2022 at 15:12 Comment(0)
