Correct syntax for mutate_if
I would like to replace NA values with zeros via mutate_if in dplyr. The syntax below:

set.seed(1)
mtcars[sample(1:dim(mtcars)[1], 5),
       sample(1:dim(mtcars)[2], 5)] <-  NA

require(dplyr)

mtcars %>% 
    mutate_if(is.na,0)

mtcars %>% 
    mutate_if(is.na, funs(. = 0))

Returns error:

Error in vapply(tbl, p, logical(1), ...) : values must be length 1, but FUN(X[[1]]) result is length 32

What's the correct syntax for this operation?

Obsolescent answered 5/2, 2017 at 12:31 Comment(1)
For this particular task, you might also consider the simpler tidyr::replace_na rather than the more generic mutate_if approaches. - Transient
54

I learned this trick from the purrr tutorial, and it also works in dplyr. There are two ways to solve this problem.
First, define custom functions outside the pipe and use them in mutate_if():

any_column_NA <- function(x){
    any(is.na(x))
}
replace_NA_0 <- function(x){
    if_else(is.na(x),0,x)
}
mtcars %>% mutate_if(any_column_NA,replace_NA_0)

Second, use the formula shorthand: ~ combined with . or .x (.x can be replaced with ., but not with any other character or symbol):

mtcars %>% mutate_if(~ any(is.na(.x)),~ if_else(is.na(.x),0,.x))
#This also works
mtcars %>% mutate_if(~ any(is.na(.)),~ if_else(is.na(.),0,.))

In your case, you can also use mutate_all():

mtcars %>% mutate_all(~ if_else(is.na(.x),0,.x))

The ~ defines an anonymous function, and .x (or .) stands for its argument. In mutate_if(), that argument is each column in turn, both in the predicate and in the transformation.
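To make the shorthand concrete, here is a small sketch comparing the formula form with ordinary anonymous functions. The data frame df and its columns are made up for illustration, and the sketch assumes dplyr is installed:

```r
library(dplyr)

# Hypothetical toy data: one column with an NA, one without.
df <- data.frame(a = c(1, NA, 3), b = c(4, 5, 6))

# Formula shorthand: .x stands for the current column.
with_formula <- df %>%
  mutate_if(~ any(is.na(.x)), ~ if_else(is.na(.x), 0, .x))

# The same operation spelled out with explicit anonymous functions.
with_function <- df %>%
  mutate_if(function(x) any(is.na(x)),
            function(x) if_else(is.na(x), 0, x))

# Compare the two results.
identical(with_formula, with_function)
```

Both calls select column a (the only one containing an NA) and replace its missing value with 0.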

Iceskate answered 4/2, 2018 at 4:47 Comment(1)
The purrr tutorial has moved to rstudio.com/resources/rstudioconf-2017/… - Chon
53

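The "if" in mutate_if refers to choosing columns, not rows. E.g., mutate_if(data, is.numeric, ...) means to carry out a transformation on all numeric columns in your dataset. The predicate is applied to each whole column and must return a single TRUE or FALSE, which is why is.na (one value per element) triggers the "values must be length 1" error.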

If you want to replace all NAs with zeros in numeric columns:

# funs() is deprecated since dplyr 0.8.0; a formula lambda does the same job
data %>% mutate_if(is.numeric, ~ ifelse(is.na(.), 0, .))
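A minimal sketch of the column-versus-row distinction, using a made-up data frame df (not from the question) and assuming dplyr is installed. It shows why is.na cannot act as the predicate while is.numeric can:

```r
library(dplyr)

# Hypothetical toy data: a numeric column and a character column, both with an NA.
df <- data.frame(x = c(1, NA, 3), label = c("a", NA, "c"),
                 stringsAsFactors = FALSE)

# is.na() returns one value per element, so it is not a valid column predicate.
length(is.na(df$x))       # 3

# is.numeric() returns a single TRUE/FALSE for the whole column.
length(is.numeric(df$x))  # 1

# Only the numeric column is transformed; the character column keeps its NA.
result <- df %>% mutate_if(is.numeric, ~ ifelse(is.na(.), 0, .))
```

Here result$x has its NA replaced with 0, while result$label is left untouched because it fails the is.numeric test.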
Dionysius answered 5/2, 2017 at 12:39 Comment(2)
Works fine. One might use if_else instead to stay in the tidyverse and benefit from the additional check that the TRUE and FALSE values have consistent types. - Selfconfessed
If you want to check whether a value is NA or equal to the string "NA" in the ifelse, how can you solve this (add another condition)? - Bagpipes
27
library(tidyr)  # replace_na() comes from tidyr, not dplyr

mtcars %>% mutate_if(is.numeric, replace_na, 0)

or, with the more recent across() syntax:

mtcars %>% mutate(across(where(is.numeric),
                         ~ replace_na(.x, 0)))
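As a side note, replace_na() can also be applied to a whole data frame by passing a named list of replacement values. The sketch below uses a made-up data frame df and assumes tidyr is installed:

```r
library(tidyr)

# Hypothetical toy data with NAs in two numeric columns.
df <- data.frame(x = c(1, NA, 3), y = c(NA, 5, 6))

# On a data frame, replace_na() takes a named list: column name -> replacement value.
filled <- replace_na(df, list(x = 0, y = 0))
```

This form needs every column named explicitly, so it suits a handful of known columns better than a wide table.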
Monocarpic answered 17/6, 2018 at 17:37 Comment(2)
Simplicity is important. If a simple line of code can do the same thing as more complex, long code, I think it should be chosen instead. - Caliche
This should be in the help page for mutate_if. Thanks for making my life easier. - Dilley
4

We can use set() from data.table, which updates the columns by reference:

library(data.table)
setDT(mtcars)
for (j in seq_along(mtcars)) {
  set(mtcars, i = which(is.na(mtcars[[j]])), j = j, value = 0)
}
Barrow answered 5/2, 2017 at 14:34 Comment(2)
How might this be modified to only operate on numeric variables, please? - Thrift
@RickPack: You could change the for(j in seq_along(mtcars)) to nm1 <- names(mtcars)[mtcars[, unlist(lapply(.SD, is.numeric))]]; for(j in nm1) - Barrow
3

I always struggle with the replace_na function of tidyr. Base R's replace() at the end of a pipe

  mtcars %>% replace(is.na(.), 0)

works for me for what you are trying to do.

Copestone answered 23/10, 2018 at 12:38 Comment(0)
