How to substitute NA by 0 in 20 columns?

Asked 11/10, 2015 at 16:46 Answered 11/9, 2021 at 21:34

I want to substitute NA by 0 in 20 columns. I found this approach for 2 columns, but I guess it is not optimal when the number of columns is 20. Is there a more compact alternative?

mydata[,c("a", "c")] <-
        apply(mydata[,c("a","c")], 2, function(x){replace(x, is.na(x), 0)})

UPDATE: For simplicity, let's take this data with 8 columns and substitute NAs in columns b, c, e, f and d:

a  b  c  d  e  f  g  d
1  NA NA 2  3  4  7  6
2  g  3  NA 4  5  4  Y
3  r  4  4  NA t  5  5

The result must be this one:

a  b  c  d  e  f  g  d
1  0  0  2  3  4  7  6
2  g  3  NA 4  5  4  Y
3  r  4  4  0  t  5  5

Aragonite answered 11/10, 2015 at 16:46 Comment(4)

post some data to test. – Riggins 11/10, 2015 at 16:48

Just do cols <- c("b", "c", "e", "f"); mydf[cols] <- replace(mydf[cols], is.na(mydf[cols]), 0). – Wrinkle 11/10, 2015 at 16:55

Do you really have two d columns? – Friulian 11/10, 2015 at 16:55

If the columns are consecutive use start:end instead of c() – Milurd 11/10, 2015 at 16:57

We can use NAer from qdap to convert the NA to 0. If there are multiple columns, loop over them using lapply.

library(qdap)
nm1 <- c('b', 'c', 'e', 'f')
mydata[nm1] <- lapply(mydata[nm1], NAer)
mydata
#  a b c  d e f g d.1
#1 1 0 0  2 3 4 7   6
#2 2 g 3 NA 4 5 4   Y
#3 3 r 4  4 0 t 5   5

Or using dplyr

library(dplyr)
mydata %>% 
   mutate_each_(funs(replace(., which(is.na(.)), 0)), nm1)
#  a b c  d e f g d.1
#1 1 0 0  2 3 4 7   6
#2 2 g 3 NA 4 5 4   Y
#3 3 r 4  4 0 t 5   5
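
In current dplyr releases, mutate_each_() and funs() are deprecated, so the call above may warn or fail. A minimal sketch of the same idea with across() and all_of(), assuming dplyr 1.0.0 or later. The small numeric data frame is hypothetical and chosen so that a numeric 0 fits every selected column:

```r
library(dplyr)

# Hypothetical numeric-only data (not the question's table)
mydata <- data.frame(
  a = c(1, 2, 3),
  b = c(NA, 5, 6),
  c = c(NA, 3, 4),
  d = c(2, NA, 4)
)
nm1 <- c("b", "c")  # columns in which NA should become 0

# across() applies the function to each column selected by all_of(nm1)
result <- mydata %>%
  mutate(across(all_of(nm1), ~ replace(.x, is.na(.x), 0)))

result
```

Column d is not listed in nm1, so its NA is left untouched.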

Boneyard answered 11/10, 2015 at 16:49 Comment(1)

But where do I define the names of columns, in which NAs should be substituted by 0? I do not need to substitute NAs by 0 in all columns. – Aragonite 11/10, 2015 at 16:53

The replace_na function from tidyr can be applied over a vector as well as a dataframe (http://tidyr.tidyverse.org/reference/replace_na.html).

Use it with a mutate_at variation from dplyr to apply it to multiple columns at the same time:

my_data %>% mutate_at(vars(b,c,e,f), replace_na, 0)

my_data %>% mutate_at(c('b','c','e','f'), replace_na, 0)

Elianore answered 9/5, 2018 at 13:13 Comment(2)

No idea why, but only the vars() version worked for me – Nanine 5/5, 2020 at 13:24

Thank you for this! I've been trying to come up with a concise way of doing this and didn't think of using vars(). – Irrelevant 2/11, 2020 at 15:9

Here is a tidyverse way to replace NA with different values based on the data type of the column.

library(tidyverse)

dataset %>% mutate_if(is.numeric, replace_na, 0) %>%  
    mutate_if(is.character, replace_na, "")
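
mutate_if() has been superseded by across() in dplyr 1.0.0 and later. A sketch of the same type-based replacement, assuming those versions and a hypothetical two-column data frame:

```r
library(dplyr)
library(tidyr)

# Hypothetical data: one numeric and one character column
dataset <- data.frame(
  n = c(1, NA, 3),
  s = c("x", NA, "z"),
  stringsAsFactors = FALSE
)

# where() selects columns by a predicate on their type
cleaned <- dataset %>%
  mutate(
    across(where(is.numeric), ~ replace_na(.x, 0)),
    across(where(is.character), ~ replace_na(.x, ""))
  )

cleaned
```

Each replacement value has the same type as the columns it is applied to, which avoids type-mismatch problems.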

Dessert answered 13/5, 2020 at 2:21 Comment(0)

Another strategy using tidyr::replace_na()

library(tidyverse)

df <- read.table(header = TRUE, text = 'a  b  c  d  e  f  g  h
1  NA NA 2  3  4  7  6
2  g  3  NA 4  5  4  Y
3  r  4  4  NA t  5  5')

df %>%
  mutate(across(everything(), ~replace_na(., 0)))
#>   a b c d e f g h
#> 1 1 0 0 2 3 4 7 6
#> 2 2 g 3 0 4 5 4 Y
#> 3 3 r 4 4 0 t 5 5

Created on 2021-08-22 by the reprex package (v2.0.0)

Britannia answered 22/8, 2021 at 4:14 Comment(1)

You do not show how to replace the NA's of specified columns , as was actually asked. – Nobukonoby 25/9, 2021 at 14:20

Another option:

library(tidyr)
v <- c('b', 'c', 'e', 'f')
replace_na(df, as.list(setNames(rep(0, length(v)), v)))

Which gives:

#  a b c  d e f g d.1
#1 1 0 0  2 3 4 7   6
#2 2 g 3 NA 4 5 4   Y
#3 3 r 4  4 0 t 5   5
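
To see why this works: setNames() attaches the column names to a vector of zeros, and as.list() turns it into the named list that replace_na() expects for a data frame. A small sketch with a hypothetical numeric data frame (df_num, fill and filled are illustrative names):

```r
library(tidyr)

# Hypothetical numeric-only data (not the question's table)
df_num <- data.frame(
  a = c(1, 2, 3),
  b = c(NA, 2, 3),
  c = c(NA, 3, 4),
  d = c(2, NA, 4)
)
v <- c("b", "c")

# Build the named list; equivalent to list(b = 0, c = 0)
fill <- as.list(setNames(rep(0, length(v)), v))

# Only the columns named in fill are touched
filled <- replace_na(df_num, fill)

filled
```

Columns that are not named in the list, such as d here, keep their NA values.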

Quorum answered 11/10, 2015 at 17:26 Comment(0)

Since replace_na() accepts a named list for the replace argument, purrr::map() is a good option here to reduce the amount of code. It is also possible to use a different replacement value for each column with map2().

code:

library(data.table)
library(tidyverse)

tbl <-read_table("a  b  c  d  e  f  g  d
1  NA NA 2  3  4  7  6
2  g  3  NA 4  5  4  Y
3  r  4  4  NA t  5  5")
#> Warning: Duplicated column names deduplicated: 'd' => 'd_1' [8]
nms <- c('b', 'c', 'e', 'f', 'g')

imap_dfc(tbl, ~ if(any(.y == nms)) replace_na(.x, 0) else .x) 
#> # A tibble: 3 × 8
#>       a b         c     d     e f         g d_1  
#>   <dbl> <chr> <dbl> <dbl> <dbl> <chr> <dbl> <chr>
#> 1     1 0         0     2     3 4         7 6    
#> 2     2 g         3    NA     4 5         4 Y    
#> 3     3 r         4     4     0 t         5 5

# using data.table

tblDT <- as.data.table(tbl)

#Further explanation here: https://stackoverflow.com/questions/16846380
tblDT[, (nms) := map(.SD, ~replace_na(., 0)), .SDcols = nms]

tblDT %>% 
  as_tibble()
#> # A tibble: 3 × 8
#>       a b         c     d     e f         g d_1  
#>   <dbl> <chr> <dbl> <dbl> <dbl> <chr> <dbl> <chr>
#> 1     1 0         0     2     3 4         7 6    
#> 2     2 g         3    NA     4 5         4 Y    
#> 3     3 r         4     4     0 t         5 5

# to replace NAs in every column:

tbl %>%
  replace_na(map(., ~0))
#> # A tibble: 3 × 8
#>       a b         c     d     e f         g d_1  
#>   <dbl> <chr> <dbl> <dbl> <dbl> <chr> <dbl> <chr>
#> 1     1 0         0     2     3 4         7 6    
#> 2     2 g         3     0     4 5         4 Y    
#> 3     3 r         4     4     0 t         5 5

Created on 2021-09-25 by the reprex package (v2.0.1)
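
The map2() variant mentioned at the start of this answer is not shown in the code. A minimal sketch of how it might look, assuming purrr and tidyr are installed. The names small, cols and fills are hypothetical, and each replacement is given in the same type as its column, since newer tidyr releases appear to be stricter about mixing types:

```r
library(purrr)
library(tidyr)

# Hypothetical data: b is character, c and d are numeric
small <- data.frame(
  b = c(NA, "g", "r"),
  c = c(NA, 3, 4),
  d = c(2, NA, 4),
  stringsAsFactors = FALSE
)
cols  <- c("b", "c")
fills <- list("0", 0)  # one replacement per column, in the same order as cols

# map2() walks over the selected columns and their replacements in parallel;
# as.list() makes the column-wise iteration explicit
small[cols] <- map2(as.list(small[cols]), fills, ~ replace_na(.x, .y))

small
```

Column d is not in cols, so its NA stays in place.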

Chordophone answered 11/9, 2021 at 21:34 Comment(0)
