How to replace NA with 0 in 20 columns?

I want to replace NA with 0 in 20 columns. I found this approach for 2 columns, but I suspect it does not scale well to 20 columns. Is there a more compact alternative?

mydata[, c("a", "c")] <-
  apply(mydata[, c("a", "c")], 2, function(x) replace(x, is.na(x), 0))
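Why this gets clumsy: apply() converts the selected columns to a matrix first, so as soon as one of them is character, every column in the result comes back as character. A minimal base-R sketch on a hypothetical two-column data frame shows the difference between apply() and lapply(), which passes the columns one at a time and keeps their types:

```r
# Hypothetical data: one numeric and one character column, both with NAs
mydata <- data.frame(a = c(1, NA, 3), c = c("x", NA, "z"),
                     stringsAsFactors = FALSE)

# apply() works on as.matrix(), which is a character matrix here,
# so the numeric column a comes back as character after assignment
via_apply <- mydata
via_apply[, c("a", "c")] <-
  apply(via_apply[, c("a", "c")], 2, function(x) replace(x, is.na(x), 0))

# lapply() hands over each column separately, so column types are kept
via_lapply <- mydata
via_lapply[c("a", "c")] <-
  lapply(via_lapply[c("a", "c")], function(x) replace(x, is.na(x), 0))

str(via_apply)   # a is chr
str(via_lapply)  # a is num
```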

UPDATE: For simplicity, let's take this data with 8 columns and substitute the NAs in columns b, c, e and f:

a  b  c  d  e  f  g  d
1  NA NA 2  3  4  7  6
2  g  3  NA 4  5  4  Y
3  r  4  4  NA t  5  5

The result should be:

a  b  c  d  e  f  g  d
1  0  0  2  3  4  7  6
2  g  3  NA 4  5  4  Y
3  r  4  4  0  t  5  5
Aragonite answered 11/10, 2015 at 16:46 Comment(4)
Post some data to test. – Riggins
Just do cols <- c("b", "c", "e", "f"); mydf[cols] <- replace(mydf[cols], is.na(mydf[cols]), 0). – Wrinkle
Do you really have two d columns? – Friulian
If the columns are consecutive, use start:end instead of c(). – Milurd
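A minimal base-R sketch of the one-liner from the comments, run on the example data from the question (read.table() renames the second d column to d.1). The positional vector c(2:3, 5:6) illustrates the start:end suggestion and selects the same four columns:

```r
# Example data from the question; the duplicated "d" header becomes "d.1"
mydf <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
a  b  c  d  e  f  g  d
1  NA NA 2  3  4  7  6
2  g  3  NA 4  5  4  Y
3  r  4  4  NA t  5  5")

cols <- c("b", "c", "e", "f")
# Same columns picked by position, using start:end ranges
cols_by_pos <- names(mydf)[c(2:3, 5:6)]

# A logical matrix index replaces only the NA cells of the chosen columns;
# character columns such as b receive "0"
mydf[cols] <- replace(mydf[cols], is.na(mydf[cols]), 0)
mydf
```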

We can use NAer from qdap to convert the NAs to 0. If there are multiple columns, loop over them with lapply.

library(qdap)
nm1 <- c('b', 'c', 'e', 'f')
mydata[nm1] <- lapply(mydata[nm1], NAer)
mydata
#  a b c  d e f g d.1
#1 1 0 0  2 3 4 7   6
#2 2 g 3 NA 4 5 4   Y
#3 3 r 4  4 0 t 5   5
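If you would rather not depend on qdap, a small hypothetical helper does the same job for the default case of turning NA into 0. The na_to_zero() function below is illustrative only, not qdap's NAer() source, and it is again applied column by column with lapply() on hypothetical data:

```r
# Hypothetical helper, not part of qdap: swap NA for a fill value (default 0)
na_to_zero <- function(x, fill = 0) {
  x[is.na(x)] <- fill
  x
}

mydata <- data.frame(
  a = 1:3,
  b = c(NA, "g", "r"),
  c = c(NA, 3, 4),
  d = c(2, NA, 4),
  e = c(3, 4, NA),
  stringsAsFactors = FALSE
)

nm1 <- c("b", "c", "e")
# Only the columns in nm1 are touched; d keeps its NA
mydata[nm1] <- lapply(mydata[nm1], na_to_zero)
mydata
```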

Or using dplyr (note that mutate_each_() and funs() have since been deprecated):

library(dplyr)
mydata %>% 
   mutate_each_(funs(replace(., which(is.na(.)), 0)), nm1)
#  a b c  d e f g d.1
#1 1 0 0  2 3 4 7   6
#2 2 g 3 NA 4 5 4   Y
#3 3 r 4  4 0 t 5   5
Boneyard answered 11/10, 2015 at 16:49 Comment(1)
But where do I define the names of columns, in which NAs should be substituted by 0? I do not need to substitute NAs by 0 in all columns. – Aragonite
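Since mutate_each_() and funs() are deprecated, here is a sketch of the same idea with across() (dplyr 1.0.0 or later), assuming the column names are kept in a character vector such as nm1 and using hypothetical data modelled on the question. Base replace() is used rather than a tidyr helper, so character columns simply receive "0":

```r
library(dplyr)

mydata <- data.frame(
  a = 1:3,
  b = c(NA, "g", "r"),
  c = c(NA, 3, 4),
  d = c(2, NA, 4),
  e = c(3, 4, NA),
  f = c("4", "5", "t"),
  stringsAsFactors = FALSE
)
nm1 <- c("b", "c", "e", "f")

# all_of() selects columns from a character vector of names;
# columns outside nm1 (here d) keep their NAs
res <- mydata %>%
  mutate(across(all_of(nm1), ~ replace(.x, is.na(.x), 0)))
res
```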

The replace_na() function from tidyr can be applied to a vector as well as to a data frame (http://tidyr.tidyverse.org/reference/replace_na.html).

Combine it with dplyr's mutate_at() to apply it to several columns at once:

my_data %>% mutate_at(vars(b,c,e,f), replace_na, 0)

or

my_data %>% mutate_at(c('b','c','e','f'), replace_na, 0)
Elianore answered 9/5, 2018 at 13:13 Comment(2)
No idea why, but only the vars() version worked for me. – Nanine
Thank you for this! I've been trying to come up with a concise way of doing this and didn't think of using vars(). – Irrelevant
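The mutate_at() calls above can also be written with across() in dplyr 1.0.0 or later. Below is a sketch on hypothetical all-numeric data, showing that bare names inside c() and a character vector wrapped in all_of() select the same columns. One caveat, if I recall the tidyr 1.2.0 changelog correctly: replace_na() now casts the replacement to the column's type, so a numeric 0 on a character column (b and f in the question) raises an error; use "0" for those columns or base replace() instead.

```r
library(dplyr)
library(tidyr)

# Hypothetical all-numeric data, so a numeric 0 is a valid fill everywhere
my_data <- data.frame(
  a = c(1, 2, 3),
  b = c(NA, 5, 6),
  c = c(NA, 3, 4),
  d = c(2, NA, 4),
  e = c(3, 4, NA)
)

# Bare column names inside c() ...
out1 <- my_data %>% mutate(across(c(b, c, e), ~ replace_na(.x, 0)))
# ... or a character vector wrapped in all_of()
out2 <- my_data %>% mutate(across(all_of(c("b", "c", "e")), ~ replace_na(.x, 0)))
out1
```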

Here is a tidyverse way to replace NA with different values based on the data type of the column.

library(tidyverse)

dataset %>% mutate_if(is.numeric, replace_na, 0) %>%  
    mutate_if(is.character, replace_na, "")
Dessert answered 13/5, 2020 at 2:21 Comment(0)
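mutate_if() is superseded in dplyr 1.0.0 and later. A sketch of the same type-based replacement with across() and where(), on a hypothetical two-column data frame, where each column type gets a fill value of its own type:

```r
library(dplyr)
library(tidyr)

dataset <- data.frame(
  num = c(1, NA, 3),
  chr = c("x", NA, "z"),
  stringsAsFactors = FALSE
)

# where() picks columns with a predicate function
res <- dataset %>%
  mutate(across(where(is.numeric), ~ replace_na(.x, 0)),
         across(where(is.character), ~ replace_na(.x, "")))
res
```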

Another strategy, using tidyr::replace_na():

library(tidyverse)

df <- read.table(header = TRUE, text = 'a  b  c  d  e  f  g  h
1  NA NA 2  3  4  7  6
2  g  3  NA 4  5  4  Y
3  r  4  4  NA t  5  5')

df %>%
  mutate(across(everything(), ~replace_na(., 0)))
#>   a b c d e f g h
#> 1 1 0 0 2 3 4 7 6
#> 2 2 g 3 0 4 5 4 Y
#> 3 3 r 4 4 0 t 5 5

Created on 2021-08-22 by the reprex package (v2.0.0)

Britannia answered 22/8, 2021 at 4:14 Comment(1)
You do not show how to replace the NAs of specified columns, as was actually asked. – Nobukonoby

Another option:

library(tidyr)
v <- c('b', 'c', 'e', 'f')
replace_na(df, as.list(setNames(rep(0, length(v)), v)))

Which gives:

#  a b c  d e f g d.1
#1 1 0 0  2 3 4 7   6
#2 2 g 3 NA 4 5 4   Y
#3 3 r 4  4 0 t 5   5
Quorum answered 11/10, 2015 at 17:26 Comment(0)
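A quick check of the named-list form on hypothetical numeric data: only the columns named in the list are filled, so d keeps its NA. Building the list with setNames(as.list(...)) gives the same list as the as.list(setNames(...)) call in the answer:

```r
library(tidyr)

df <- data.frame(
  b = c(NA, 2, 3),
  c = c(NA, 3, 4),
  d = c(2, NA, 4),
  e = c(3, 4, NA)
)

v <- c("b", "c", "e")
# One named list entry per column to fill
fill <- setNames(as.list(rep(0, length(v))), v)
res <- replace_na(df, fill)
res
```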

Since replace_na() accepts a named list for its replace argument, purrr's mapping functions are a good way to reduce the amount of code here. It is also possible to use a different replacement value for each column with map2().

code:

library(data.table)
library(tidyverse)

tbl <-read_table("a  b  c  d  e  f  g  d
1  NA NA 2  3  4  7  6
2  g  3  NA 4  5  4  Y
3  r  4  4  NA t  5  5")
#> Warning: Duplicated column names deduplicated: 'd' => 'd_1' [8]
nms <- c('b', 'c', 'e', 'f', 'g')

imap_dfc(tbl, ~ if (.y %in% nms) replace_na(.x, 0) else .x)
#> # A tibble: 3 × 8
#>       a b         c     d     e f         g d_1  
#>   <dbl> <chr> <dbl> <dbl> <dbl> <chr> <dbl> <chr>
#> 1     1 0         0     2     3 4         7 6    
#> 2     2 g         3    NA     4 5         4 Y    
#> 3     3 r         4     4     0 t         5 5
#using data.table

tblDT <- as.data.table(tbl)

#Further explanation here: https://stackoverflow.com/questions/16846380
tblDT[, (nms) := map(.SD, ~replace_na(., 0)), .SDcols = nms]

tblDT %>% 
  as_tibble()
#> # A tibble: 3 × 8
#>       a b         c     d     e f         g d_1  
#>   <dbl> <chr> <dbl> <dbl> <dbl> <chr> <dbl> <chr>
#> 1     1 0         0     2     3 4         7 6    
#> 2     2 g         3    NA     4 5         4 Y    
#> 3     3 r         4     4     0 t         5 5
# to replace NAs in every column:

tbl %>%
  replace_na(map(., ~0))
#> # A tibble: 3 × 8
#>       a b         c     d     e f         g d_1  
#>   <dbl> <chr> <dbl> <dbl> <dbl> <chr> <dbl> <chr>
#> 1     1 0         0     2     3 4         7 6    
#> 2     2 g         3     0     4 5         4 Y    
#> 3     3 r         4     4     0 t         5 5

Created on 2021-09-25 by the reprex package (v2.0.1)

Chordophone answered 11/9, 2021 at 21:34 Comment(0)
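The map2() variant mentioned in this answer is not shown in its code. Here is a minimal sketch, assuming one hypothetical fill value per column in nms and numeric columns (so replace_na() accepts a numeric fill):

```r
library(purrr)
library(tidyr)

tbl <- data.frame(
  b = c(NA, 2, 3),
  c = c(NA, 3, 4),
  d = c(2, NA, 4),
  e = c(3, 4, NA)
)

nms   <- c("b", "c", "e")
fills <- c(0, -1, 99)  # hypothetical: one replacement value per column in nms

# map2() walks the selected columns and their fill values in parallel;
# d is not in nms, so its NA stays
tbl[nms] <- map2(tbl[nms], fills, ~ replace_na(.x, .y))
tbl
```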
