replace <NA> with NA

I have a data frame containing <NA> entries. It appears that these values are not treated as NA, since is.na returns FALSE. I would like to convert these values to NA but could not find a way.

Asthenosphere answered 6/10, 2014 at 16:46 Comment(4)
I'm guessing you're talking about doing this in R? Otherwise, NA is pretty ambiguous... North America? Not available? - Quilmes
Yes, sorry, in R; NA stands for a missing value. - Asthenosphere
Provide a sample of your data by adding the output of dput(your.data.frame[some.rows.that.contain.such.values,]) to your question. - Dumbstruck
The results of str(your.data.frame) would also be useful to let us see how the columns are stored. - Carbolated
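The diagnostics requested in these comments can be sketched as follows. The data frame below is a made-up stand-in for the asker's data (the column names x and y are assumptions). Note that print() shows a genuine missing value in a character or factor column as <NA>, so printed output alone cannot tell a real NA from the literal string "<NA>"; is.na() can.

```r
# Hypothetical data frame standing in for the asker's data
df <- data.frame(x = c("1", "<NA>", "3"),   # literal string "<NA>", not missing
                 y = c(10, NA, 30),         # genuine missing value
                 stringsAsFactors = FALSE)

str(df)                      # shows how each column is stored
dput(df[df$x == "<NA>", ])   # reproducible dump of the affected rows

x_missing <- is.na(df$x)     # the string "<NA>" is not counted as missing
y_missing <- is.na(df$y)     # the real NA is
```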

The two classes where this is likely to be an issue are character and factor. This should loop over a data frame and convert the "NA" strings into true NA values, but only for columns of those two classes:

make.true.NA <- function(x) {
  if (is.character(x) || is.factor(x)) {
    is.na(x) <- x == "NA"   # mark the literal string "NA" as missing
  }
  x                         # other classes are returned unchanged
}
df[] <- lapply(df, make.true.NA)

(Untested in the absence of a data example.) Assigning with the form df_name[] <- ... retains the structure of the original data frame; a plain df_name <- lapply(...) would return a list and lose the data.frame class. I see that ujjwal thinks your spelling of NA has flanking "<>" characters, so you might try this more general function:

make.true.NA <- function(x) {
  if (is.character(x) || is.factor(x)) {
    is.na(x) <- x %in% c("NA", "<NA>")
  }
  x
}
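
A minimal usage sketch of the more general function, restated here so the block is self-contained. The data frame and its column names (a, b, n) are invented for illustration and are not the asker's data.

```r
make.true.NA <- function(x) {
  if (is.character(x) || is.factor(x)) {
    is.na(x) <- x %in% c("NA", "<NA>")   # literal strings become real NA
  }
  x
}

# Hypothetical example data: one character, one factor, one numeric column
df <- data.frame(a = c("1", "NA", "3"),
                 b = factor(c("u", "<NA>", "w")),
                 n = c(1.5, 2.5, 3.5),
                 stringsAsFactors = FALSE)

df[] <- lapply(df, make.true.NA)   # df[] keeps the data.frame class
na_counts <- colSums(is.na(df))    # number of real NAs per column
```

For a factor column the old "<NA>" level is still listed in levels() afterwards; droplevels() can remove it if that matters.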
Gottuard answered 6/10, 2014 at 19:3 Comment(3)
Thanks for the help. The problem is that I cannot produce a reproducible example in which I obtain both NA and <NA>. BondedDust's function allowed me to transform both NA and <NA> into true NA (they all appear as TRUE with is.na(df)), but the structure of my df shows that the variables that contain <NA> entries are coded as factor and not as numeric. - Asthenosphere
I suspect you would not want to convert all character vectors to numeric, so you might want to apply this conversion just to particular columns: dfrm[targets] <- lapply(dfrm[targets], make.true.NA); dfrm[targets] <- lapply(dfrm[targets], function(x) as.numeric(as.character(x))) (going through as.character because as.numeric on a factor returns its integer level codes). - Gottuard
Yes, I have to convert to numeric, but it works only if I unlist my data frame first. I have no idea why it appears as a list, but at least it works. - Asthenosphere
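
The factor-to-numeric step discussed in these comments has a well-known pitfall: as.numeric() applied directly to a factor returns the internal level codes rather than the values shown. A small sketch, assuming the affected columns are factors whose labels are numbers (dfrm, targets, p and q are illustrative names):

```r
# Hypothetical factor column containing the literal string "<NA>"
f <- factor(c("10", "20", "<NA>", "30"))
is.na(f) <- f == "<NA>"                   # turn the string into a real NA

codes  <- as.numeric(f)                   # integer level codes, not the values
values <- as.numeric(as.character(f))     # the numbers actually displayed

# A data frame is a list of columns, so convert column by column with lapply
dfrm <- data.frame(p = f, q = f)
targets <- c("p", "q")
dfrm[targets] <- lapply(dfrm[targets],
                        function(x) as.numeric(as.character(x)))
```

Because a data frame is a list of columns, as.numeric() cannot be applied to it as a whole, which may explain why unlisting appeared to be necessary.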

Use dfr[dfr=="<NA>"]=NA where dfr is your dataframe.

For example:

> dfr<-data.frame(A=c(1,2,"<NA>",3),B=c("a","b","c","d"))

> dfr
     A  B
1    1  a
2    2  b
3 <NA>  c
4    3  d

> is.na(dfr)
         A     B
[1,] FALSE FALSE
[2,] FALSE FALSE
[3,] FALSE FALSE
[4,] FALSE FALSE

> dfr[dfr=="<NA>"] = NA                 # key step

> is.na(dfr)
         A     B
[1,] FALSE FALSE
[2,] FALSE FALSE
[3,]  TRUE FALSE
[4,] FALSE FALSE
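
To see why the key step works: comparing a data frame with a string yields a logical matrix of the same shape, and that matrix can be used as an index for assignment. A short sketch under the assumption that column A is stored as character (the default for data.frame() in R 4.0 and later; set explicitly here):

```r
dfr <- data.frame(A = c("1", "2", "<NA>", "3"),
                  B = c("a", "b", "c", "d"),
                  stringsAsFactors = FALSE)

mask <- dfr == "<NA>"                    # logical matrix, one cell per entry
hits <- which(mask, arr.ind = TRUE)      # row/column positions of the matches

dfr[mask] <- NA                          # assign NA only where mask is TRUE
n_missing <- sum(is.na(dfr))
```

Column A remains character after the replacement, so a separate conversion is still needed if numbers are wanted.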
Bartel answered 6/10, 2014 at 19:25 Comment(0)

You can do this with the naniar package as well, using replace_with_na and associated functions.


dfr <- data.frame(A = c(1, 2, "<NA>", 3), B = c("a", "b", "c", "d"))

library(naniar)
# dev version - devtools::install_github('njtierney/naniar')
is.na(dfr)
#>          A     B
#> [1,] FALSE FALSE
#> [2,] FALSE FALSE
#> [3,] FALSE FALSE
#> [4,] FALSE FALSE

dfr %>% replace_with_na(replace = list(A = "<NA>")) %>% is.na()
#>          A     B
#> [1,] FALSE FALSE
#> [2,] FALSE FALSE
#> [3,]  TRUE FALSE
#> [4,] FALSE FALSE

# You can also specify how to do this for many variables

dfr %>% replace_with_na_all(~.x == "<NA>")
#> # A tibble: 4 x 2
#>       A     B
#>   <int> <int>
#> 1     2     1
#> 2     3     2
#> 3    NA     3
#> 4     4     4

You can read more about using replace_with_na in the naniar package documentation.

Frimaire answered 19/1, 2018 at 3:34 Comment(0)
