Replace missing values (NA) with blank (empty string)


I have a dataframe with an NA row:

 df = data.frame(c("classA", NA, "classB"), t(data.frame(rep("A", 5), rep(NA, 5), rep("B", 5))))
 rownames(df) <- c(1,2,3)
 colnames(df) <- c("class", paste("Year", 1:5, sep = ""))

 > df
   class Year1 Year2 Year3 Year4 Year5
1 classA     A     A     A     A     A
2   <NA>  <NA>  <NA>  <NA>  <NA>  <NA>
3 classB     B     B     B     B     B

I introduced the empty row (NA row) on purpose because I wanted to have some space between classA row and classB row.

Now, I would like to substitute the <NA> by blank, so that the second row looks like an empty row.

I tried:

 df[is.na(df)] <- ""

and

 df[df == "NA"] <- ""

but neither attempt worked: the <NA> values are still displayed.

Any ideas? Thanks!

Jacquelyn answered 25/10, 2013 at 14:33 Comment(13)
Your first attempt works just fine for me. What about it didn't work? - Willenewillet
I still see <NA> in the dataframe; the code doesn't seem to affect anything. - Jacquelyn
It has to do with factors (of course!)... try str(df). (I jumped the gun on my answer!) - Therein
Gah. I basically have to forget I'm running stringsAsFactors = FALSE once every morning on SO. Listen to Simon. - Willenewillet
@SimonO101 Your answer is right on! Factors, I always forget about those. Thanks! - Jacquelyn
By the way, never just say "it didn't work". You neglected to mention the six (!) warning messages you surely received upon running that code. The warning message should have been awfully suggestive, don't you think? - Willenewillet
@Jilber is right really. I typed up an embarrassingly wrong answer! Lucky SO doesn't keep the edit history, I deleted it so quickly! (Hopefully) :-) - Therein
@Willenewillet I didn't receive a single error message... - Jacquelyn
The brackets around the <NA> indicate that they are not strings. Have a look HERE for more info. - Certify
@RicardoSaporta I should really remember it is that way round, considering I upvoted that answer before. - Therein
@RicardoSaporta Thanks! Nice tip to remember! - Jacquelyn
I said warning, not error. They are different. And R 3.0.1 most definitely throws 6 warning messages upon running your code. - Willenewillet
Weird. I didn't receive any warnings or errors. - Jacquelyn
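
The factor explanation from the comments can be reproduced with a minimal sketch. It assumes the columns are factors, which was the data.frame() default before R 4.0 and is forced explicitly here. The two-column df and the name warning_text are illustrative and not from the original post.

```r
# Assumption: columns are factors (the pre-R-4.0 default), forced here explicitly.
df <- data.frame(
  class = c("classA", NA, "classB"),
  Year1 = c("A", NA, "B"),
  stringsAsFactors = TRUE
)

sapply(df, class)  # both columns report "factor"

# "" is not an existing level of either factor, so the assignment warns.
# tryCatch() stops at the first warning, which leaves df unchanged.
warning_text <- tryCatch(
  {
    df[is.na(df)] <- ""
    "no warning"
  },
  warning = function(w) conditionMessage(w)
)

warning_text    # text of the warning raised by the assignment
any(is.na(df))  # TRUE: the NA values are still there
```

If the columns turn out to be character rather than factor, the original df[is.na(df)] <- "" should work as written, which would explain why it worked for one commenter and not for the asker.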

Another alternative:

df <- sapply(df, as.character) # since your values are `factor`
df[is.na(df)] <- 0

If you want blanks instead of zeroes (note that " " is a single space; use "" for a truly empty string):

> df <- sapply(df, as.character)
> df[is.na(df)] <- " "
> df
     class    Year1 Year2 Year3 Year4 Year5
[1,] "classA" "A"   "A"   "A"   "A"   "A"  
[2,] " "      " "   " "   " "   " "   " "  
[3,] "classB" "B"   "B"   "B"   "B"   "B"  

If you want a data.frame, then just use as.data.frame:

> as.data.frame(df)
   class Year1 Year2 Year3 Year4 Year5
1 classA     A     A     A     A     A
2                                     
3 classB     B     B     B     B     B
Roede answered 25/10, 2013 at 14:38 Comment(2)
I thought " " is a space and "" is blank. Am I right? - Evulsion
Careful if you are replacing NAs with blanks (""). The conversion back to data.frame will introduce NAs again. I found that the safest is to replace NAs directly, without converting the data frame to a character matrix. - Tauro
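
One way to replace the NAs without going through a character matrix, as the last comment suggests, is sketched below. This is an assumption about what the commenter had in mind, not their actual code, and the three-column df is a shortened stand-in for the data in the question.

```r
# Shortened stand-in for the question's data, with factor columns.
df <- data.frame(
  class = c("classA", NA, "classB"),
  Year1 = c("A", NA, "B"),
  Year2 = c("A", NA, "B"),
  stringsAsFactors = TRUE
)

# df[] <- ... replaces the columns in place, so df stays a data.frame
# instead of collapsing to a matrix as it does with sapply().
df[] <- lapply(df, as.character)

# Every column is now character, so "" is accepted as a replacement value.
df[is.na(df)] <- ""

str(df)
```

Note that this turns every column into character, including numeric ones, so it is best applied to a copy that is only used for display.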

This answer is more of an extended comment.

What you're trying to do isn't what I would consider good practice. R is not, say, Excel, so doing something like this just to create visual separation in your data is just going to give you a headache later on down the line.

If you really only cared about the visual output, I can offer two suggestions:

  1. Use the na.print argument of print() when you want to view the data with that visual separation.

    print(df, na.print = "")
    #    class Year1 Year2 Year3 Year4 Year5
    # 1 classA     A     A     A     A     A
    # 2                                     
    # 3 classB     B     B     B     B     B
    
  2. Realize that even the above is not the best suggestion. Get both visual and content separation by converting your data.frame to a list:

    split(df, df$class)
    # $classA
    #    class Year1 Year2 Year3 Year4 Year5
    # 1 classA     A     A     A     A     A
    # 
    # $classB
    #    class Year1 Year2 Year3 Year4 Year5
    # 3 classB     B     B     B     B     B
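
Building on the split() suggestion, the visual gap can be produced at print time instead of being stored in the data. The following is a minimal sketch with hypothetical data (several rows per class, no NA spacer row); the names groups and name are illustrative.

```r
# Hypothetical data with several rows per class and no NA spacer row.
df <- data.frame(
  class = c("classA", "classA", "classB"),
  Year1 = c("A", "A", "B"),
  Year2 = c("A", "A", "B")
)

# One data.frame per class; the data itself stays free of filler rows.
groups <- split(df, df$class)

# Print each group followed by an empty line for visual separation.
for (name in names(groups)) {
  print(groups[[name]], row.names = FALSE)
  cat("\n")
}
```

As the output above shows, split() drops rows whose grouping value is NA, so a spacer row would not survive this step anyway.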
    
Chapen answered 25/10, 2013 at 17:55 Comment(1)
For na.print to work, the data frame columns must be character now. If they are not, convert the data frame with dplyr::mutate(across(everything(), as.character)). - Transferor

Here is a dplyr option: mutate across all the columns (everything()) and, in each column (.x), replace the NA values with an empty string:

library(dplyr)
df %>%
  mutate(across(everything(), ~ replace(.x, is.na(.x), "")))
#>    class Year1 Year2 Year3 Year4 Year5
#> 1 classA     A     A     A     A     A
#> 2                                     
#> 3 classB     B     B     B     B     B

Created on 2023-04-02 with reprex v2.0.2

Roundish answered 2/4, 2023 at 16:6 Comment(0)
