What's the best way to replace missing values with NA when reading in a .csv?

Asked 11/12, 2012 at 15:17 Answered 3/7, 2019 at 9:47

I have a .csv dataset with many missing values, and I'd like R to recognize them all the same way (the "correct" way) when I read the table in. I've been using:

import = read.csv("/Users/dataset.csv",
                  header = TRUE, na.strings = c(""))

This script fills all the empty cells with something, but it's not consistent. When I look at the data with head(import), some missing cells are filled with <NA> and some are filled with NA. I fear that R will treat these two ways of marking missing values differently when I start analyzing the dataset, so I'd like the import to read in those missing values uniformly.

Finally, some of the missing values in my csv file are represented with a period only. I would also like those periods to be represented by the correct missing value notation when I import to R.

Estremadura answered 11/12, 2012 at 15:17 Comment(2)

The <NA> vs NA just means that some of your columns are character and some are numeric, that's all. Absolutely nothing is wrong with that. It will be hard to diagnose the other problem without access to your csv (or some representative portion of it). – Provenance 11/12, 2012 at 15:22

I think you can just use na.strings=c("",".","NA") or something like that (although I agree with @Joran that a small reproducible example [ tinyurl.com/reproducible-000 ] would be nice). – Transparency 11/12, 2012 at 15:23

The <NA> vs NA just means that some of your columns are character and some are numeric, that's all. Absolutely nothing is wrong with that.
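
If you want to convince yourself of this, here is a minimal sketch using a made-up data frame (the names df, score and label are hypothetical, not taken from the question's file). It only shows that is.na() treats both kinds of missing value the same way; the difference is purely in how they are printed.

```r
# Hypothetical data frame: one numeric column and one character column,
# each with a single missing value.
df <- data.frame(
  score = c(1.5, NA, 3.2),
  label = c("a", NA, "c"),
  stringsAsFactors = FALSE
)

# In the printed data frame the missing label shows as <NA>
# and the missing score shows as NA.
print(df)

# Both are genuine missing values as far as R is concerned.
is.na(df$score)      # FALSE  TRUE FALSE
is.na(df$label)      # FALSE  TRUE FALSE

# Count of missing values per column.
colSums(is.na(df))
```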

As Ben mentioned above, if some of your missing values in the csv are represented by a single period, ., then you can specify a vector of values that should be treated as NAs via:

na.strings=c("",".","NA")

as an argument to read.csv.
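
Since the original file isn't available, here is a small self-contained sketch of that call. The CSV content and the column names (id, height, city) are invented for illustration, and the data are passed through the text argument of read.csv instead of a file path.

```r
# Made-up CSV content: missing values appear as empty cells, a lone
# period, and the literal string NA.
csv_lines <- c(
  "id,height,city",
  "1,172,Lisbon",
  "2,.,",
  "3,,Porto",
  "4,NA,."
)

# Treat empty strings, single periods and "NA" as missing on import.
dat <- read.csv(text = csv_lines, header = TRUE,
                na.strings = c("", ".", "NA"))

print(dat)
str(dat)                 # height should come through as a numeric column
colSums(is.na(dat))      # missing values per column
```

Without "." in na.strings, the height column would most likely be read as text instead of numbers, because a lone period cannot be converted to a number.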

Provenance answered 7/7, 2013 at 1:59 Comment(0)

You can also use the more flexible readr package, whose equivalent function and argument are read_csv() and na.

library(readr)
read_csv("file.csv", na = c(".", ".."))
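
A slightly fuller sketch, assuming readr is installed. The temporary file and its contents are invented for illustration. As far as I know, the na argument defaults to c("", "NA") and a value you supply replaces that default instead of adding to it, so it seems safer to list the empty string and "NA" explicitly alongside the period.

```r
library(readr)

# Write a small made-up CSV to a temporary file (hypothetical data).
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  "id,height,city",
  "1,172,Lisbon",
  "2,.,",
  "3,,Porto",
  "4,NA,."
), tmp)

# List every missing-value marker explicitly, including "" and "NA".
dat <- read_csv(tmp, na = c("", ".", "NA"))

print(dat)
colSums(is.na(dat))      # missing values per column

unlink(tmp)              # remove the temporary file
```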

Ser answered 3/7, 2019 at 9:47 Comment(0)
