I have a .csv dataset with many missing values, and I'd like R to recognize them all the same way (the "correct" way) when I read the table in. I've been using:
import <- read.csv("/Users/dataset.csv",
                   header = TRUE, na.strings = c(""))
This script fills all the empty cells with something, but it's not consistent. When I look at the data with head(import), some missing cells are filled with <NA> and some missing cells are filled with NA. I fear that R treats these two ways of identifying missing values differently when I start analyzing the dataset, so I'd like the import to read in those missing values uniformly.
Finally, some of the missing values in my csv file are represented with a period only. I would also like those periods to be represented by the correct missing value notation when I import to R.
<NA> vs NA just means that some of your columns are character and some are numeric, that's all. Absolutely nothing is wrong with that. It will be hard to diagnose the other problem without access to your csv (or some representative portion of it). – Provenance
na.strings=c("",".","NA") or something like that (although I agree with @Joran that a small reproducible example [ tinyurl.com/reproducible-000 ] would be nice) – Transparency
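A minimal sketch of the na.strings suggestion from the comments. The real file isn't available, so the inline CSV and its column names (id, group, score) are made up for illustration. It is meant to show that empty cells, periods, and literal NA all end up as the same missing value, and that the <NA> vs NA difference is only how character and numeric columns are printed.

```r
# Hypothetical stand-in for the CSV; the real file is not shown in the question.
csv_text <- 'id,group,score
1,a,10
2,,.
3,b,
4,.,7
5,NA,3'

# Treat empty cells, lone periods, and the literal string NA as missing.
import <- read.csv(text = csv_text, header = TRUE,
                   na.strings = c("", ".", "NA"),
                   stringsAsFactors = FALSE)

# Character columns print missing values as <NA>, numeric ones as NA.
print(import)

# is.na() treats both displays identically: count missing values per column.
missing_counts <- colSums(is.na(import))
print(missing_counts)

# Column types explain the difference in how NA is displayed.
print(sapply(import, class))
```

With a real file, the same call would presumably take the path in place of text = csv_text. If the periods are surrounded by spaces in the file, strip.white = TRUE may also be needed, but that is a guess without seeing the data.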