Specifying colClasses in read.csv

7

125

I am trying to specify the colClasses option in the read.csv function in R. In my data, the first column, time, is a character vector, while the rest of the columns are numeric.

data <- read.csv("test.csv", comment.char="" , 
                 colClasses=c(time="character", "numeric"), 
                 strip.white=FALSE)

In the above command, I want R to read the time column as "character" and the rest as numeric. Although the data variable held the correct result after the command completed, R returned the following warnings. How can I fix these warnings?

Warning messages:
1: In read.table(file = file, header = header, sep = sep, quote = quote, : not all columns named in 'colClasses' exist
2: In tmp[i[i > 0L]] <- colClasses : number of items to replace is not a multiple of replacement length

Derek

Bothersome answered 10/5, 2010 at 18:30 Comment(0)
P
89

When colClasses is an unnamed vector, give one class per imported column (a shorter unnamed vector is recycled). Supposing your dataset has 5 more columns after the first:

colClasses=c("character",rep("numeric",5))
Paryavi answered 10/5, 2010 at 18:36 Comment(3)
One can probably use the following to read the first line of the CSV and determine how many columns there are: scan(csv, sep=',', what="character", nlines=1) - Bothersome
This is actually an incorrect answer, and it threw me off for a little while. The correct answer is below. Not trying to be a jerk, just wanted to make sure it doesn't happen to anyone else. - Immigrant
@Immigrant In my case, this is still the correct answer when you also need to specify the classes of the other variables and they are not automatically recognized as such by read.table. - Vindictive
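The idea from the first comment can be sketched as follows: read only the header line with scan, count the columns, and build the colClasses vector from that count. This is a minimal sketch, not the answerer's own code; the temporary file and its contents are made up purely so the example runs on its own.

```r
# Hypothetical sample file, created only so the sketch is runnable.
csv <- tempfile(fileext = ".csv")
writeLines(c("time,a,b,c,d,e",
             "00:01,1,2,3,4,5",
             "00:02,6,7,8,9,10"), csv)

# Read only the header line to learn how many columns there are.
header <- scan(csv, sep = ",", what = "character", nlines = 1, quiet = TRUE)
n_cols <- length(header)

# First column as character, every remaining column as numeric.
classes <- c("character", rep("numeric", n_cols - 1))
data <- read.csv(csv, colClasses = classes)
str(data)
```

This avoids hard-coding the number of numeric columns, at the cost of opening the file twice.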
E
203

You can specify colClasses for only one column.

So in your example you should use:

data <- read.csv('test.csv', colClasses=c("time"="character"))
Extramural answered 18/11, 2011 at 16:38 Comment(2)
Not that it matters much, but I found this to work without quoting the column name. - Soho
This approach is actually very useful when trying to read quoted integers as character. Thanks! - Nickens
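A small sketch of the named form, assuming a made-up file in which time and id look like numbers but should stay as text. Only the named columns are forced to character; the column left unnamed in colClasses is still type-detected by read.csv.

```r
# Hypothetical sample file: 'time' and 'id' have leading zeros worth keeping.
csv <- tempfile(fileext = ".csv")
writeLines(c('time,id,value',
             '0930,"007",1.5',
             '1015,"042",2.5'), csv)

# Name only the columns whose class should be fixed; 'value' is left
# to read.csv's automatic type detection.
data <- read.csv(csv, colClasses = c(time = "character", id = "character"))
sapply(data, class)
```

Without the colClasses argument, both time and id would be converted to integers and lose their leading zeros.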
L
14

Assuming your 'time' column has at least one observation with a non-numeric character and all your other columns contain only numbers, 'read.csv' in R versions before 4.0.0 reads 'time' as a 'factor' by default and all the other columns as 'numeric' (since R 4.0.0 the default is already 'stringsAsFactors = FALSE'). Setting 'stringsAsFactors = FALSE' therefore gives the same result as setting 'colClasses' manually, i.e.,

data <- read.csv('test.csv', stringsAsFactors = FALSE)
Larger answered 10/5, 2010 at 23:19 Comment(0)
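To check what automatic detection actually produces, the resulting classes can be inspected with sapply. A minimal sketch on a made-up file; note that a column containing only whole numbers is detected as integer rather than double, which may matter if you specifically need "numeric".

```r
# Hypothetical sample file, created only so the sketch is runnable.
csv <- tempfile(fileext = ".csv")
writeLines(c("time,x,y",
             "09:30,1,2.5",
             "10:15,3,4.5"), csv)

# Let read.csv detect the types, but keep strings as character.
data <- read.csv(csv, stringsAsFactors = FALSE)

# Inspect the class chosen for each column.
sapply(data, class)
```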
F
12

If you want to refer to names from the header rather than column numbers, you can use something like this:

fname <- "test.csv"
headset <- read.csv(fname, header = TRUE, nrows = 10)
classes <- sapply(headset, class)
classes[names(classes) %in% c("time")] <- "character"
dataset <- read.csv(fname, header = TRUE, colClasses = classes)
Follmer answered 19/12, 2011 at 19:53 Comment(0)
D
10

I know OP asked about the utils::read.csv function, but let me provide an answer for those who come here searching for how to do it using readr::read_csv from the tidyverse.

read_csv("test.csv", col_names = TRUE, col_types = cols(.default = "c", time = "i"))

This should set the default type for all columns as character, while time would be parsed as integer.

Deer answered 14/9, 2018 at 16:41 Comment(0)
H
5

For multiple datetime columns with no header, and a lot of columns, say my datetime fields are in columns 36 and 38, and I want them read in as character fields:

data <- read.csv("test.csv", header = FALSE, colClasses = c("V36" = "character", "V38" = "character"))
Hackbut answered 10/5, 2017 at 21:50 Comment(0)
I
0

If we combine what @Hendy and @Oddysseus Ithaca contributed, we get a cleaner and more general (i.e., more adaptable) chunk of code.

    data <- read.csv("test.csv", header = FALSE, colClasses = c(V36 = "character", V38 = "character"))
Incogitant answered 2/11, 2018 at 17:35 Comment(0)
