fread from data.table package when column names include spaces and special characters?
Asked Answered
F

3

10

I have a csv file where column names include spaces and special characters.

fread imports them with quotes - but how can I change this behaviour? One reason is that I have column names starting with a space and I don't know how to handle them.

Any pointers would be helpful.

Edit: An example.

> packageVersion("data.table")
[1] ‘1.8.8’

p2p <- fread("p2p.csv", header = TRUE, stringsAsFactors=FALSE)

> head(p2p[,list(Principal remaining)])
Error: unexpected symbol in "head(p2p[,list(Principal remaining"

> head(p2p[,list("Principal remaining")])
                    V1
1: Principal remaining

> head(p2p[,list(c("Principal remaining"))])
                    V1
1: Principal remaining

What I was expecting/want is of course, what a column name without spaces yields:

> head(p2p[,list(Principal)])
   Principal
1:      1000
2:      1000
3:      1000
4:      2000
5:      1000
6:      4130
Feretory answered 6/6, 2013 at 16:8 Comment(0)
P
6

It should be rather difficult to get a leading space in a column name. Should not happen by "casual coding". On the other hand I don't see very much error checking in the fread code, so maybe until this undesirable behavior is fixed, (or the feature request refused), you can do something like this:

setnames(DT, make.names(colnames(DT))) 

If on the other hand you are bothered by the fact that colnames(DT) will display the column names with quotes then just "get over it." That's how the interactive console will display any character value.

If you have a data item in a character column that looks like " ttt" in the original, then it's going to have leading spaces when imported, and you need to process it with colnames(dfrm) <- sub("^\\s+", "", colnames(dfrm)) or one of the several trim functions in various packages (such as 'gdata')

Pannikin answered 6/6, 2013 at 18:2 Comment(6)
I'll try the sub/gsub route. Thanks.Feretory
You may want to update your answer with this. Warning message: In names<-.data.table(*tmp*, value = value) : The colnames(x)<-value syntax copies the whole table. This is due to <- in R itself. Please change to setnames(x,old,new) which does not copy and is faster. See help('setnames'). You can safely ignore this warning if it is inconvenient to change right now. Setting options(warn=2) turns this warning into an error, so you can then use traceback() to find and change your colnames<- calls.Feretory
I created a workaround, i.e. using this function now. clean.names <- function(x) { y <- str_trim(colnames(x)) y <- gsub("\\s", ".", y) y <- str_replace_all(y, "[[:punct:]]", ".") y <- tolower(y) return(y) }Feretory
Maybe we get rid of the stringr package and then it's ready for an answer. Would be interesting to get a comment from @matthew-dowleFeretory
You seem to be reinventing make.names in base R. If the concern is that make.names will not handle leading or trailing spaces in names to your liking, then just pass it: sub("^\\s+|\\s+$", "", colnames(m)).Pannikin
Ah good point, didn't realise. However, make.names is quite general and includes upper case and underscores etc.Feretory
M
10

A little bit modified BondedDust version, because setnames function is not used with <- sign:

setnames(DT, make.names(colnames(DT))
Michal answered 8/7, 2014 at 14:44 Comment(1)
Good one. David Arenburg fixed. I'm rather embarrassed that I did that and amazed the error persisted for over a year.Pannikin
P
6

It should be rather difficult to get a leading space in a column name. Should not happen by "casual coding". On the other hand I don't see very much error checking in the fread code, so maybe until this undesirable behavior is fixed, (or the feature request refused), you can do something like this:

setnames(DT, make.names(colnames(DT))) 

If on the other hand you are bothered by the fact that colnames(DT) will display the column names with quotes then just "get over it." That's how the interactive console will display any character value.

If you have a data item in a character column that looks like " ttt" in the original, then it's going to have leading spaces when imported, and you need to process it with colnames(dfrm) <- sub("^\\s+", "", colnames(dfrm)) or one of the several trim functions in various packages (such as 'gdata')

Pannikin answered 6/6, 2013 at 18:2 Comment(6)
I'll try the sub/gsub route. Thanks.Feretory
You may want to update your answer with this. Warning message: In names<-.data.table(*tmp*, value = value) : The colnames(x)<-value syntax copies the whole table. This is due to <- in R itself. Please change to setnames(x,old,new) which does not copy and is faster. See help('setnames'). You can safely ignore this warning if it is inconvenient to change right now. Setting options(warn=2) turns this warning into an error, so you can then use traceback() to find and change your colnames<- calls.Feretory
I created a workaround, i.e. using this function now. clean.names <- function(x) { y <- str_trim(colnames(x)) y <- gsub("\\s", ".", y) y <- str_replace_all(y, "[[:punct:]]", ".") y <- tolower(y) return(y) }Feretory
Maybe we get rid of the stringr package and then it's ready for an answer. Would be interesting to get a comment from @matthew-dowleFeretory
You seem to be reinventing make.names in base R. If the concern is that make.names will not handle leading or trailing spaces in names to your liking, then just pass it: sub("^\\s+|\\s+$", "", colnames(m)).Pannikin
Ah good point, didn't realise. However, make.names is quite general and includes upper case and underscores etc.Feretory
W
1

You can use argument check.names=T in fread function of data.table

p2p <- fread("p2p.csv", header = TRUE, stringsAsFactors=FALSE, check.names=T)

It uses make.names function in background

default is FALSE. If TRUE then the names of the variables in the data.table 
are checked to ensure that they are syntactically valid variable names. If 
necessary they are adjusted (by make.names) so that they are, and also to 
ensure that there are no duplicates.
Wane answered 31/8, 2017 at 6:37 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.