R read.csv Importing Column Names Incorrectly
M

2

8

I have a CSV that I would like to import into R as a data.frame. This CSV has headers such as USD.ZeroCouponBondPrice(1m) and USD-EQ-SP500 that I can't change. When I try to import it, however, R's read.csv function replaces the characters (, ) and - with a period (.). I wasn't able to find a way to fix this in the function documentation, but this line of code worked:

colnames(df)<-c('USD.ZeroCouponBondPrice(1m)', 'USD-EQ-SP500')

so those characters are legal in data.frame column names. Overwriting all of the column names is annoying and fragile, as there are over 20 of them and it is not unthinkable for them to change. Is there a way to prevent read.csv from replacing those characters, or is there an alternative function to use?

Misapprehension answered 18/10, 2017 at 16:21 Comment(1)
I'm not sure how, but you could possibly put together a hack using tibbles. With tibbles you can use crazy names for variables. Kunstlied
M
17

If you set the argument

check.names = FALSE

in read.csv, then R will not alter the names. Note that these names are not syntactically valid in R, so they have to be handled differently from valid names.
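As a minimal sketch of the difference, the snippet below reads a tiny in-memory CSV twice. The two headers come from the question; the numeric values and the object names (csv_text, df_default, df_raw) are made up for illustration.

```r
# Tiny in-memory CSV: headers are from the question, values are invented.
csv_text <- "USD.ZeroCouponBondPrice(1m),USD-EQ-SP500\n0.99,2550.1\n0.98,2561.3\n"

# Default (check.names = TRUE): headers are adjusted by make.names().
df_default <- read.csv(text = csv_text)
names(df_default)
#> [1] "USD.ZeroCouponBondPrice.1m." "USD.EQ.SP500"

# check.names = FALSE: headers are kept exactly as they appear in the file.
df_raw <- read.csv(text = csv_text, check.names = FALSE)
names(df_raw)
#> [1] "USD.ZeroCouponBondPrice(1m)" "USD-EQ-SP500"

# Non-syntactic names need backticks with $, or a quoted string with [[ ]].
df_raw$`USD-EQ-SP500`
df_raw[["USD.ZeroCouponBondPrice(1m)"]]
```

Using [[ ]] with a string is often the less error-prone choice when the column names are held in variables or may change.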

Mycostatin answered 18/10, 2017 at 16:30 Comment(1)
One example of "handled differently": if you use $ notation to reference a variable, you need backticks around the variable name, e.g. df$`USD.ZeroCouponBondPrice(1m)`. Lasting
K
-2

Here is a possible tibble solution that builds on Kelli-Jean's answer about using check.names = FALSE:

# install.packages(c("tidyverse"), dependencies = TRUE)
library(tibble)
dta <- url("http://s3.amazonaws.com/csvpastebin/uploads/a4c665743904ea8f18dd1f31edcbae04/crazy_names.csv")
TBdta <- as_tibble(read.csv(dta, check.names = FALSE)) 
TBdta
#> # A tibble: 6 x 3
#>   USD.ZeroCouponBondPrice(1m) USD-EQ-SP500 crazy name
#>                        <fctr>        <dbl>      <int>
#> 1                           A         10.0         12
#> 2                           A         11.0         14
#> 3                           B          5.0          8
#> 4                           B          6.0         10
#> 5                           A         10.5         13
#> 6                           B          7.0         11

Be sure to read this introduction to tibbles, as they behave somewhat differently from regular data frames.

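In case someone needs to use https:

# Download the file to a temporary location first, then read it locally
temporaryFile <- tempfile()
download.file("https://s3.amazonaws.com/csvpastebin/uploads/a4c665743904ea8f18dd1f31edcbae04/crazy_names.csv", destfile = temporaryFile, method = "curl")
# Spell out FALSE rather than F, since F is an ordinary variable that can be reassigned
TBdta2 <- as_tibble(read.csv(temporaryFile, check.names = FALSE))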
Kunstlied answered 18/10, 2017 at 16:47 Comment(3)
You can use invalid names for variables in a native data frame, as the result of read.csv(dta, check.names = FALSE) shows. The only difference I see with tibbles is that tibble() doesn't automatically convert names when you create one. I don't see any added benefit to wrapping as_tibble() around read.csv(), at least as far as the OP's question goes. Lasting
@BrianStamper I appreciate your feedback. Kunstlied
I accepted @Kelli-Jean's answer because it was easier to implement as a solution, but I found this answer helpful as a legitimate alternative. I didn't specify that I wanted an answer that uses only R's base packages, so I don't think this answer deserves the downvote (not sure if it was you). Misapprehension
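To illustrate the point made in the comments that a native data frame can hold non-syntactic names, here is a small base-R sketch. The column names are taken from the thread; the values and the object names (df_checked, df_unchecked) are assumptions for illustration only.

```r
# data.frame() adjusts column names via make.names() by default...
df_checked <- data.frame(`USD-EQ-SP500` = c(10, 11))
names(df_checked)
#> [1] "USD.EQ.SP500"

# ...but keeps them verbatim with check.names = FALSE, just like read.csv().
df_unchecked <- data.frame(`USD-EQ-SP500` = c(10, 11), check.names = FALSE)
names(df_unchecked)
#> [1] "USD-EQ-SP500"

# make.names() is what performs the adjustment: invalid characters become ".".
make.names(c("USD.ZeroCouponBondPrice(1m)", "USD-EQ-SP500", "crazy name"))
#> [1] "USD.ZeroCouponBondPrice.1m." "USD.EQ.SP500" "crazy.name"
```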
