Skipping last column in R with read.csv

I read the post "read.csv and skip last column in R" but did not find my answer there. I first tried to ask directly in an answer on that post, but that is not the right way to do it (thanks to mjuarez for taking the time to get me back on track).

The original question was:

I have read several other posts about how to import csv files with read.csv but skipping specific columns. However, all the examples I have found had very few columns, and so it was easy to do something like:

 columnHeaders <- c("column1", "column2", "column_to_skip")
 columnClasses <- c("numeric", "numeric", "NULL")
 data <- read.csv(fileCSV, header = FALSE, sep = ",",
                  col.names = columnHeaders, colClasses = columnClasses)

All the answers were good, but they do not work for what I intended to do. So I asked myself and others:

Could this work in a single call: data <- read_csv(fileCSV)[,(ncol(data)-1)]?

I am trying, in one line of R, to load only the first 5 of 6 columns into data, that is, everything except the last one. To do so, I would like to use "-" with the column number. Do you think that is possible? How can I do that?

Thanks!

Blake answered 3/2, 2018 at 13:25 Comment(1)
Related: Only read limited number of columns - Nadinenadir

In base R this has to be a two-step operation. Example:

> data <- read.csv("test12.csv")
> data
# 3 columns are returned
          a b c
1 1/02/2015 1 3
2 2/03/2015 2 4

# last column is excluded 
> data[,-ncol(data)]
          a b
1 1/02/2015 1
2 2/03/2015 2

One cannot write data <- read.csv("test12.csv")[,-ncol(data)] in base R, because data does not exist yet when ncol(data) is evaluated.

But if you know the number of columns in your CSV (3 in my case), then you can write:

df <- read.csv("test12.csv")[,-3]
df
          a b
1 1/02/2015 1
2 2/03/2015 2
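
For anyone who wants to reproduce the two-step version without a test12.csv on disk, here is a minimal sketch that first writes the same two rows to a temporary file. The temporary file and the names csv_path and trimmed are my own additions, not part of the question. It also passes drop = FALSE, which is worth considering because the [, -k] form returns a plain vector instead of a data frame when only one column is left.

```r
# Hypothetical stand-in for test12.csv, written to a temporary file
csv_path <- tempfile(fileext = ".csv")
writeLines(c("a,b,c", "1/02/2015,1,3", "2/03/2015,2,4"), csv_path)

# Step 1: read the whole file
data <- read.csv(csv_path)

# Step 2: drop the last column; drop = FALSE keeps the result a data frame
# even if only a single column remains
trimmed <- data[, -ncol(data), drop = FALSE]
names(trimmed)
```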
Refectory answered 3/2, 2018 at 13:38 Comment(2)
I've tried it, and it works. Now I would like to go further and give each column a type: df <- read.csv("test12.csv", col_types = "ccd")[,-3]. I'm not sure whether that will work, or whether I should leave the column I want to exclude out of col_types. - Blake
The read.csv argument is called colClasses. Carefully read ?read.csv for more info. Note that you don't need the comma in [, -3], although it won't hurt. - Duplicature

The right hand side of an assignment is processed first so this line from the question:

data <- read.csv(fileCSV)[,(ncol(data)-1)]

is trying to use data before it is defined. Also note that, as written, the expression selects only the second-to-last field. To get all but the last field:

data <- read.csv(fileCSV)
data <- data[-ncol(data)]

If you know the name of the last field, say it is lastField, then the following works. Unlike the code above, it does not read the whole file and then remove the last field; it only reads in the fields other than the last. It is also only one line of code.

read.csv(fileCSV, colClasses = c(lastField = "NULL"))
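
To illustrate the named form, here is a small sketch on a throwaway file. The temporary file, its contents, and the variable names are invented for the example; only the colClasses call mirrors the line above. It relies on the file having a header row, since the name in colClasses is matched against the column names.

```r
# Hypothetical file whose last column is literally named "lastField"
csv_path <- tempfile(fileext = ".csv")
writeLines(c("a,b,lastField", "x,1,3", "y,2,4"), csv_path)

# Columns not named in colClasses are treated as NA, i.e. their type is guessed;
# the column mapped to "NULL" is skipped while reading
res <- read.csv(csv_path, colClasses = c(lastField = "NULL"))
names(res)
```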

If you don't know the name of the last field but you do know how many fields there are, say n, then either of these would work:

read.csv(fileCSV)[-n]

read.csv(fileCSV, colClasses = replace(rep(NA, n), n, "NULL"))
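
As a quick check of the two statements above, here is a minimal sketch on a three-column file (the temporary file, its contents, and the variable names are made up for illustration). The last call is an assumption about what one might want next: the same colClasses vector can also carry explicit types for the columns that are kept.

```r
# Hypothetical three-column file written to a temporary location
csv_path <- tempfile(fileext = ".csv")
writeLines(c("a,b,c", "1/02/2015,1,3", "2/03/2015,2,4"), csv_path)
n <- 3

# Read everything, then drop column n
after_read <- read.csv(csv_path)[-n]

# Skip column n while reading; NA lets read.csv guess the other types
while_read <- read.csv(csv_path, colClasses = replace(rep(NA, n), n, "NULL"))

# Explicit types for the kept columns, "NULL" for the skipped one
typed <- read.csv(csv_path, colClasses = c("character", "numeric", "NULL"))
sapply(typed, class)
```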

Another way to do it without first reading in the last field is to first read in the header and first line to calculate the number of fields (assuming that all records have the same number) and then re-read the file using that.

n <- ncol(read.csv(fileCSV, nrows = 1))

and then use one of the two previous statements involving n.
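
Putting those two steps together might look like the sketch below. The function name read_csv_drop_last is hypothetical (it is not part of base R or any package), and it assumes, as stated above, that every record has the same number of fields as the header.

```r
# Hypothetical helper: peek at the first row to count fields,
# then re-read the file while skipping the last one
read_csv_drop_last <- function(path) {
  n <- ncol(read.csv(path, nrows = 1))
  read.csv(path, colClasses = replace(rep(NA, n), n, "NULL"))
}

# Usage on a throwaway file
csv_path <- tempfile(fileext = ".csv")
writeLines(c("a,b,c", "1/02/2015,1,3", "2/03/2015,2,4"), csv_path)
res <- read_csv_drop_last(csv_path)
names(res)
```

The file is opened twice, but only one row is parsed the first time, so the extra cost should be small compared with reading and then discarding a whole column.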

Duplicature answered 3/2, 2018 at 13:55 Comment(1)
Thanks @Grothendieck for your comment, I'll use all this knowledge to go further! - Blake

It's not possible in one line as the data variable is not yet initialized when you call it. So the command ncol(data) will trigger an error.

You would need to use two lines of code to first load your data into the data variable and then remove the last column by either using data[,-ncol(data)] or data[,1:(ncol(data)-1)].

Fantom answered 3/2, 2018 at 13:46 Comment(1)
Thanks for the advice and the additional information @toroberger - Blake

Not a single function, but at least a single line, using dplyr (disclaimer: I never use dplyr or magrittr, so a more optimized solution must exist using these libraries)

library(dplyr)
# read.csv rather than read.table, so that the commas and the header row are handled
dat = read.csv(fileCSV) %>% select(., which(names(.) != names(.)[ncol(.)]))
Overexcite answered 3/2, 2018 at 15:30 Comment(2)
Thanks, I never thought of using magrittr or dplyr to do that! - Blake
@ArthurCamberlein You're welcome. While you already accepted an answer (it's usually recommended to wait a bit, but it's totally up to you which answer was the most helpful for you), I see you didn't upvote the other useful answers (even though you thanked their respective authors). So consider upvoting any answer that was helpful, now that you have the necessary rep to do that :) - Overexcite
