R - read.table imports half of the dataset - no errors or warnings

3

4

I have a CSV file with ~200 columns and ~170K rows. The data has been extensively groomed and I know that it is well-formed. When read.table completes, I see that only about half of the rows have been imported. There are no warnings or errors, even though I set options( warn = 2 ). I'm using the latest 64-bit version of R and have increased the memory limit to 10 GB. I'm scratching my head here and have no idea how to proceed with debugging this.

Edit
When I said half the file, I didn't mean the first half. The last observation read is towards the end of the file, so the missing rows are seemingly random.

Infuscate answered 16/4, 2011 at 4:43 Comment(3)
Is it just getting the first half of the file, or seemingly lines at random? You don't say, and it's important.Yurt
@Yurt - great question! Seemingly random now that I checked. The last observation in memory is towards the bottom of the file.Infuscate
Looks like you've got your answer now. First thing I might have done is taken the first few lines to see which ones would be read or not. Always worth getting something working on a small dataset before a huge one!Yurt
11

You may have a comment character (#) in the file (try setting the option comment.char = "" in read.table). Also, check that the quote option is set correctly.
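To illustrate why the quote setting matters, here is a minimal sketch, assuming a small made-up CSV (the column names and values are hypothetical, not from the asker's file). By default read.table treats both " and ' as quote characters, so a stray apostrophe in one row can open a quoted string that only closes at the next apostrophe several rows later, and the rows in between end up swallowed into a single field without any warning.

```r
# Hypothetical sample data: four data rows, two containing a stray apostrophe.
lines <- c(
  "id,name,value",
  "1,alpha,10",
  "2,O'Brien,20",
  "3,gamma,30",
  "4,D'Arcy,40"
)
path <- tempfile(fileext = ".csv")
writeLines(lines, path)

# Default quoting (quote = "\"'"): the apostrophes are treated as quote marks,
# so the text between them, including line breaks, is read as one field.
default_read <- read.table(path, header = TRUE, sep = ",")

# Quoting and comment handling disabled: every line is read as its own row.
fixed_read <- read.table(path, header = TRUE, sep = ",",
                         quote = "", comment.char = "")

# Compare how many rows each call produced.
print(nrow(default_read))
print(nrow(fixed_read))
```

If the file legitimately uses double quotes around fields, quote = "\"" (the read.csv default) may be a better choice than disabling quoting entirely.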

Bakerman answered 16/4, 2011 at 6:39 Comment(2)
I had had comment.char = "", but as soon as I set quote="" I was able to read all observations!Infuscate
That did it for me. Weird & worrying! Any idea what is going on?Plight
2

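I've had this problem before. My approach was to read in a set number of lines at a time and then combine the pieces afterwards.

# First chunk: the header line plus the first 85000 data rows
df1 <- read.csv(..., nrows=85000)
# Second chunk: skip the header line and the 85000 rows already read;
# header=FALSE so that the first remaining row is not consumed as column names
df2 <- read.csv(..., skip=85001, nrows=85000, header=FALSE)
# Reuse the real column names from the first chunk
colnames(df2) <- colnames(df1)

df <- rbind(df1, df2)
rm(df1, df2)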
Breger answered 16/4, 2011 at 6:7 Comment(0)
1

I had a similar problem when reading in a large txt file which had a "|" separator. Scattered about the txt file were some text blocks that contained a double quote character ("), which caused the read.xxx function to stop at the prior record without throwing an error. Note that these text blocks were not enclosed in double quotes; they just contained a single double quote character here and there, which tripped it up.

I did a global search and replace on the txt file, replacing the double quote (") with a single quote ('), solving the problem (all rows were then read in without aborting).
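As an alternative to editing the file, the following is a rough sketch of how the offending lines could be located and the file read with quoting disabled. The sample data and column names are made up for illustration; only readLines, grep and read.table from base R are used.

```r
# Hypothetical pipe-separated sample with stray double quotes in two rows.
lines <- c(
  "id|note|value",
  "1|plain text|10",
  "2|a 5\" pipe|20",
  "3|more text|30",
  "4|another 3\" bolt|40"
)
path <- tempfile(fileext = ".txt")
writeLines(lines, path)

# Count the physical lines so the parsed row count can be checked against it.
raw <- readLines(path)
expected_rows <- length(raw) - 1L   # minus the header line

# Line numbers (within the file) that contain a double quote character.
suspect_lines <- grep("\"", raw, fixed = TRUE)
print(suspect_lines)

# Read with quoting disabled so the stray quotes are kept as ordinary text.
parsed <- read.table(path, header = TRUE, sep = "|",
                     quote = "", comment.char = "")

# Fail loudly if any rows went missing.
stopifnot(nrow(parsed) == expected_rows)
```

Checking the parsed row count against the line count is a cheap way to catch this kind of silent truncation, as long as the file has no fields that are meant to span several lines.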

Nonconductor answered 5/4, 2012 at 20:42 Comment(0)
