fread to read top n rows from a large file

I am getting the error below when reading the first n rows from a big file (around 50 GB) with fread. It looks like a memory issue. I tried nrows = 1000, but no luck. I am using Linux.

file ok but could not memory map it. This is a 64bit process. There is probably not enough contiguous virtual memory available.

Can the code below be replaced with read.csv using all the same options? Would that help?

rdata <- fread(
  file = csvfile, sep = "|", header = FALSE, col.names = colsinfile,
  select = colstoselect, key = "keycolname", na.strings = c("", "NA"),
  nrows = 500
)
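
For reference, I assume the closest read.csv equivalent would look something like the sketch below; read.csv (via read.table) supports sep, header, col.names, na.strings and nrows, but it has no select or key arguments, so those would have to be handled after reading:

# Rough read.csv equivalent (sketch only): read.csv has no select= or key=,
# so column selection and keying would have to happen after the read.
rdata <- read.csv(
  file = csvfile, sep = "|", header = FALSE, col.names = colsinfile,
  na.strings = c("", "NA"), nrows = 500, stringsAsFactors = FALSE
)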
Colossal asked 25/9, 2018 at 7:39 Comment(4)
What if you replace csvfile with paste('head -n 500', csvfile)?Ingalls
@Ingalls : got an error: File 'head -n 500 /csvfile' doesn't exist.Colossal
The argument should finally look like input = "head -n 500 /path/to/csvfile". Please use the input argument rather than the file argument to allow shell commands. I don't have a file that large to test, but I hope this works.Ingalls
@Ingalls : that's awesome. When used with input, it works! You should post this as an answer.Colossal

Another workaround is to fetch the first 500 lines with a shell command:

rdata <- fread(
  cmd = paste("head -n 500", csvfile),
  sep = "|", header = FALSE, col.names = colsinfile,
  select = colstoselect, key = "keycolname", na.strings = c("", "NA")
)

I don't know why nrows doesn't work, though.

Ingalls answered 25/9, 2018 at 11:18 Comment(1)
Newer versions (somewhere between 3.4.0 and 3.6.0) recommend cmd = instead of input =.Certify

Perhaps this would help you:

processFile = function(filepath) {
  con = file(filepath, "r")
  while (TRUE) {
    # read one line at a time to avoid loading the whole file
    line = readLines(con, n = 1)
    if (length(line) == 0) {  # end of file
      break
    }
    print(line)
  }
  close(con)
}

See Reading a text file in R line by line. In your case you'd probably want to replace the while (TRUE) with for (i in 1:1000); a sketch of that variant follows.
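
A minimal sketch of that variant, assuming you just want the first 1000 lines collected into a character vector (readFirstN is a hypothetical helper name, not part of any package):

# Sketch: read at most n lines from a file, stopping early at end of file.
readFirstN = function(filepath, n = 1000) {
  con = file(filepath, "r")
  on.exit(close(con))  # ensure the connection is closed even on error
  lines = character(0)
  for (i in 1:n) {
    line = readLines(con, n = 1)
    if (length(line) == 0) {  # fewer than n lines in the file
      break
    }
    lines = c(lines, line)
  }
  lines
}

rdata_lines = readFirstN(csvfile, 1000)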

Pentad answered 25/9, 2018 at 7:49 Comment(0)
