Read multiple gzip files into a single data.table using fread (and data connections) [closed]
I was looking at this thread: 'append multiple large data.table's; custom data coercion using colClasses and fread; named pipes'

I see from Matt Dowle that fread "can accept non-files such as http addresses and connections". I have tried passing a gzip connection before, without success. Does anyone have an example showing how to read a gzip file with fread without decompressing it locally or using pipes?

Right now, I decompress the network files locally, read them using fread, and append them to the data already read using rbindlist. However, I think there might be a faster way to achieve this.
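For reference, the current workflow described above might look roughly like the following sketch. The sample files, the directory, and the helper name `read_one` are made up for illustration; the real files live on a network share and are far larger.

```r
library(data.table)

# Hypothetical sample data: two small gzip-compressed CSV files in a temp dir.
demo_dir <- tempfile("gz_demo_")
dir.create(demo_dir)
for (i in 1:2) {
  con <- gzfile(file.path(demo_dir, sprintf("part%d.csv.gz", i)), "w")
  writeLines(c("id,value", sprintf("%d,%d", i, i * 10L)), con)
  close(con)
}

gz_files <- list.files(demo_dir, pattern = "\\.csv\\.gz$", full.names = TRUE)

# Hypothetical helper: decompress one file to a local temp file, then fread it.
read_one <- function(path) {
  tmp <- tempfile(fileext = ".csv")
  con <- gzfile(path, "r")
  on.exit({
    close(con)   # always release the connection
    unlink(tmp)  # and remove the decompressed copy
  })
  writeLines(readLines(con), tmp)
  fread(tmp)
}

# Append everything into a single data.table.
combined <- rbindlist(lapply(gz_files, read_one))
print(combined)
```

The decompressed copy written to disk for every file is the step I would like to avoid.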

Also, following the original question from James, it would be great if the proposal to open and concatenate multiple files also supported gzip files (or files compressed with other algorithms), perhaps by allowing the user to pass fread:

  1. an array of gzip connections, or
  2. an array of files and some info about the type of file provided (or what connection type to use), or
  3. an array of files, with automatic detection of whether each file is compressed using gzip or another format, or
  4. combinations of points 1, 2 & 3

This might already be in place, and I hope someone can pass me some example code or point me in the right direction. I looked into submitting this as a request/bug on the data.table R-Forge project, but I couldn't do so (I hope no one takes offense if I post this here).

Finally, does anyone know in R if it is possible to read a file into RAM and pass a handle to this virtual file, without needing to use RAM disks etc.?
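One possible approach to the in-RAM question, as a sketch only: read the compressed bytes into a raw vector, decompress them through an in-memory connection, and hand the resulting lines to fread via its `text` argument. This assumes a data.table version whose fread has the `text` argument; the sample file is made up, and the uncompressed text is held in RAM as well, so it may not suit very large files.

```r
library(data.table)

# Hypothetical sample file standing in for one of the network files.
gz_path <- tempfile(fileext = ".csv.gz")
con <- gzfile(gz_path, "w")
writeLines(c("id,value", "1,10", "2,20"), con)
close(con)

# Read the compressed bytes into RAM as a raw vector.
raw_bytes <- readBin(gz_path, what = "raw", n = file.size(gz_path))

# Wrap the raw vector in a connection and decompress it on the fly;
# nothing is written to disk in this step.
mem_con <- gzcon(rawConnection(raw_bytes))
csv_lines <- readLines(mem_con)
close(mem_con)

# fread parses the character vector of lines directly.
dt <- fread(text = csv_lines)
print(dt)
```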

I hope someone can help me improve the performance of my code, which reads a thousand gzip files located on our network. The files may have different columns (i.e. not all files have the same columns, but all of them overlap to at least some degree). The total size of these files is about 10 GB.
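To illustrate the differing-columns part, here is a minimal sketch with two made-up tables that share only the `id` column. With `use.names = TRUE` and `fill = TRUE`, rbindlist matches columns by name and pads the missing ones with NA.

```r
library(data.table)

# Hypothetical tables standing in for two files with partly overlapping columns.
a <- data.table(id = 1:2, price = c(1.5, 2.5))
b <- data.table(id = 3L, volume = 100L)

# Match columns by name; columns absent from a table are filled with NA.
combined <- rbindlist(list(a, b), use.names = TRUE, fill = TRUE)
print(combined)
```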

Pricecutting answered 28/9, 2014 at 5:43 Comment(1)
This is #717. You can use system commands inside fread(), but the command still has to unzip the file. What you've asked for would be very useful and should be implemented at some point. - Voice
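A sketch of the system-command route mentioned in the comment, under some assumptions: a `gzip` binary is available on the PATH (typical on Linux and macOS), and the installed data.table has the `cmd` argument to fread (older versions accepted the command as the first argument instead). The sample file is made up.

```r
library(data.table)

# Hypothetical sample file.
gz_path <- tempfile(fileext = ".csv.gz")
con <- gzfile(gz_path, "w")
writeLines(c("id,value", "1,10", "2,20"), con)
close(con)

# Let the shell decompress to stdout and have fread parse that output.
# shQuote() protects paths that contain spaces or special characters.
dt <- fread(cmd = paste("gzip -dc", shQuote(gz_path)))
print(dt)
```

The file is still decompressed by an external process, so this avoids managing the intermediate file by hand rather than avoiding decompression itself.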
