Is there a way to read and write in-memory files in R?

I am trying to use R to analyze large DNA sequence files (fastq files, several gigabytes each), but the standard R interface to these files (ShortRead) has to read the entire file at once. This doesn't fit in memory, so it causes an error. Is there any way that I can read a few (thousand) lines at a time, stuff them into an in-memory file, and then use ShortRead to read from that in-memory file?

I'm looking for something like Perl's IO::Scalar, for R.

Steffy asked 8/11, 2010 at 16:40
Actually, I don't think I can solve my problem with this: the function in question (readFastq) wants a file name, so I'm not sure I can pass it an arbitrary connection instead. (Steffy)
I think what you're looking for is described in the answers to this post: #1728272. I especially like the sqldf solution. (Euphony)

It looks like ShortRead is soon to add a "FastqStreamer" class that does what I want.

Steffy answered 21/6, 2012 at 19:1

I don’t know much about R, but have you had a look at the mmap package?

Archlute answered 8/11, 2010 at 16:42

Well, I don't know whether readFastq accepts something other than a file name...

But if it does, or for other functions that accept connections, you can use R's pipe() function to open a connection to a Unix shell command, and then combine the Unix commands head and tail with pipes.

For example, to get lines 91 to 100, you use this:

head -n 100 file.txt | tail -n 10

So you can just read the file in chunks.
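In R, the same pipeline can be wrapped in a pipe() connection and read with readLines(). A minimal sketch, assuming a Unix-like system with head and tail on the PATH; the helper name read_line_range is hypothetical:

```r
# Read lines `from`..`to` (inclusive) of a text file via a Unix shell pipeline.
# Assumes head and tail are available, i.e. a Unix-like system.
read_line_range <- function(path, from, to) {
  from <- as.integer(from)
  to <- as.integer(to)
  # head keeps the first `to` lines; tail keeps the last (to - from + 1) of those
  cmd <- sprintf("head -n %d %s | tail -n %d",
                 to, shQuote(path), to - from + 1L)
  con <- pipe(cmd, open = "r")
  on.exit(close(con))
  readLines(con)
}

# Usage: write a 200-line file, then pull out lines 91 to 100
tmp <- tempfile(fileext = ".txt")
writeLines(sprintf("line %d", 1:200), tmp)
chunk <- read_line_range(tmp, 91L, 100L)
print(chunk)
```

Note that each call makes head re-read the file from the beginning, so stepping through a multi-gigabyte file chunk by chunk this way gets slower as you move further into the file; a single file connection kept open between reads avoids that.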

If you have to, you can always use these Unix utilities to create a temporary file, then read that in with ShortRead. It's a pain, but if readFastq can only take a file name, at least it works.
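The temporary-file workaround can also be done in plain R by keeping one file connection open and calling readLines() with n set to a multiple of 4, since each FASTQ record spans four lines (this assumes no line-wrapped sequences). A sketch under those assumptions; process_fastq_in_chunks and handle_file are hypothetical names, and in the original setting handle_file would be something like function(f) ShortRead::readFastq(f):

```r
# Process a FASTQ file in fixed-size chunks. Each chunk is written to a
# temporary file so that a reader that only accepts file names can parse it.
# Assumes 4 lines per FASTQ record (no line-wrapped sequences).
process_fastq_in_chunks <- function(path, records_per_chunk, handle_file) {
  con <- file(path, open = "r")  # stays open, so each readLines() continues
  on.exit(close(con))
  results <- list()
  repeat {
    lines <- readLines(con, n = 4L * records_per_chunk)
    if (length(lines) == 0L) break  # end of file reached
    tmp <- tempfile(fileext = ".fastq")
    writeLines(lines, tmp)
    results[[length(results) + 1L]] <- handle_file(tmp)
    unlink(tmp)  # remove the temporary chunk file
  }
  results
}

# Usage: a synthetic 10-record FASTQ file, processed 4 records at a time.
fq <- tempfile(fileext = ".fastq")
recs <- unlist(lapply(1:10, function(i)
  c(sprintf("@read%d", i), "ACGT", "+", "IIII")))
writeLines(recs, fq)

# Stand-in handler: count the records in each chunk file
counts <- process_fastq_in_chunks(fq, 4L, function(f) length(readLines(f)) / 4)
print(unlist(counts))
```

Only one chunk is held in memory at a time, and the connection is read forward once, so the whole file is scanned a single time.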

Ferroconcrete answered 15/9, 2011 at 14:14

Incidentally, the general answer to how to create an in-memory file in R (like Perl's IO::Scalar) is the textConnection function. Sadly, the ShortRead package cannot handle textConnection objects as input. So while the idea I expressed in the question (reading a file in small chunks into in-memory files, which are then parsed one at a time) is certainly possible for many applications, it does not work for my particular application, since ShortRead does not accept textConnections. The solution is therefore the FastqStreamer class described above.
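A minimal base-R sketch of textConnection() used as an in-memory file: lines are written to an output text connection, collected with textConnectionValue(), and then read back through an input text connection in chunks. This only illustrates the general mechanism; as the answer says, ShortRead will not read from such a connection.

```r
# Write to an in-memory "file": with NULL, the output is not bound to a
# variable and is retrieved with textConnectionValue() instead.
out <- textConnection(NULL, open = "w")
writeLines(c("@read1", "ACGT", "+", "IIII"), out)
cat("@read2\nTTGA\n+\nJJJJ\n", file = out)  # cat() also accepts a connection
buffer <- textConnectionValue(out)  # character vector, one element per line
close(out)

# Read the buffer back through an input text connection, 4 lines at a time
inp <- textConnection(buffer, open = "r")
first_record <- readLines(inp, n = 4)  # first FASTQ record
rest <- readLines(inp)                 # everything after it
close(inp)

print(first_record)
print(rest)
```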

Steffy answered 22/6, 2012 at 20:14