I have an input_file.fa file like this (FASTA format):
> header1 description
data data
data
>header2 description
more data
data
data
I want to read in the file one chunk at a time, so that each chunk contains one header and the corresponding data, e.g. block 1:
> header1 description
data data
data
Of course I could just read in the file like this and split:
with open("input_file.fa") as f:
    for block in f.read().split(">"):
        pass
But I want to avoid reading the whole file into memory, because the files are often large.
I can read in the file line by line of course:
with open("input_file.fa") as f:
    for line in f:
        pass
But ideally what I want is something like this:
with open("input_file.fa", newline=">") as f:
    for block in f:
        pass
But I get an error:
ValueError: illegal newline value: >
I've also tried using the csv module, but with no success.
I did find this post from 3 years ago, which provides a generator-based solution to this issue, but it doesn't seem that compact. Is this really the only/best solution? It would be neat if it were possible to create the generator in a single line rather than in a separate function, something like this pseudocode:
with open("input_file.fa") as f:
    blocks = magic_generator_split_by_>
    for block in blocks:
        pass
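For reference, here is a minimal sketch of the kind of separate generator function I would like to avoid. The name `read_blocks` and the `marker` parameter are hypothetical, and this is not necessarily the code from the linked post. It assumes every record starts on a line beginning with `>`, and it uses an in-memory `io.StringIO` in place of the real file:

```python
import io


def read_blocks(handle, marker=">"):
    """Yield one record (header line plus its data lines) at a time.

    Hypothetical helper: only the current record is held in memory.
    """
    block = []
    for line in handle:
        # A line starting with the marker opens a new record, so emit
        # the lines collected so far (if any) before starting over.
        if line.startswith(marker) and block:
            yield "".join(block)
            block = []
        block.append(line)
    # Emit the final record, which has no following header to trigger it.
    if block:
        yield "".join(block)


# In-memory stand-in for input_file.fa (same content as the example above).
sample = io.StringIO(
    "> header1 description\n"
    "data data\n"
    "data\n"
    ">header2 description\n"
    "more data\n"
    "data\n"
    "data\n"
)

blocks = list(read_blocks(sample))
```

With a real file, `sample` would be replaced by the handle from `open("input_file.fa")`, and the blocks would be consumed in a `for` loop rather than collected into a list.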
If this is impossible, then I guess you could consider my question a duplicate of the other post, but if that is so, I hope people can explain to me why the other solution is the only one. Many thanks.