I have a large file (bigger than RAM, so I can't read the whole thing at once) and I need to process it row by row (in C++). I want to utilize multiple cores, preferably with Intel TBB or Microsoft PPL. I would rather avoid preprocessing the file (e.g. splitting it into 4 parts).
I was thinking about something like using 4 iterators, initialized to positions (0, n/4, 2*n/4, 3*n/4) in the file, and so on.
Is that a good solution, and is there a simple way to achieve it?
Or do you know of any libraries that support efficient, concurrent reading of streams?
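For illustration, here is a rough sketch of that offset-partition idea, assuming newline-separated rows and using placeholder names (big.dat, process_line()): each worker seeks to its offset, discards the partial row it lands in, and stops once the next row would start past its end offset, so every row is handled exactly once.

    // Sketch only: "big.dat" and process_line() are placeholders, and rows are
    // assumed to end with '\n' (std::getline's delimiter argument can be used
    // for a different record separator).
    #include <cstdint>
    #include <fstream>
    #include <string>
    #include <thread>
    #include <vector>

    void process_line(const std::string& line) { /* placeholder per-row work */ }

    void process_chunk(const char* path, std::uint64_t begin, std::uint64_t end) {
        std::ifstream in(path, std::ios::binary);
        in.seekg(static_cast<std::streamoff>(begin));
        std::string line;
        if (begin != 0) std::getline(in, line);   // drop the partial row we landed in
        // Handle every row that *starts* before 'end'; the next chunk's worker
        // skips its leading partial row, so nothing is processed twice.
        while (in && static_cast<std::uint64_t>(std::streamoff(in.tellg())) < end
               && std::getline(in, line))
            process_line(line);
    }

    int main() {
        const char* path = "big.dat";             // placeholder file name
        std::ifstream f(path, std::ios::binary | std::ios::ate);
        const auto n = static_cast<std::uint64_t>(std::streamoff(f.tellg()));

        const unsigned parts = 4;
        std::vector<std::thread> workers;
        for (unsigned i = 0; i < parts; ++i)
            workers.emplace_back(process_chunk, path, i * n / parts, (i + 1) * n / parts);
        for (auto& t : workers) t.join();
    }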
Update:
I did some tests. I/O is not the bottleneck, CPU is, and I have plenty of RAM for buffers.
I need to parse each record (variable size, approx. 2000 bytes, records separated by a unique '\0' character), validate it, do some calculations, and write the result to another file (or files).
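Given that (CPU-bound work, '\0'-separated records, plenty of RAM for buffers), one way to use all cores without pre-splitting the file is a tbb::parallel_pipeline: a serial stage reads records, parallel stages do the heavy work, and a serial stage writes results. A minimal sketch, assuming oneTBB and placeholder names (Result, compute(), input.dat, output.dat):

    #include <cstddef>
    #include <fstream>
    #include <string>
    #include <tbb/parallel_pipeline.h>

    struct Result { std::string text; };                  // placeholder result type

    Result compute(const std::string& record) {
        // Placeholder for the real parse / validate / calculate step.
        return Result{record};
    }

    int main() {
        std::ifstream in("input.dat", std::ios::binary);  // placeholder file names
        std::ofstream out("output.dat", std::ios::binary);

        const std::size_t max_in_flight = 16;             // records in flight at once

        tbb::parallel_pipeline(max_in_flight,
            // Serial stage: read the next '\0'-terminated record from the stream.
            tbb::make_filter<void, std::string>(tbb::filter_mode::serial_in_order,
                [&](tbb::flow_control& fc) -> std::string {
                    std::string record;
                    if (!std::getline(in, record, '\0')) { fc.stop(); return {}; }
                    return record;
                }) &
            // Parallel stage: the CPU-heavy parse/validate/calculate runs on all cores.
            tbb::make_filter<std::string, Result>(tbb::filter_mode::parallel,
                [](const std::string& record) { return compute(record); }) &
            // Serial stage: write results out in the original record order.
            tbb::make_filter<Result, void>(tbb::filter_mode::serial_in_order,
                [&](const Result& r) { out << r.text << '\0'; }));
    }

The token count caps how many records are in flight (and thus buffer memory), and the serial_in_order output stage preserves input order; it can be relaxed to serial_out_of_order if result order does not matter.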
Starting the iterators at (0, n/4, 2*n/4, 3*n/4) + i will involve at least four disk seeks, and I/O might become the bottleneck. – Cupola