I am currently working on a project where I have a large text file (15+ GB) and I'm trying to run a function on each line of the file. To speed the task along, I am creating 4 threads and attempting to have them read the file at the same time. This is similar to what I have:
#include <fstream>
#include <iostream>
#include <string>
#include <thread>

using namespace std;

// Each thread reads one line from the shared stream and prints it.
// Nothing synchronizes access to *wordlist here, which is the problem described below.
void simpleFunction(ifstream *wordlist){
    string word;
    getline(*wordlist, word);
    cout << word << endl;
}

int main(){
    const int max_concurrent_threads = 4;
    ifstream wordlist("filename.txt");
    thread all_threads[max_concurrent_threads];
    // Start all threads on the same stream.
    for (int i = 0; i < max_concurrent_threads; ++i) {
        all_threads[i] = thread(simpleFunction, &wordlist);
    }
    // Wait for every thread to finish.
    for (int i = 0; i < max_concurrent_threads; ++i) {
        all_threads[i].join();
    }
    return 0;
}
The getline() function (along with *wordlist >> word) seems to increment the pointer and read the value in 2 steps, as I will regularly get:
Item1 Item2 Item3 Item2
back.
So I was wondering if there was a way to atomically read a line of the file? Loading it into an array first won't work because the file is too big, and I would prefer not to load the file in chunks at a time.
I couldn't find anything regarding fstream and the atomicity of getline(), sadly. If there is an atomic version of readline() or even a simple way to use locks to achieve what I want, I'm all ears.
read syscalls. However, it isn't the right way to do that: You should give your threads a line to process; then you don't have a shared resource. – Synchronize
getline() is not atomic. – Inkberry