C++: reading a file is too slow

3

5

I'm trying to read a ~36KB file, and it takes ~20 seconds to finish this loop:

ifstream input_file;

input_file.open("text.txt");
if( !(input_file.is_open()) )
{
    cout<<"File not found";
    exit(1);
}

std::string line;
std::string word;
stringstream line_stream;   //to use >> operator to get words from lines

int lineNum=1;

while( getline(input_file,line) )   //Read file line by line until file ends
{
    line_stream.clear();    //clear stream
    line_stream << line;    //read line
    while(line_stream >> word)  //Read the line word by word until the line ends
    {
        //insert word into a linked list...
    }
    lineNum++;
}
input_file.close();

Any help would be appreciated.

Secund answered 14/4, 2016 at 4:10 Comment(2)
It might, in fact, be your insertion into the linked list that is the problem. It could be O(n^2), depending on how it is implemented. And with 36kB, "n" could be big. – Lemos
You are correct! I commented out the insertion part and the code finished instantly... I'll look for the problem now. Thanks :) – Secund
D
7

stringstream::clear() does not clear the contents of the stream. It only resets the error and EOF state flags; see http://en.cppreference.com/w/cpp/io/basic_ios/clear.

The result is your line_stream accumulates all previous lines and the inner loop will run words over all the accumulated lines again and again.

So the total time spent is about O(n^2), rather than the O(n) you expect.

Instead of reusing the same object for every line, you could define line_stream inside the while loop so that each iteration starts with a brand-new, empty stream. Like this:

ifstream input_file;

input_file.open("text.txt");
if( !(input_file.is_open()) )
{
    cout<<"File not found";
    exit(1);
}

std::string line;
std::string word;

int lineNum=1;

while( getline(input_file,line) )   //Read file line by line until file ends
{
    stringstream line_stream;   // new instance, empty line.
    line_stream << line;    //read line
    while(line_stream >> word)  //Read the line word by word until the line ends
    {
        //insert word into a linked list...
    }
    lineNum++;
}
input_file.close();
Despairing answered 14/4, 2016 at 4:22 Comment(5)
Another clean way to reset a string stream: line_stream.str(std::string()); – Malayopolynesian
So you could just reset the stream with line_stream.str(line); and skip the << insertion. – Lemos
Thank you, but it only reduced the time by ~4 seconds: the whole code took ~24 seconds before and takes ~20 with this change. – Secund
At that point the problem might be your storage bus. Do you have a lot of other I/O going on? (And even though the file is small, have you checked that it is contiguous on the drive?) – Southbound
Yeah, I figured out that the class's insertion member function is the one taking a lot of time. – Secund
T
0

You could attempt the following:

std::ifstream file("text.txt");
std::string str;

while (std::getline(file, str))
{
    cout << str; //call a function here to extract the words from str (in memory, not from the file)
}

Your code ran in 11 ms for me, and the version above in 8 ms. Maybe it will work for you.

Tillie answered 14/4, 2016 at 5:7 Comment(0)
M
0

Try compiling with the optimization flag -O2 or -O3.

I was surprised to see that a simple for-loop to read a 1GB file took 4.7 seconds, whereas another higher level language (Dart) did it in 3.x seconds.

After enabling this flag, runtime dropped to 2.1 seconds.

Mainsheet answered 23/9, 2020 at 12:23 Comment(0)
