Reading a file line by line in C++ can be done in some different ways.
[Fast] Loop with std::getline()
The simplest approach is to open an std::ifstream and loop using std::getline() calls. The code is clean and easy to understand.
#include <fstream>
std::ifstream file(FILENAME);
if (file.is_open()) {
std::string line;
while (std::getline(file, line)) {
// using printf() in all tests for consistency
printf("%s", line.c_str());
}
file.close();
}
[Fast] Use Boost's file_description_source
Another possibility is to use the Boost library, but the code gets a bit more verbose. The performance is quite similar to the code above (Loop with std::getline()).
#include <boost/iostreams/device/file_descriptor.hpp>
#include <boost/iostreams/stream.hpp>
#include <fcntl.h>
namespace io = boost::iostreams;
void readLineByLineBoost() {
int fdr = open(FILENAME, O_RDONLY);
if (fdr >= 0) {
io::file_descriptor_source fdDevice(fdr, io::file_descriptor_flags::close_handle);
io::stream <io::file_descriptor_source> in(fdDevice);
if (fdDevice.is_open()) {
std::string line;
while (std::getline(in, line)) {
// using printf() in all tests for consistency
printf("%s", line.c_str());
}
fdDevice.close();
}
}
}
[Fastest] Use C code
If performance is critical for your software, you may consider using the C language. This code can be 4-5 times faster than the C++ versions above, see benchmark below
FILE* fp = fopen(FILENAME, "r");
if (fp == NULL)
exit(EXIT_FAILURE);
char* line = NULL;
size_t len = 0;
while ((getline(&line, &len, fp)) != -1) {
// using printf() in all tests for consistency
printf("%s", line);
}
fclose(fp);
if (line)
free(line);
Benchmark -- Which one is faster?
I have done some performance benchmarks with the code above and the results are interesting. I have tested the code with ASCII files that contain 100,000 lines, 1,000,000 lines and 10,000,000 lines of text. Each line of text contains 10 words in average. The program is compiled with -O3
optimization and its output is forwarded to /dev/null
in order to remove the logging time variable from the measurement. Last, but not least, each piece of code logs each line with the printf()
function for consistency.
The results show the time (in ms) that each piece of code took to read the files.
The performance difference between the two C++ approaches is minimal and shouldn't make any difference in practice. The performance of the C code is what makes the benchmark impressive and can be a game changer in terms of speed.
10K lines 100K lines 1000K lines
Loop with std::getline() 105ms 894ms 9773ms
Boost code 106ms 968ms 9561ms
C code 23ms 243ms 2397ms