I need to read and write large files (several gigabytes) line by line.
After reading many forum posts and sites (including a number of SO questions), I found mmap suggested as the fastest way to read and write files. However, when I implement my code with both getline and mmap, mmap is the slower of the two. This is true for both reading and writing. I have been testing with files of about 600 MB.
Both implementations parse the file line by line and then tokenize each line. I will show only the file input code.
Here is the getline implementation:
    void two(char* path) {
        std::ios::sync_with_stdio(false);
        ifstream pFile(path);
        string mystring;
        if (pFile.is_open()) {
            while (getline(pFile, mystring)) {
                // c style tokenizing
            }
        }
        else perror("error opening file");
        pFile.close();
    }
And here is the mmap implementation:
    void four(char* path) {
        int fd;
        char *map;
        char *FILEPATH = path;
        unsigned long FILESIZE;
        // find file size
        FILE* fp = fopen(FILEPATH, "r");
        fseek(fp, 0, SEEK_END);
        FILESIZE = ftell(fp);
        fseek(fp, 0, SEEK_SET);
        fclose(fp);
        fd = open(FILEPATH, O_RDONLY);
        map = (char *) mmap(0, FILESIZE, PROT_READ, MAP_SHARED, fd, 0);
        /* Read the file char-by-char from the mmap */
        char c;
        stringstream ss;
        // valid indices are 0 .. FILESIZE - 1; map[FILESIZE] is past the end of the file
        for (unsigned long i = 0; i < FILESIZE; ++i) {
            c = map[i];
            if (c != '\n') {
                ss << c;
            }
            else {
                // c style tokenizing
                ss.str("");
            }
        }
        if (munmap(map, FILESIZE) == -1) perror("Error un-mmapping the file");
        close(fd);
    }
I omitted much error checking in the interest of brevity.
Is my mmap implementation incorrect, and is that what hurts its performance? Or is mmap simply not ideal for my application?
Thanks for any comments or help!
With open and mmap, there is no purpose served by using fopen, fseek and ftell when you could simply use fstat after the open. – Rochdale
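As a rough illustration of the fstat suggestion, here is a minimal sketch that opens the file once, takes its size from fstat, and maps it. It also swaps the char-by-char stringstream loop for a memchr scan that hands each line to a callback as a std::string_view. That is a different technique from the code in the question, and whether it is faster has not been measured here. The name for_each_line_mmap and the callback signature are hypothetical, and the sketch assumes a POSIX system and C++17.

```cpp
#include <cstddef>
#include <cstring>
#include <string_view>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Hypothetical helper (not from the question): maps `path` read-only and
// calls `on_line` once per line, without the trailing '\n'.
// Returns false if any system call fails.
template <typename Callback>
bool for_each_line_mmap(const char* path, Callback on_line) {
    int fd = open(path, O_RDONLY);
    if (fd == -1) return false;

    // fstat on the already-open descriptor replaces fopen/fseek/ftell.
    struct stat st;
    if (fstat(fd, &st) == -1) { close(fd); return false; }
    const std::size_t size = static_cast<std::size_t>(st.st_size);
    if (size == 0) { close(fd); return true; }  // mmap rejects a zero length

    void* addr = mmap(nullptr, size, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);  // the mapping stays valid after the descriptor is closed
    if (addr == MAP_FAILED) return false;

    const char* cur = static_cast<const char*>(addr);
    const char* end = cur + size;
    while (cur < end) {
        // Find the next newline in the remaining bytes, if there is one.
        const char* nl = static_cast<const char*>(
            std::memchr(cur, '\n', static_cast<std::size_t>(end - cur)));
        const char* line_end = nl ? nl : end;
        // The view points into the mapping; no bytes are copied.
        on_line(std::string_view(cur, static_cast<std::size_t>(line_end - cur)));
        cur = nl ? nl + 1 : end;
    }
    return munmap(addr, size) == 0;
}
```

It could be called as for_each_line_mmap(path, [](std::string_view line) { /* tokenize line */ });. Note that the view is only valid inside the callback's lifetime of the mapping, and it is not null-terminated, so strtok-style tokenizing would need a copy or a tokenizer that works on a pointer and a length.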