mmap slower than getline?

Asked 11/7, 2011 at 21:29 Answered 11/7, 2011 at 21:54

I face the challenge of reading/writing files (in Gigs) line by line.

After reading many forum posts and sites (including a number of SO questions), I found mmap suggested as the fastest option for reading and writing files. However, when I implement my code with both the getline and mmap techniques, mmap is the slower of the two. This is true for both reading and writing. I have been testing with files of about 600 MB.

My implementations parse line by line and then tokenize the line. I will present file input only.

Here is the getline implementation:

void two(char* path) {

    std::ios::sync_with_stdio(false);
    ifstream pFile(path);
    string mystring;

    if (pFile.is_open()) {
        while (getline(pFile,mystring)) {
            // c style tokenizing
        }
    }
    else perror("error opening file");
    pFile.close();
}

and here is the mmap:

void four(char* path) {

    int fd;
    char *map;
    char *FILEPATH = path;
    unsigned long FILESIZE;

    // find file size
    FILE* fp = fopen(FILEPATH, "r");
    fseek(fp, 0, SEEK_END);
    FILESIZE = ftell(fp);
    fseek(fp, 0, SEEK_SET);
    fclose(fp);

    fd = open(FILEPATH, O_RDONLY);

    map = (char *) mmap(0, FILESIZE, PROT_READ, MAP_SHARED, fd, 0);

    /* Read the file char-by-char from the mmap
     */
    char c;
    stringstream ss;

    for (unsigned long i = 0; i < FILESIZE; ++i) {  // '<' rather than '<=': map[FILESIZE] is not part of the file
        c = map[i];
        if (c != '\n') {
            ss << c;
        }
        else {
            // c style tokenizing
            ss.str("");
        }

    }

    if (munmap(map, FILESIZE) == -1) perror("Error un-mmapping the file");

    close(fd);

}

I omitted much error checking in the interest of brevity.

Is my mmap implementation incorrect, and thus affecting performance? Perhaps mmap is non ideal for my application?

Thanks for any comments or help!

Vevina answered 11/7, 2011 at 21:29 Comment(2)

Since you are using open and mmap there is no purpose served by using fopen, fseek and ftell when you could simply use fstat after the open. – Rochdale 11/7, 2011 at 21:50

good point @Zan. Actually i ran some tests, and my version is actually faster for files under ~50MB (interestingly enough). However, above ~50, the opposite is true. Hence for my application, I should really be using the fstat way. – Vevina 12/7, 2011 at 15:17
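
The fstat suggestion from the comment above could look roughly like the following. This is only a sketch under assumptions: MappedFile, map_file and unmap_file are hypothetical names invented for illustration, MAP_PRIVATE is my choice for a read-only mapping (the question used MAP_SHARED), and error handling is reduced to returning an empty result.

```cpp
#include <cassert>
#include <cstddef>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Hypothetical helper type: a read-only mapping of a whole file.
struct MappedFile {
    const char* data = nullptr;  // nullptr if mapping failed or the file is empty
    std::size_t size = 0;
};

// Opens `path`, asks the kernel for the size with fstat(), and maps the file.
// No fopen/fseek/ftell round trip is needed.
MappedFile map_file(const char* path) {
    assert(path != nullptr);
    MappedFile result;
    int fd = open(path, O_RDONLY);
    if (fd == -1) return result;

    struct stat st;
    // mmap() rejects a length of 0, so empty files are reported as "no data".
    if (fstat(fd, &st) == 0 && st.st_size > 0) {
        std::size_t len = static_cast<std::size_t>(st.st_size);
        void* p = mmap(nullptr, len, PROT_READ, MAP_PRIVATE, fd, 0);
        if (p != MAP_FAILED) {
            result.data = static_cast<const char*>(p);
            result.size = len;
        }
    }
    close(fd);  // the mapping stays valid after the descriptor is closed
    return result;
}

// Releases a mapping obtained from map_file().
void unmap_file(const MappedFile& f) {
    if (f.data != nullptr) munmap(const_cast<char*>(f.data), f.size);
}
```

A caller would check data for nullptr, scan the bytes in [data, data + size), and call unmap_file when done.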

The real power of mmap is being able to freely seek in a file, use its contents directly as a pointer, and avoid the overhead of copying data from kernel cache memory to userspace. However, your code sample is not taking advantage of this.

In your loop, you scan the buffer one character at a time, appending to a stringstream. The stringstream doesn't know how long the string is, and so has to reallocate several times in the process. At this point you've killed off any performance increase from using mmap - even the standard getline implementation avoids multiple reallocations (by using a 128-byte on-stack buffer, in the GNU C++ implementation).

If you want to use mmap to its fullest power:

Don't copy your strings. At all. Instead, copy around pointers right into the mmap buffer.
Use library functions such as memchr (or strchr, if the buffer is NUL-terminated) to find newlines; these are typically hand-optimized, often in assembler, and run faster than most open-coded search loops.
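
As a rough illustration of those two points, the sketch below finds newlines with memchr and records only a pointer and a length for each line, so nothing is copied out of the buffer. LineRef and split_lines are hypothetical names, and collecting the lines in a vector is my own simplification; a real parser would more likely tokenize each line as soon as it is found.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical type: a line described by where it starts and how long it is.
// It points into the caller's buffer and owns nothing.
struct LineRef {
    const char* begin;
    std::size_t length;  // excludes the '\n'
};

// Splits [data, data + size) into lines without copying any bytes.
std::vector<LineRef> split_lines(const char* data, std::size_t size) {
    assert(data != nullptr || size == 0);
    std::vector<LineRef> lines;
    const char* cur = data;
    const char* end = data + size;
    while (cur < end) {
        // memchr takes an explicit length, so no NUL terminator is required.
        const void* hit = std::memchr(cur, '\n', static_cast<std::size_t>(end - cur));
        const char* nl = hit ? static_cast<const char*>(hit) : end;
        lines.push_back(LineRef{cur, static_cast<std::size_t>(nl - cur)});
        cur = (nl == end) ? end : nl + 1;  // step over the newline itself
    }
    return lines;
}
```

With an mmap'ed file, data would be the mapping and size the file size; the LineRef values are only valid until the file is unmapped.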

Gabble answered 11/7, 2011 at 21:32 Comment(8)

so what you're saying is that I need to better implement parsing each line? – Vevina 11/7, 2011 at 21:36

I've since reimplemented my mmap function using your recommended strchr and mainly pointer manipulation. mmap is now winning the race, however, only by ~4 sec with a 375 mb file.. – Vevina 11/7, 2011 at 22:37

@Ian, yes, it could well be that your bottleneck is elsewhere. You are testing this with a warm cache, right? – Gabble 11/7, 2011 at 22:39

Make sure the data is in memory before you run your benchmarks, in other words - otherwise your first run will be slow, but later runs will be fast. One simple way to do this is just to run your benchmark several times in a loop and take the fastest run. – Gabble 11/7, 2011 at 23:2

@bdonlan: On files in the GB range, there's no such thing as a "warm cache". – Barde 12/7, 2011 at 2:21

Sure there is, provided you have enough RAM :) But the running-in-a-loop bit still applies; it should help even the field, so to speak. Although I guess median may be preferred... – Gabble 12/7, 2011 at 3:0
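
The run-it-in-a-loop idea from these comments might be sketched as below. Timing and time_runs are hypothetical names; the harness only measures wall-clock time and makes no attempt to control the page cache, so the first run may still include disk reads.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical result type: both statistics mentioned in the comments.
struct Timing {
    double fastest_ms;
    double median_ms;  // upper middle element when the run count is even
};

// Runs `work` `runs` times and reports the fastest and the median wall-clock time.
Timing time_runs(const std::function<void()>& work, int runs) {
    assert(runs > 0);
    std::vector<double> ms;
    ms.reserve(static_cast<std::size_t>(runs));
    for (int i = 0; i < runs; ++i) {
        auto start = std::chrono::steady_clock::now();
        work();
        auto stop = std::chrono::steady_clock::now();
        ms.push_back(std::chrono::duration<double, std::milli>(stop - start).count());
    }
    std::sort(ms.begin(), ms.end());
    return Timing{ms.front(), ms[ms.size() / 2]};
}
```

One would pass a lambda that calls the getline version or the mmap version on the same file and compare the two Timing results.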

OK, I've really optimized both methods, reducing memory allocations, etc. With a 375 MB file of 10M lines I'm seeing only an 11% speed increase for mmap over getline. My overall times are ~0.9 sec and ~0.8 sec. The largest file I will see will be ~24 times larger. This small speed increase doesn't really warrant the large memory use of mmap. – Vevina 12/7, 2011 at 18:44

@Ian, mmap doesn't use more memory than regular reads, it just accounts for it differently. See #1973265 – Gabble 12/7, 2011 at 18:58

Whoever told you to use mmap does not know very much about modern machines.

The performance advantages of mmap are a total myth. In the words of Linus Torvalds:

Yes, memory is "slow", but dammit, so is mmap().

The problem with mmap is that every time you touch a page in the mapped region for the first time, it traps into the kernel and actually maps the page into your address space, playing havoc with the TLB.

Try a simple benchmark reading a big file 8K at a time using read and then again with mmap. (Using the same 8K buffer over and over.) You will almost certainly find that read is actually faster.

Your problem was never with getting data out of the kernel; it was with how you handle the data after that. Minimize the work you are doing character-at-a-time; just scan to find the newline and then do a single operation on the block. Personally, I would go back to the read implementation, using (and re-using) a buffer that fits in the L1 cache (8K or so).

Or at least, I would try a simple read vs. mmap benchmark to see which is actually faster on your platform.
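
For the read side of such a comparison, a minimal sketch of the reused-buffer approach might look like this. for_each_line is a hypothetical name, the 8192-byte buffer follows the "8K or so" suggestion above, and two simplifications are assumed: a line longer than the buffer is handed to the callback in pieces, and interrupted reads (EINTR) are not retried.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <fcntl.h>
#include <functional>
#include <unistd.h>

// Calls on_line(ptr, len) for every line in the file, reusing one fixed buffer.
// The pointer is only valid during the callback.
// Returns false if the file could not be opened or a read failed.
bool for_each_line(const char* path,
                   const std::function<void(const char*, std::size_t)>& on_line) {
    assert(path != nullptr);
    int fd = open(path, O_RDONLY);
    if (fd == -1) return false;

    char buf[8192];
    std::size_t used = 0;  // bytes of an unfinished line carried over
    bool ok = true;
    for (;;) {
        ssize_t n = read(fd, buf + used, sizeof(buf) - used);
        if (n < 0) { ok = false; break; }
        if (n == 0) break;  // end of file
        used += static_cast<std::size_t>(n);

        char* cur = buf;
        char* end = buf + used;
        // Find each newline with memchr and hand over the complete line.
        while (void* hit = std::memchr(cur, '\n', static_cast<std::size_t>(end - cur))) {
            char* nl = static_cast<char*>(hit);
            on_line(cur, static_cast<std::size_t>(nl - cur));
            cur = nl + 1;
        }
        used = static_cast<std::size_t>(end - cur);
        if (used == sizeof(buf)) {
            on_line(buf, used);  // full buffer without a newline: deliver a piece
            used = 0;
        } else {
            std::memmove(buf, cur, used);  // keep the partial line for the next read
        }
    }
    if (ok && used > 0) on_line(buf, used);  // last line without a trailing newline
    close(fd);
    return ok;
}
```

The callback is where the C style tokenizing would go; because the buffer is reused, anything that must outlive the callback has to be copied there.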

[Update]

I found a couple more sets of commentary from Mr. Torvalds:

http://lkml.iu.edu/hypermail/linux/kernel/0004.0/0728.html
http://lkml.iu.edu/hypermail/linux/kernel/0004.0/0775.html

The summary:

And on top of that you still have the actual CPU TLB miss costs etc. Which can often be avoided if you just re-read into the same area instead of being excessively clever with memory management just to avoid a copy.

memcpy() (ie "read()" in this case) is always going to be faster in many cases, just because it avoids all the extra complexity. While mmap() is going to be faster in other cases.

In my experience, reading and processing a large file sequentially is one of the "many cases" where using (and re-using) a modest-sized buffer with read/write performs significantly better than mmap.

Northing answered 11/7, 2011 at 21:54 Comment(1)

testing read() was on my next to do list. However, i'm uncertain how much optimization will be realized not knowing how many lines are actually larger than 8k. Thanks for the tip; I will certainly try it. – Vevina 11/7, 2011 at 22:22

You can use memchr to find line endings. It will be much faster than adding to a stringstream one character at a time.

Barde answered 11/7, 2011 at 21:34 Comment(0)

You're using a stringstream to store the lines you identify. This is not comparable with the getline implementation, because the stringstream itself adds overhead. As others suggested, you can store the beginning of the string as a char*, and maybe the length of the line (or a pointer to the end of the line). The body of the read would be something like:

char* str_start = map;
char* str_end;
for (unsigned long i = 0; i < FILESIZE; ++i) {  // '<': map[FILESIZE] is past the file's data
    if (map[i] == '\n') {
        str_end = map + i;
        {
            // C style tokenizing of the string str_start to str_end
            // If you want, you can build a std::string like:
            // std::string line(str_start,str_end);
            // but note that this implies a memory copy.
        }
        str_start = map + i + 1;
    }
}
// If the file does not end in '\n', the last line runs from str_start to map + FILESIZE.

Note also that this is much more efficient because the loop does no per-character work beyond the comparison (in your version you were appending every character to the stringstream).

Kleinstein answered 11/7, 2011 at 21:48 Comment(0)
