Why does fsync() take much more time on Linux kernel 3.1.* than on kernel 3.0?

I have a test program. It takes about 37 seconds on Linux kernel 3.1.*, but only about 1 second on kernel 3.0.18 (I just replaced the kernel on the same machine). Please give me a clue on how to improve it on kernel 3.1. Thanks!

#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>


int my_fsync(int fd)
{
    /* fdatasync() flushes only the data (plus the metadata needed to
     * read it back); fsync() also flushes things like mtime. */
    // return fdatasync(fd);
    return fsync(fd);
}


int main(int argc, char **argv)
{
    int rc = 0;
    int count;
    int i;
    char oldpath[1024];
    char newpath[1024];
    char *writebuffer = calloc(1024, 1);

    if (writebuffer == NULL) {
        fprintf(stderr, "calloc failed!\n");
        exit(1);
    }

    snprintf(oldpath, sizeof(oldpath), "./%s", "foo");
    snprintf(newpath, sizeof(newpath), "./%s", "foo.new");

    /* Atomic-update pattern: write a new file, sync it to disk,
     * then rename it over the old one. */
    for (count = 0; count < 1000; ++count) {
        int fd = open(newpath, O_CREAT | O_TRUNC | O_WRONLY, S_IRWXU);
        if (fd == -1) {
            fprintf(stderr, "open error! path: %s\n", newpath);
            exit(1);
        }

        for (i = 0; i < 10; i++) {
            rc = write(fd, writebuffer, 1024);
            if (rc != 1024) {
                fprintf(stderr, "underwrite!\n");
                exit(1);
            }
        }

        if (my_fsync(fd)) {
            perror("fsync failed");
            exit(1);
        }

        if (close(fd)) {
            perror("close failed");
            exit(1);
        }

        if (rename(newpath, oldpath)) {
            perror("rename failed");
            exit(1);
        }
    }

    free(writebuffer);
    return 0;
}
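
I compile and time it like this (assuming the source file is named testfsync.c; the binary name matches the strace below):

# gcc -o testfsync testfsync.c
# time ./testfsync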


# strace -c ./testfsync
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 98.58    0.068004          68      1000           fsync
  0.84    0.000577           0     10001           write
  0.40    0.000275           0      1000           rename
  0.19    0.000129           0      1003           open
  0.00    0.000000           0         1           read
  0.00    0.000000           0      1003           close
  0.00    0.000000           0         1           execve
  0.00    0.000000           0         1         1 access
  0.00    0.000000           0         3           brk
  0.00    0.000000           0         1           munmap
  0.00    0.000000           0         2           setitimer
  0.00    0.000000           0        68           sigreturn
  0.00    0.000000           0         1           uname
  0.00    0.000000           0         1           mprotect
  0.00    0.000000           0         2           writev
  0.00    0.000000           0         2           rt_sigaction
  0.00    0.000000           0         6           mmap2
  0.00    0.000000           0         2           fstat64
  0.00    0.000000           0         1           set_thread_area
------ ----------- ----------- --------- --------- ----------------
100.00    0.068985                 14099         1 total
Arrio answered 16/1, 2012 at 4:16 Comment(3)
How do you know it's fsync() that causes the slowdown? Could be the open() call as well. – Repress
What mount options are you using on the filesystems in question? (Incidentally, it takes about 1.5 sec on my 2.6.38-12-generic kernel from Ubuntu; ext3 filesystem, rw,errors=remount-ro,commit=0.) – Apo
strace -c ./a.out will summarize the execution time of the different system calls. – Apo

Kernel 3.1.* is actually doing the sync; 3.0.18 is faking it. Your code does 1,000 synchronized writes, one fsync per file. Since each file is opened with O_TRUNC, the writes also enlarge the file, so every fsync has to write metadata as well as data. That makes roughly 2,000 disk write operations. Typical hard drive write latency is about 20 milliseconds per I/O, and 2,000 × 20 ms = 40,000 ms, or 40 seconds. So 37 seconds seems about right, assuming you're writing to a typical hard drive.

Basically, by syncing after each file's worth of writes, you give the kernel no opportunity to cache or overlap the writes, and you force worst-case behavior on every operation. The hard drive also winds up seeking back and forth between where the data is written and where the metadata is written, once per sync.
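
If you don't need fsync()'s full metadata flush, one thing worth trying (a sketch under assumptions, not a drop-in fix) is to preallocate the file so the data writes don't change its size, then use fdatasync(), which may skip the separate metadata write when only the data changed:

/* Hypothetical variant of the loop body, not the original test code. */
int fd = open(newpath, O_CREAT | O_TRUNC | O_WRONLY, S_IRWXU);
posix_fallocate(fd, 0, 10 * 1024);  /* set the final size up front */
for (i = 0; i < 10; i++)
    write(fd, writebuffer, 1024);   /* writes no longer extend the file */
fdatasync(fd);                      /* flush data; size/mtime updates can be skipped */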

Annabelleannabergite answered 16/1, 2012 at 4:25 Comment(0)

Found the reason: file system barriers are enabled by default in ext3 as of Linux kernel 3.1 (http://kernelnewbies.org/Linux_3.1). After disabling barriers, it becomes much faster.
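
For reference, barriers can be switched off per mount with ext3's barrier option (/path/to/mountpoint is a placeholder; note that barrier=0 trades crash safety for speed):

# mount -o remount,barrier=0 /path/to/mountpoint

The same barrier=0 option can be made persistent in the mount options field of /etc/fstab.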

Arrio answered 17/1, 2012 at 0:12 Comment(1)
You might also be causing a performance issue because of the unusual I/O pattern of your test. Try 1000 different files, for example, possibly interspersed with reads. I'm not sure disabling fs barriers is a good solution, if you are in fact looking for more secure retention of data (for lack of a better term). – Transform
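
A minimal sketch of that variation (hypothetical names; each iteration targets a distinct file so no single inode is rewritten 1,000 times):

/* Inside the loop, replace the fixed paths with per-iteration ones. */
snprintf(newpath, sizeof(newpath), "./foo.new.%d", count);
snprintf(oldpath, sizeof(oldpath), "./foo.%d", count);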
