Why is this C++ program slower on Windows than Linux?

Consider the following program:

#define _FILE_OFFSET_BITS 64   // Allow large files.
#define REVISION "POSIX Revision #9"

#include <iostream>
#include <cstdio>
#include <ctime>

const int block_size = 1024 * 1024;
const char block[block_size] = {};

int main()
{
    std::cout << REVISION << std::endl;  

    std::time_t t0 = time(NULL);

    std::cout << "Open: 'BigFile.bin'" << std::endl;
    FILE * file;
    file = fopen("BigFile.bin", "wb");
    if (file != NULL)
    {
        std::cout << "Opened. Writing..." << std::endl;
        for (int n=0; n<4096; n++)
        {
            size_t written = fwrite(block, 1, block_size, file);
            if (written != block_size)
            {
                std::cout << "Write error." << std::endl;
                return 1;
            }
        }
        fclose(file);
        std::cout << "Success." << std::endl;

        time_t t1 = time(NULL);
        if (t0 == ((time_t)-1) || t1 == ((time_t)-1))
        {
            std::cout << "Clock error." << std::endl;
            return 2;
        }

        double ticks = (double)(t1 - t0);
        std::cout << "Seconds: " << ticks << std::endl;

        file = fopen("BigFile.log", "w");
        fprintf(file, REVISION);
        fprintf(file, "   Seconds: %f\n", ticks);
        fclose(file);

        return 0;
    }

    std::cout << "Something went wrong." << std::endl;
    return 1;
}

It simply writes 4GB of zeros to a file on disk and times how long it takes.

Under Linux, this takes 148 seconds on average. Under Windows, on the same PC, it takes on average 247 seconds.

What the hell am I doing wrong?!

The code is compiled under GCC for Linux, and Visual Studio for Windows, but I cannot imagine a universe in which the compiler used should make any measurable difference to a pure I/O benchmark. The filesystem used in all cases is NTFS.

I just don't understand why such a vast performance difference exists. I don't know why Windows is running so slow. How do I force Windows to run at the full speed that the disk is clearly capable of?

(The numbers above are for OpenSUSE 13.1 32-bit and Windows XP 32-bit on an old Dell laptop. But I've observed similar speed differences on several PCs around the office, running various versions of Windows.)

Edit: The executable and the file it writes both reside on an external USB harddisk which is formatted as NTFS and is nearly completely empty. Fragmentation is almost certainly not a problem. It could be some kind of driver issue, but I've seen the same performance difference on several other systems running different versions of Windows. There is no antivirus installed.

Just for giggles, I tried changing it to use the Win32 API directly. (Obviously this only works for Windows.) Time becomes a little more erratic, but still within a few percent of what it was before. Unless I specify FILE_FLAG_WRITE_THROUGH; then it goes significantly slower. A few other flags make it slower, but I can't find the one that makes it go faster...
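
Roughly, the Win32 version replaced the fopen/fwrite loop with something along these lines (a sketch of the approach rather than the exact code; FILE_FLAG_WRITE_THROUGH is the flag that made it significantly slower):

#include <windows.h>
#include <iostream>

const DWORD block_size = 1024 * 1024;
const char block[block_size] = {};

int main()
{
    // CREATE_ALWAYS overwrites any existing file, mirroring fopen(..., "wb").
    HANDLE file = CreateFileA("BigFile.bin", GENERIC_WRITE, 0, NULL,
                              CREATE_ALWAYS,
                              FILE_ATTRIBUTE_NORMAL /* | FILE_FLAG_WRITE_THROUGH */,
                              NULL);
    if (file == INVALID_HANDLE_VALUE)
    {
        std::cout << "Something went wrong." << std::endl;
        return 1;
    }

    for (int n = 0; n < 4096; n++)
    {
        DWORD written = 0;
        if (!WriteFile(file, block, block_size, &written, NULL) || written != block_size)
        {
            std::cout << "Write error." << std::endl;
            CloseHandle(file);
            return 1;
        }
    }

    CloseHandle(file);
    return 0;
}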

Intinction answered 21/10, 2015 at 9:18 Comment(25)
Could you tell us the file system, please? I guess ext4 and NTFS but better to be sure.Unsearchable
What happens if you change the size of the block?Cognizance
Also, give the exact optimization options given to respective compilers.Astatic
@cad Good point! I meant to include that...Intinction
@Cognizance I found that sizes larger than 1MB cause a small decrease in speed. 1MB is the fastest I've tried. I haven't actually tried going smaller than 1MB...Intinction
I would guess this may have something to do with the file system's write caching.Cognizance
@Intinction For laughs, why not try gcc for Windows? Bottom line is that fwrite is still compiler dependent.Giselegisella
How much RAM do you have installed? Maybe Linux is caching more in memory?Pilatus
Just tried the program on ArchLinux 64-bit, ran 39 seconds. :D I think Linux employs a stronger cache mechanism. It only flushes what absolutely has to be flushed.Unsearchable
Perhaps XP is running without appropriate drivers, so the transfer to disk is slower. Or an antivirus interferes. Or Linux is buffering in RAM before writing and Windows flushes first... and so on and so on. No matter what: anything below 70 Mb/s sequential write speed is slow for any disk not coming from the stone age.Anabolism
Several factors may impact the measure: at least libcs may use different cache sizes and systems may use different caching systems. These usually severely impact I/O performance.Nationwide
@Intinction IMO timing disk access, unless both systems are exactly the same, is so volatile that it isn't worth spending too much time on. I have had programs take vastly different times using the same compiler and optimizations, let alone a different compiler.Giselegisella
I assume "same PC" meant dual boot, not identical machines. Assuming a hard drive (not SSD), there is a very big performance difference depending on the physical position of the partition on the drive. Earlier physical positions are much faster than later physical positions.Carillon
Getting good disk write speeds critically depends on the exact location of the file (outer tracks are best), the fragmentation of the volume (causes disk seeks) and the amount of RAM available to the file system cache. Which is surely not enough on an "old Dell laptop". Just the file location is already enough to explain the speed difference. Defragging the disk is a manual maintenance task on XP.Nepos
No fsync? What is it exactly that you're trying to measure?Kickapoo
@Kickapoo Doesn't closing the file sync it anyway?Intinction
@Intinction No, it does not. #15348931Kickapoo
This is a great example that portability of portable code is not without limits. See also Edouard Alligand's cppcon 2014 talk for more examples of what can go wrong when using the same code on Windows and Linux (he also covers file I/O in the very beginning of the talk).Unquestioning
@Kickapoo To make sure I understand: Are you saying that if I close without synching, the data is gone? Or merely that it may not have been written to disk yet? (I.e., it will be written at some point, baring a system failure.)Intinction
@Intinction it has not necessarily been written to disk - it may have been cached by the OS to be written later, some time after your program terminates. It doesn't just vanish.Kickapoo
@Kickapoo OK, that's what I was hoping. That means my application can be writing the next file while the OS is still saving the previous one. (I.e., it's a useful performance optimisation.)Intinction
This being a simple caching-then-writing difference seems the most plausible explanation in my opinion, especially when writing to a USB drive.Underfoot
@Intinction I can imagine a compiler could make a big difference in IO performance, particularly if you do output to the console (which is synchronized on all platforms, but is done very clunkily on XP in particular). Given that the CRT changes with each compiler on Windows, it may change the results.Ramirez
Just for laughs, try adding SetProcessWorkingSetSize(-1, 50*4096, 1024*1024*1024); as the first thing in main, see if that magically increases performance. I'm rather sure it will. Writing needs to lock memory under Windows, and you can only lock at most your maximum working set size. By default, that's 1.4MB. Which of course sucks ass when you are doing thousands of writes 1MB in size each.Biscay
Related reading, relevant for files containing large swathes of zero: en.wikipedia.org/wiki/Sparse_fileLinchpin

You need to sync the file contents to disk; otherwise you are just measuring the level of caching being performed by the operating system.

Call fsync before you close the file.

If you don't do this, the majority of the execution time is most likely spent waiting for the cache to be flushed so that new data can be stored in it, but a certain portion of the data you write will not have been written to disk by the time you close the file. The difference in execution times, then, is probably due to Linux caching more of the writes before it runs out of available cache space. By contrast, if you call fsync before closing the file, all the written data should be flushed to disk before your time measurement takes place.

I suspect if you add an fsync call, the execution time on the two systems won't differ by so much.
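
A sketch of what that could look like for this program (flush_to_disk is just an illustrative helper name; fsync/fileno are POSIX, and the closest MSVC equivalents are _commit/_fileno from <io.h>):

#include <cstdio>

#ifdef _WIN32
#include <io.h>       // _commit, _fileno
#else
#include <stdio.h>    // fileno (POSIX)
#include <unistd.h>   // fsync
#endif

// Flush the C library's buffer to the OS, then ask the OS to push the cached
// data out to the physical disk, so the timing includes the actual write.
static void flush_to_disk(FILE * file)
{
    std::fflush(file);
#ifdef _WIN32
    _commit(_fileno(file));
#else
    fsync(fileno(file));
#endif
}

Call flush_to_disk(file) just before fclose(file), so that t1 is taken only after the data is actually on the disk.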

Kickapoo answered 21/10, 2015 at 12:3 Comment(0)

Your test is not a very good way to measure performance, as there are places where different optimizations in different OSes and libraries can make a huge difference (the compiler itself doesn't have to make a big difference).

First, consider that fwrite (or anything that operates on a FILE*) is a library layer above the OS layer. There can be different buffering strategies that make a difference. For example, one smart way of implementing fwrite would be to flush the buffers and then send the data block straight to the OS instead of going through the buffer layer. This can result in a huge advantage at the next step.
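
One way to probe how much this library layer matters is to control the stdio buffer explicitly with setvbuf (just an illustrative sketch; the 1 MiB size mentioned below is an arbitrary choice that happens to match the benchmark's block size):

#include <cstdio>

int main()
{
    FILE * file = fopen("BigFile.bin", "wb");
    if (file == NULL) return 1;

    // setvbuf must be called before any other operation on the stream.
    // _IONBF: no stdio buffer at all - every fwrite goes straight to the OS.
    setvbuf(file, NULL, _IONBF, 0);

    // Alternatively, force the same fully buffered 1 MiB stdio buffer on both
    // platforms so the library-level buffering is identical:
    //   static char buffer[1024 * 1024];
    //   setvbuf(file, buffer, _IOFBF, sizeof buffer);

    // ... same write loop as in the question ...

    fclose(file);
    return 0;
}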

Second, we have the OS/kernel, which can handle the write differently. One smart optimization would be to copy pages by just aliasing them, and then use copy-on-write if one of the aliases is changed. Linux already does (almost) this when allocating memory to a process (including the BSS section, where the array is): it just marks each page as being zeros, keeps a single such page for all of them, and creates a new page whenever somebody writes to a zero page. Doing this trick again means that the kernel could simply alias such a page in the disk buffer. This means the kernel would not run low on disk cache when writing such blocks of zeroes, since they would only take up 4KiB of actual memory (except for page tables). This strategy is also possible if there's actual data in the data block.

This means that the writes could complete very quickly without any data actually needing to be transferred to the disk (before fwrite completes), and without the data even having to be copied from one place to another in memory.

So you use different libraries and different OSes, and it's not surprising that they perform the same task in different amounts of time.

Smirch answered 21/10, 2015 at 11:43 Comment(0)

There are special optimizations for pages which are all zeros. You should fill the page with random data before writing it out.
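
For example, something along these lines before the timed loop (a sketch; fill_block is just an illustrative name, and rand() is sufficient here because the goal is only to defeat zero-page shortcuts, not to produce good randomness):

#include <cstdlib>   // std::rand, std::srand
#include <ctime>     // std::time

const int block_size = 1024 * 1024;
char block[block_size];   // no longer const, so it can be filled

// Fill the block with pseudo-random bytes so that neither the C library nor
// the kernel can treat the writes as pages of zeros.
void fill_block()
{
    std::srand((unsigned)std::time(NULL));
    for (int i = 0; i < block_size; i++)
        block[i] = (char)(std::rand() & 0xFF);
}

Call fill_block() once at the start of main, before t0 is taken, so the fill itself isn't included in the timing.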

Carnahan answered 21/10, 2015 at 12:34 Comment(3)
OK, so I tried filling the array with random data. (This step isn't part of the timing.) It still took almost exactly the same time to write the file [on Windows; I didn't check Linux].Intinction
I'm pretty sure that just writing 0s isn't enough to create a sparse area in a file. You need to either use OS-specific calls/ioctls or use ftruncate or seek past the end of the file.Kickapoo
@MathematicalOrchid: the problem is with the Linux measurement, that's the one you need to redo. (Or at least that's the theory.)Overbearing
