Why is a .NET async/await file copy a lot more CPU-consuming than a synchronous File.Copy() call?

Why does the code below result in this:

[Screenshot: .NET async file copy]

using System;
using System.IO;
using System.Threading;

public static class Program
{
    public static void Main(params string[] args)
    {
        var sourceFileName = @"C:\Users\ehoua\Desktop\Stuff\800MFile.exe";
        var destinationFileName = sourceFileName + ".bak";

        FileCopyAsync(sourceFileName, destinationFileName);

        // The line below is actually faster and a lot less CPU-consuming
        // File.Copy(sourceFileName, destinationFileName, true);

        Console.ReadKey();
    }

    public static async void FileCopyAsync(string sourceFileName, string destinationFileName, int bufferSize = 0x1000, CancellationToken cancellationToken = default(CancellationToken))
    {
        using (var sourceFile = File.OpenRead(sourceFileName))
        {
            using (var destinationFile = File.OpenWrite(destinationFileName))
            {
                Console.WriteLine($"Copying {sourceFileName} to {destinationFileName}...");
                await sourceFile.CopyToAsync(destinationFile, bufferSize, cancellationToken);
                Console.WriteLine("Done");
            }
        }
    }
}

While File.Copy() (https://msdn.microsoft.com/en-us/library/system.io.file.copy(v=vs.110).aspx) is a lot less CPU-consuming:

[Screenshot: Sync File Copy]

So is there still any real benefit in using async/await for file copy purposes?

I thought saving a thread during the copy might be worth it, but File.Copy seems to win the battle hands down in terms of CPU %. Some would argue it's because of real DMA support, but still, am I doing anything that ruins the performance? Or is there anything that can be done to reduce the CPU usage of my async method?

Bibliotaph asked 22/2, 2017 at 22:51 Comment(16)
Is it faster though? - Archaic
@Archaic not a lot faster, just a bit. I'm still far more concerned by the CPU %, which looks scary. - Bibliotaph
It probably has to do with the difference between some under-the-hood way that File.Copy is implemented vs. your direct stream-based approach of using File.OpenRead and File.OpenWrite. I doubt it has anything to do with sync vs. async. - Lucania
Well, I suspect so, but it's not really clear why an async operation would use that much CPU just for copying a file. I managed to reduce it a bit by using a much bigger buffer, but that just delays when the CPU peaks show up. - Bibliotaph
Keep in mind, too, that the way that async/await works in a console application is radically different from how it works in a UI application due to the lack of a synchronization context. - Toast
Like I said, I doubt that it has anything to do with it being async, but instead has to do with how it is performing the copy operation. Your async method copies from one stream to another, while your sync method uses File.Copy, which wraps native Win32 operations for a more low-level approach. See #1247399 - Lucania
@Lucania I'm gonna check these out. - Bibliotaph
Your question is: I asked the compiler to generate code that could utilize the CPU while I was waiting for an IO operation; why did I get higher CPU utilization when I did that? The question answers itself when you phrase it that way, no? What do you think async is for? It's to increase the amount of CPU you use while you're waiting for IO. Remember, high CPU utilization is good. People talk like it is bad, but high CPU is awesome. The machine owner paid for that CPU; every millisecond it idles is a waste of a resource. - Exculpate
@EricLippert put this way, yes; I guess I was not expecting async to use that much CPU just for waiting for IO to complete. I would like to mark your comment as an answer. It makes a lot more sense now. Thanks :) - Bibliotaph
Well, maybe something else is going on here; as others have said, probably you're getting some transient effect from a virus checker or something. Particularly since your program doesn't seem to be doing anything while it's waiting. But in general you should expect to see better -- higher -- CPU utilization when using async. Don't stall the CPU; keep it working to solve CPU-bound problems. - Exculpate
The TAP pattern does not really give you any advantage here, as you don't have any other code to execute while you are waiting for your file to copy. You would be better off just assigning your file copy job to another thread without await, I think? - Silence
@EricLippert I can't agree more. I was interested in copy performance, so I wanted to have a first look at different strategies (async/await + streams, a P/Invoke file copy, and File.Copy()) without the whole system running around it (which turns out to use some CPU). Running a dummy copy in a lonely console app surely biased my analysis a bit. - Bibliotaph
@Silence well, you're right if I stick to the basic scenario, which was oversimplified and focused on the CPU %. Quite frankly, I'm starting to think about that too, i.e. just running File.Copy or a CopyFileEx P/Invoke (for progress reporting) in a Task and being done with it; it seems faster (after a few benchmarks) and simple, so legit to me. - Bibliotaph
Somewhat related: Why File.ReadAllLinesAsync() blocks the UI thread? Most built-in asynchronous filesystem APIs are currently broken. Not only are they much slower, but they are also not really asynchronous. My suggestion is: don't use them. - Vichyssoise
@EricLippert high CPU usage is only awesome when the CPU is doing something useful. Otherwise we would do string concatenation instead of using StringBuilders. In this particular case the only (hypothetical) benefit of using the Async version and paying the performance cost is that it prevents a thread from being blocked. A thread costs 1 MB RAM, so obviously sacrificing 50% of the user's CPU to save 0.01% of their RAM is a bad tradeoff. To add insult to injury, this benefit is not even real: the "async" version still blocks a thread. - Vichyssoise
@TheodorZoulias: Yes, as I mentioned in my comment, what we want is the CPU working efficiently to do CPU-bound work. While we're being picky, I note that a .NET thread on Windows costs 1 MB of committed virtual address space; the OS is smart enough to not map that to RAM until necessary. Better to think of it as 1 MB of swap file, not RAM, most of the time. - Exculpate

File.OpenRead(sourceFileName) is equivalent to new FileStream(sourceFileName, FileMode.Open, FileAccess.Read, FileShare.Read), which is in turn equivalent to new FileStream(sourceFileName, FileMode.Open, FileAccess.Read, FileShare.Read, 4096, false), that is, with false for async I/O. The same is true of File.OpenWrite.

As such any XXXAsync operations won't use async I/O but will fake it using thread-pool threads.

So it gets none of the benefit of async I/O and wastes at least one thread. You've got an extra thread blocking on I/O which was what you wanted to avoid. I'd generally expect async on its own to perform slightly slower than sync (async generally sacrifices one-off speed for better scalability) but I'd definitely expect this to do little better, if at all, than wrapping the whole thing in Task.Run().

I'd still not expect it to be quite as bad, but maybe anti-malware is being worried by writing to an .exe.

You would hopefully fare better copying a non-exe and with asynchronous streams.
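
A minimal sketch of what "asynchronous streams" could look like here, assuming the FileStream constructor overload that takes a useAsync flag. The class name AsyncCopy, the method name CopyFileAsync and the 81920-byte default buffer are illustrative choices, not something from the question. Note that it opens the destination with FileMode.Create (truncating an existing file), whereas File.OpenWrite uses FileMode.OpenOrCreate.

```csharp
using System;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

public static class AsyncCopy
{
    // Hypothetical helper: copies a file through two streams opened for asynchronous I/O.
    public static async Task CopyFileAsync(
        string sourceFileName,
        string destinationFileName,
        int bufferSize = 81920, // assumed default; tune it for your hardware
        CancellationToken cancellationToken = default(CancellationToken))
    {
        // useAsync: true asks the OS for asynchronous I/O instead of
        // blocking reads and writes dispatched to thread-pool threads.
        using (var source = new FileStream(
            sourceFileName, FileMode.Open, FileAccess.Read, FileShare.Read,
            bufferSize, useAsync: true))
        using (var destination = new FileStream(
            destinationFileName, FileMode.Create, FileAccess.Write, FileShare.None,
            bufferSize, useAsync: true))
        {
            await source.CopyToAsync(destination, bufferSize, cancellationToken)
                .ConfigureAwait(false);
        }
    }
}
```

Returning Task rather than void lets the caller await the copy and observe exceptions, for example with AsyncCopy.CopyFileAsync(src, dst).GetAwaiter().GetResult() from a console Main.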

Surgery answered 23/2, 2017 at 0:2 Comment(2)
Actually, I mentioned it when commenting on Hans's answer: setting the isAsync parameter to true dramatically reduced the CPU %, but the copy seems a lot slower (it took me a while because I was not benchmarking the copy speed at first, but File.Copy is apparently 2-3 times faster than async/await; again, take it with a pinch of salt). I will update the post tomorrow with a proper benchmark. I accept your answer since it makes a lot of sense in addition to Eric Lippert's comments. - Bibliotaph
I wouldn't be horrified by async copying being a couple of times slower than sync. I wouldn't use it when I wanted higher single-operation throughput; I'd use it either when I wanted to do something else in the meantime, or on a web application when I don't care as much about how fast I'm serving a single request as I do about how fast I'm serving many and how many I can serve at the same time. - Surgery

These are pretty absurd perf numbers. You are simply not measuring what you think you are. This should not take more than a minor blip, a simple memory-to-memory copy for cached file data. Like File.Copy() did. Operates at ~35 gigabytes/second on a machine with decent DDR3 RAM so can't take more than a few dozen milliseconds. Even if the file is not cached or the machine doesn't have enough RAM then you still can't get this kind of CPU load, your code would be blocked waiting for the disk.

What you are actually seeing is the perf of your installed anti-malware product. It always gets its underwear in a bundle when it sees programs manipulating executable files.

Simple to verify, disable it or make an exclusion and try again.

Overskirt answered 22/2, 2017 at 23:42 Comment(3)
Agreed about the VS "malware"; however, I'd like to point out that the Task Manager still indicates a certain increase in CPU % (plus the cheap fan of my cheap laptop suffers a bit more when using async/await). Again, yes, the implementation of File.Copy is certainly better; still, I managed to lower the async implementation's CPU % by about 75% by using the isAsync param. I'm gonna check out the corefx code on GitHub, because it's still a bit too blurry what's going on behind the scenes. Anyway, thanks for shedding light on the testing environment. - Bibliotaph
"What you are actually seeing is the perf of your installed anti-malware product." -- Can you clarify what you mean here? Are you saying the anti-malware product is using a bunch of CPU which is shown in Visual Studio? Or are you saying the other program is causing @EhouarnPerret's program to use more CPU itself? I just opened Prime95 which pegs all 8 of my cores at 100% in Task Manager but Visual Studio still shows ~13% cpu usage, the same number Task Manager shows for my 1 "ConsoleApplication" process. - Berberidaceous
You just don't normally see the overhead induced by anti-malware because you don't have the luxury of a profiler observing what a program is doing. Don't shoot the messenger :) Click the Ask Question button if you have another case where you don't trust the profiler. - Overskirt

File.Copy appears to copy the entire file in one go. With FileStreams, the default buffer size is 4096 bytes, so it will copy 4 KB at a time.

I wrote my own async function which does more than just copy the file (it matches file sizes and does cleanup), but here are the results from benchmarking file copies over a VPN across a 50 Mbps broadband link.

When using the default 4096 bytes, my async file copy gives:

Copy of 52 files via CopyFileAsync() took 5.6 minutes

vs.

File.Copy, which takes:

Copy of 52 files via File.Copy() took 24 secs, 367 ms

When I increase the buffer size to 64 KB, I get the following:

Copy of 52 files via CopyFileAsync() took 39 secs, 407 ms

Bottom line: the default buffer size of 4096 bytes is way too small for modern hardware, and that is why copying via streams is so slow. You need to benchmark against the hardware you will be using to determine the optimum buffer size, but generally speaking 64 KB is a fairly good choice for network traffic across the internet.
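
A rough sketch of how such a buffer-size comparison could be timed, assuming a synchronous stream copy measured with Stopwatch. The class BufferSizeBenchmark and its methods are hypothetical names for illustration, not the CopyFileAsync() function measured above, and the candidate sizes are arbitrary. Repeated runs over the same file are skewed by the OS file cache, so treat the numbers as indicative only.

```csharp
using System;
using System.Diagnostics;
using System.IO;

public static class BufferSizeBenchmark
{
    // Copies source to destination through streams using the given buffer size
    // and returns how long the copy took.
    public static TimeSpan TimeCopy(string source, string destination, int bufferSize)
    {
        var stopwatch = Stopwatch.StartNew();
        using (var input = new FileStream(
            source, FileMode.Open, FileAccess.Read, FileShare.Read, bufferSize))
        using (var output = new FileStream(
            destination, FileMode.Create, FileAccess.Write, FileShare.None, bufferSize))
        {
            input.CopyTo(output, bufferSize);
        }
        stopwatch.Stop();
        return stopwatch.Elapsed;
    }

    // Times the same copy with a few candidate buffer sizes (arbitrary choices).
    public static void Compare(string source, string destination)
    {
        foreach (var bufferSize in new[] { 4 * 1024, 64 * 1024, 1024 * 1024 })
        {
            var elapsed = TimeCopy(source, destination, bufferSize);
            Console.WriteLine($"{bufferSize / 1024} KB buffer: {elapsed.TotalMilliseconds:F0} ms");
        }
    }
}
```

Calling BufferSizeBenchmark.Compare(sourcePath, destinationPath) on the target machine, with the real source and destination locations, gives a first idea of where larger buffers stop paying off.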

Brindled answered 25/1, 2021 at 0:57 Comment(1)
Note that the optimal buffer size on your machine might not be optimal on someone else's machine. See also How do you determine the ideal buffer size when using FileInputStream? - Elianaelianora
