How to copy one file to many locations simultaneously

I want to find a way to copy one file to multiple locations simultaneously (with C#).

By that I mean I want the original file to be read only once, and then "pasted" to the other locations (on the local network).

As far as my tests showed me, File.Copy() will always read the source again for every copy.

And as far as I understand, even when going through memory, that piece of memory gets locked.

So basically, I want to mimic "copy-paste" in the form of one "copy" and multiple "pastes", without re-reading from the hard drive each time.

Why? Because eventually I need to copy one folder (more than 1 GB) to many computers, and the bottleneck is the part where I read the source file.

So, is it even possible to achieve this?

Gulp asked 19/6, 2012 at 19:57. Comments (6):
java2s.com/Code/CSharp/File-Stream/… ? – Lambkin
Are you having a problem with that, just being scientific, or just prematurely optimizing your code? – Kemper
@ivowiblo: I'm trying to optimize a process that takes 2 hours (15 minutes for each of 10 computers). I'm sure there's a better way than the "normal" copy. – Gulp
@Holystream: that's the direction, but I still think there must be a better way to do the copy process using threads (or Tasks...). – Gulp
How do you test whether a method reads the source file once or many times? Thanks. – Hydrogeology
@Jimbo I've used ProcessExplorer :-) – Gulp

Rather than using the File.Copy utility method, you could open the source file as a FileStream, open one FileStream per destination file, then read a chunk from the source and write it to each destination stream.
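
For reference, a minimal sketch of that single-read approach (roughly the shape of the original, pre-update answer discussed in the comments below; the method name CopySingleRead is illustrative, and it assumes System.IO and System.Collections.Generic):

public static void CopySingleRead(string sourceFilePath, params string[] destinationPaths)
{
    // Open the source once, and every destination up front.
    using (var source = new FileStream(sourceFilePath, FileMode.Open, FileAccess.Read, FileShare.Read))
    {
        var destinations = new List<FileStream>();
        try
        {
            foreach (var path in destinationPaths)
            {
                destinations.Add(new FileStream(path, FileMode.Create));
            }

            var buffer = new byte[1024];
            int read;

            // Each chunk is read from disk exactly once, then fanned out
            // to every destination before the next chunk is read.
            while ((read = source.Read(buffer, 0, buffer.Length)) > 0)
            {
                foreach (var destination in destinations)
                {
                    destination.Write(buffer, 0, read);
                }
            }
        }
        finally
        {
            foreach (var destination in destinations)
            {
                destination.Dispose();
            }
        }
    }
}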

UPDATE: I changed it to write the files using Parallel.ForEach to improve throughput.

public static class FileUtil
{
    public static void CopyMultiple(string sourceFilePath, params string[] destinationPaths)
    {
        if (string.IsNullOrEmpty(sourceFilePath))
            throw new ArgumentException("A source file must be specified.", "sourceFilePath");

        if (destinationPaths == null || destinationPaths.Length == 0)
            throw new ArgumentException("At least one destination file must be specified.", "destinationPaths");

        // One copy operation runs per destination; Parallel.ForEach decides
        // how many run concurrently (tunable via ParallelOptions).
        Parallel.ForEach(destinationPaths, new ParallelOptions(), destinationPath =>
        {
            // Each destination gets its own read handle on the source;
            // FileShare.Read lets all the handles coexist.
            using (var source = new FileStream(sourceFilePath, FileMode.Open, FileAccess.Read, FileShare.Read))
            using (var destination = new FileStream(destinationPath, FileMode.Create))
            {
                var buffer = new byte[1024];
                int read;

                // Stream the file across in 1 KB chunks to keep memory use low.
                while ((read = source.Read(buffer, 0, buffer.Length)) > 0)
                {
                    destination.Write(buffer, 0, read);
                }
            }
        });
    }
}

Usage:

FileUtil.CopyMultiple(@"C:\sourceFile1.txt", @"C:\destination1\sourcefile1.txt", @"C:\destination2\sourcefile1.txt");
Raspy answered 19/6, 2012 at 20:01. Comments (15):
Brilliant! I'll give it a try, and will probably mark it as the answer. – Gulp
I tried it today, but it is still slow (1,480 files, 10 MB in total, for two computers: it took about 4 minutes). Please notice that your answer DID manage to read the source only once, but it still didn't save time. I will profile it tomorrow and come back with a better reply. Anyway, I'm considering threads to do a parallel copy... – Gulp
Indeed, it reads the file only once, but it writes the stream one file at a time. Perhaps the parallel programming API can help with this. I'll see if I can rework my answer to use that. – Raspy
I've marked it as the answer, but I really want to know if there's a way to make it parallel, or at least to compress the data (to reduce traffic). – Gulp
Thanks. I promise to return with a parallel version; I've not had a chance to use Parallel yet, so I'm interested myself. – Raspy
Hmm... this is turning out to be a challenge. Using Parallel yields slightly slower results than the original method. I suspect I'm doing something wrong... One issue, I think, is that I read a chunk of data from the source file, then write that chunk to each output file, waiting for every file to finish that chunk. I think that if each output file instead had its own stream to the source (rather than sharing one), it would work better. I'll try again later. – Raspy
OK, that was a lot easier than I was making it out to be. See my updated answer. One thing to note: a stream is opened from the source for each destination; however, I read and write in chunks, so the overhead stays minimal (at most 1 KB of data in memory per file). Furthermore, Parallel will only have so many threads (and thus streams) going at once, and you can shape it further by specifying MaxDegreeOfParallelism in the ParallelOptions argument, as sketched below. Pretty cool. Anyway, this change cut the time to write 10,000 files by two thirds. – Raspy
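
For example, a hypothetical tweak to the answer's Parallel.ForEach call that caps the copies at four concurrent destinations:

// Hypothetical: limit Parallel.ForEach to 4 destinations at a time.
var options = new ParallelOptions { MaxDegreeOfParallelism = 4 };
Parallel.ForEach(destinationPaths, options, destinationPath =>
{
    // ... the same per-destination copy as in the answer above ...
});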
As for compressing: if you want to go that route, you will need software on either end, so that the software on the remote end can decompress the file afterward (if you try to decompress remotely from your side, you'll just be streaming the decompressed data over the wire again, so that's pointless). – Raspy
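
A rough sketch of that compress-once-then-distribute idea using GZipStream (illustrative only, not part of the original answer; each remote machine would still need to decompress the .gz on its own end):

// Compress the source once to a temporary file (System.IO.Compression),
// then distribute the smaller payload to every destination.
var tempGz = Path.GetTempFileName();
using (var source = File.OpenRead(sourceFilePath))
using (var target = File.Create(tempGz))
using (var gzip = new GZipStream(target, CompressionMode.Compress))
{
    source.CopyTo(gzip);
}

foreach (var destinationPath in destinationPaths)
{
    File.Copy(tempGz, destinationPath + ".gz", true); // overwrite if present
}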
To make things clearer, I need to explain some more details. I have 20 computers: the first is \\CompDataSource, and the second is \\Deployer, which copies the files to the rest of the computers (named \\Comp01 through \\Comp18). Facts: 1. Every once in a while I need to copy about 25,000 files (2.5 GB) to all of the computers. 2. A simple copy-paste (on \\Comp01) from \\CompDataSource to the local computer takes about 25-30 minutes. 3. A simple copy-paste (on \\Deployer) from \\CompDataSource to \\CompXX takes about 35-45 minutes. – Gulp
These were the results using a buffer of 1024: 1. Copying (on \\Deployer) from \\CompSource to 1 computer takes about 1:02:00 hours. 2. Copying (on \\Deployer) from \\CompSource to 2 computers takes about 1:07:00 hours. 3. Copying (on \\Deployer) from \\CompSource to 4 computers takes about 0:58:00 hours. These were the results using a buffer of source.Length: 1. Copying (on \\Deployer) from \\CompSource to 2 computers takes about 0:44:00 hours. 2. Copying (on \\Deployer) from \\CompSource to 4 computers takes about 0:43:00 hours. – Gulp
Changes I made to your code: 1. Added Close() and Dispose() for both source and destination. 2. Changed the buffer size to sourceFile.Length, since I don't really care how much memory it takes and I wanted to gain speed. To make it short: your solution is wonderful, thank you! – Gulp
Interesting results. I wonder what improvements could be made to FileStream to make it more efficient compared to copying via the shell. BTW, I wrapped the FileStreams in using statements, which guarantees that they are flushed, closed, and disposed :) – Raspy
I don't understand how this code reads the source file only once. Can someone explain it? Thanks. – Hydrogeology
@Jimbo My original answer did, but based on feedback from the OP (read the rest of the comments) I dropped the single source read in exchange for parallelism. If you look at the edit history, you can see the original single-source-read answer. – Raspy
If I were re-answering this question today, I'd probably use Reactive Extensions to subscribe to the stream once for however many copies you need and write asynchronously; see the sketch below. – Raspy
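
A sketch of what that might look like (assuming the System.Reactive / Rx package; writes are kept synchronous per chunk for brevity):

// Read the source once as an observable sequence of chunks; one subscriber
// per destination writes each chunk out. Connect() starts the read only
// after every writer has subscribed, so the file is read exactly once.
var chunks = Observable.Create<byte[]>(observer =>
{
    using (var source = File.OpenRead(sourceFilePath))
    {
        var buffer = new byte[81920];
        int read;
        while ((read = source.Read(buffer, 0, buffer.Length)) > 0)
        {
            var chunk = new byte[read];
            Array.Copy(buffer, chunk, read);
            observer.OnNext(chunk);
        }
    }
    observer.OnCompleted();
    return Disposable.Empty;
}).Publish();

foreach (var destinationPath in destinationPaths)
{
    var destination = File.Create(destinationPath);
    chunks.Subscribe(
        chunk => destination.Write(chunk, 0, chunk.Length),
        error => destination.Dispose(),
        () => destination.Dispose());
}

chunks.Connect(); // triggers the single read; all subscribers see every chunk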
