Computing estimated times of file copies / movements?

Asked 20/7, 2009 at 7:56 Answered 5/4, 2021 at 16:26

Solved language-agnostic file filesystems copy estimation


Inspired by this xkcd cartoon, I wondered: what is the best mechanism for giving the user an estimate of how long a file copy / move will take?

The alt tag on xkcd reads as follows:

They could say "the connection is probably lost," but it's more fun to do naive time-averaging to give you hope that if you wait around for 1,163 hours, it will finally finish.

Ignoring the funny, is that really how it's done in Windows? How about other operating systems? Is there a better way?

Stoichiometry answered 20/7, 2009 at 7:56 Comment(2)

Working out how it's done on Windows and then doing the exact opposite would be a good starting point. 8-) – Spillage 20/7, 2009 at 7:59

Nice comic. I hadn't seen this one. – Skite 20/7, 2009 at 8:25

Have a look at my answer to a similar question (and the other answers there) on how the remaining time is estimated in Windows Explorer.

In my opinion, there is only one way to get good estimates:

1. Calculate the exact number of bytes to be copied before you begin the copy process.
2. Recalculate your estimate regularly (every 1, 5 or 10 seconds, YMMV) based on the current transfer speed.
3. The current transfer speed can fluctuate heavily when you are copying over a network, so use an average, for example based on the number of bytes transferred since your last estimate.

Note that the first point may require quite some work, if you are copying many files. That is probably why the guys from Microsoft decided to go without it. You need to decide yourself if the additional overhead created by that calculation is worth giving your user a better estimate.
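A minimal Python sketch of the three points above, not a definitive implementation. The names total_bytes and IntervalEstimator are made up for illustration, and the rate is simply the bytes transferred since the previous estimate divided by the interval length, as suggested in point 3.

```python
import os


def total_bytes(paths):
    """Point 1: sum the sizes of all files under the given files/directories."""
    total = 0
    for path in paths:
        if os.path.isfile(path):
            total += os.path.getsize(path)
            continue
        for root, _dirs, files in os.walk(path):
            for name in files:
                total += os.path.getsize(os.path.join(root, name))
    return total


class IntervalEstimator:
    """Points 2 and 3: re-estimate at a fixed interval from the recent rate."""

    def __init__(self, total):
        self.total = total
        self.last_copied = 0

    def update(self, copied, interval_seconds):
        """Call once per interval; returns seconds remaining, or None if stalled."""
        rate = (copied - self.last_copied) / interval_seconds  # bytes per second
        self.last_copied = copied
        if rate <= 0:
            return None  # no progress in this interval, so no honest estimate
        return (self.total - copied) / rate
```

Walking the directory tree up front is exactly the overhead mentioned above, so whether it is worth it depends on how many files are involved.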

Lavinalavine answered 20/7, 2009 at 8:38 Comment(0)

I've done something similar to estimate when a queue will be empty, given that items are being dequeued faster than they are being enqueued. I used linear regression over the most recent N readings of (time,queue size).

This gives better results than a naive

bytes_left_to_copy / (bytes_copied_so_far / elapsed_time)
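For illustration, here is a rough Python sketch of the regression idea applied to a copy instead of a queue. It is an assumption about how one might do it, not the code I actually used: RegressionEstimator is a hypothetical name, the readings are (time, bytes remaining), and the fit is ordinary least squares over the last N readings.

```python
from collections import deque


class RegressionEstimator:
    """Fit a straight line through recent (time, bytes_remaining) readings."""

    def __init__(self, window=10):
        self.samples = deque(maxlen=window)  # keeps only the most recent N

    def add(self, t, remaining):
        self.samples.append((t, remaining))

    def seconds_left(self):
        n = len(self.samples)
        if n < 2:
            return None
        mean_t = sum(t for t, _ in self.samples) / n
        mean_r = sum(r for _, r in self.samples) / n
        var_t = sum((t - mean_t) ** 2 for t, _ in self.samples)
        if var_t == 0:
            return None
        cov = sum((t - mean_t) * (r - mean_r) for t, r in self.samples)
        slope = cov / var_t  # bytes per second, negative while progressing
        if slope >= 0:
            return None  # remaining work is not shrinking
        intercept = mean_r - slope * mean_t
        finish_time = -intercept / slope  # where the fitted line hits zero
        return max(0.0, finish_time - self.samples[-1][0])
```

With readings (0, 1000), (1, 900), (2, 800) the fitted line reaches zero at t = 10, so the estimate at t = 2 is 8 seconds.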

Kleeman answered 20/7, 2009 at 8:15 Comment(0)

1. Start a global timer that fires, say, every 1000 milliseconds and updates a total elapsed time counter. Let's call this variable "elapsedTime".
2. While the file is being copied, update some local variable with the amount already copied. Let's call this variable "totalCopied".
3. In the timer event that is periodically raised, divide totalCopied by elapsedTime to give the number of bytes copied per timer interval (in this case, 1000 ms, so per second). Let's call this variable "bytesPerSec".
4. Divide the total file size by bytesPerSec to obtain the total number of seconds theoretically required to copy this file. Let's call this variable "totalTime".
5. Subtract elapsedTime from totalTime and you have a somewhat accurate estimate of the remaining copy time.
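The arithmetic in those steps can be sketched in Python roughly as follows. The function name and the choice to return None before any data has arrived are my own assumptions; the timer itself is left out and the function is simply what the timer event would call.

```python
def estimate_remaining_seconds(total_size, total_copied, elapsed_seconds):
    """Average-rate estimate: total time at the observed rate minus time spent."""
    if total_copied <= 0 or elapsed_seconds <= 0:
        return None  # nothing measured yet, so no rate to divide by
    bytes_per_sec = total_copied / elapsed_seconds
    total_seconds = total_size / bytes_per_sec  # time for the whole file
    return max(0.0, total_seconds - elapsed_seconds)
```

For example, 250 of 1000 bytes copied after 5 seconds gives 50 bytes per second, 20 seconds in total, and so 15 seconds remaining.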

Lundgren answered 20/7, 2009 at 8:27 Comment(0)

I think dialogs should just admit their limitations. It's not annoying because it's failing to give a useful time estimate, it's annoying because it's authoritatively offering an estimate that's obvious nonsense.

So, estimate however you like, based on current rate or average rate so far, rolling averages discarding outliers, or whatever. Depends on the operation and the typical durations of events which delay it, so you might have different algorithms when you know the file copy involves a network drive. But until your estimate has been fairly consistent for a period of time equal to the lesser of 30 seconds or 10% of the estimated time, display "oh dear, there seems to be some kind of holdup" when it's massively slowed, or just ignore it if it's massively sped up.

For example, dialog messages taken at 1-second intervals when a connection briefly stalls:

remaining: 60 seconds               // estimate is 60 seconds
remaining: 59 seconds               // estimate is 59 seconds
remaining: delayed [was 59 seconds] // estimate is 12 hours
remaining: delayed [was 59 seconds] // estimate is infinity
remaining: delayed [was 59 seconds] // got data: estimate is 59 seconds
// six seconds later
remaining: 53 seconds               // estimate is 53 seconds
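One way to get that behaviour, sketched in Python under some assumptions of mine: HonestEta is a hypothetical name, tick() is called once per second with whatever raw estimate you computed, and "fairly consistent" is taken to mean that consecutive raw estimates differ by at most 20% (or 2 seconds, whichever is larger).

```python
import math


class HonestEta:
    """Only show an estimate once it has been stable for long enough."""

    def __init__(self, tolerance=0.2):
        self.tolerance = tolerance
        self.trusted = None      # last estimate we were willing to display
        self.prev_raw = None
        self.stable_for = 0.0    # seconds of consecutive consistent estimates
        self.delayed = False

    def tick(self, raw, dt=1.0):
        prev = self.prev_raw
        consistent = (
            prev is not None
            and math.isfinite(prev)
            and math.isfinite(raw)
            and abs(raw - prev) <= max(self.tolerance * prev, 2.0 * dt)
        )
        self.stable_for = self.stable_for + dt if consistent else 0.0
        self.prev_raw = raw
        # Lesser of 30 seconds or 10% of the estimated time.
        required = min(30.0, 0.1 * raw) if math.isfinite(raw) else math.inf
        if self.stable_for >= required:
            self.trusted, self.delayed = raw, False
            return f"remaining: {round(raw)} seconds"
        if self.trusted is None:
            return "remaining: estimating"
        if raw > self.trusted:
            self.delayed = True  # massively slowed: admit it
        if self.delayed:
            return f"remaining: delayed [was {round(self.trusted)} seconds]"
        return f"remaining: {round(self.trusted)} seconds"  # sped up: ignore
```

Fed a steady countdown that is interrupted by estimates of 12 hours and infinity, this reproduces the sequence of messages shown above, including the six-second wait before it trusts the recovered estimate.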

Indulgence answered 20/7, 2009 at 11:34 Comment(0)

Most of all I would never display seconds (only hours and minutes). I think it's really frustrating when you sit there and wait for a minute while the timer jumps between 10 and 20 seconds. And always display real information like: xxx/yyyy MB copied.

I would also include something like this:

if timeLeft > 5h --> Inform user that this might not work properly
if timeLeft > 10h --> Inform user that there might be better ways to move the file
if timeLeft > 24h --> Abort and check for problems

I would also inform the user if the estimated time varies too much.

And if it's not too complicated, there should be an auto-check function that checks if the process is still alive and working properly every 1-10 minutes (depending on the application).
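As a rough Python sketch of the display rules above (hours and minutes only, real MB figures, and the three thresholds): the function names and the exact wording of the messages are placeholders of mine, and rounding up to whole minutes is an assumption.

```python
import math

MB = 1024 * 1024


def format_progress(copied_bytes, total_bytes, seconds_left):
    """Show real numbers plus a coarse estimate, never seconds."""
    minutes = max(1, math.ceil(seconds_left / 60))  # round up to whole minutes
    hours, minutes = divmod(minutes, 60)
    if hours:
        left = f"about {hours} h {minutes} min left"
    else:
        left = f"about {minutes} min left"
    return f"{copied_bytes // MB}/{total_bytes // MB} MB copied, {left}"


def warning_for(seconds_left):
    """Map the estimate to the escalating warnings; None means no warning."""
    hours = seconds_left / 3600
    if hours > 24:
        return "abort and check for problems"
    if hours > 10:
        return "there might be better ways to move the file"
    if hours > 5:
        return "this might not work properly"
    return None
```

The thresholds are checked from largest to smallest so that only the most severe message is returned.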

Desolate answered 20/7, 2009 at 8:13 Comment(1)

What happens then if you're copying, say, 100 GB of videos? It doesn't happen much, but it can happen... – Penman 20/7, 2009 at 8:28

Speaking of network file copies, the best approach is to take into account the size of the file to be transferred, the network response time, and so on. An approach that I used once was:

Connection speed = ping and measure the round-trip time for 15 KB packets.

Take the file size and work out, theoretically, how long the transfer would take if I broke the file into 15 KB packets at that connection speed.

Recalculate the connection speed after the transfer has started and adjust the estimated time accordingly.
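A small Python sketch of that calculation, under the assumption that each 15 KB packet costs one measured round trip. The ping measurement itself is not shown; round_trip_seconds is simply passed in, and both function names are hypothetical.

```python
import math

PACKET_SIZE = 15 * 1024  # 15 KB, as in the description above


def initial_estimate(file_size, round_trip_seconds, packet_size=PACKET_SIZE):
    """Before the transfer: number of packets times the measured round trip."""
    packets = math.ceil(file_size / packet_size)
    return packets * round_trip_seconds


def adjusted_estimate(file_size, bytes_sent, elapsed_seconds, round_trip_seconds):
    """After the transfer has started: switch to the speed actually observed."""
    if bytes_sent <= 0 or elapsed_seconds <= 0:
        return initial_estimate(file_size, round_trip_seconds)
    measured_speed = bytes_sent / elapsed_seconds  # bytes per second
    return (file_size - bytes_sent) / measured_speed
```

The ping-based figure is only a starting point; once real throughput is available it is usually the better guide.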

Distracted answered 20/7, 2009 at 8:14 Comment(0)

I've been pondering on this one myself. I have a copy routine - via a Windows Explorer style interface - which allows the transfer of selected files from an Android Device, to a PC.

At the start, I know the total size of the file(s) to be copied. As I am using C#.NET, I use a Stopwatch to get the elapsed time, and while the copy is in progress I keep a running total of the bytes copied so far.

I haven't actually tested it yet, but the best way seems to be this -

estimated = elapsed * ((totalSize - copiedSoFar) / copiedSoFar)
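My actual code is C#, but the same idea can be sketched in Python like this. It is only a sketch under assumptions: copy_with_estimate and the report callback are hypothetical names, time.monotonic() stands in for the Stopwatch, and the copy works on any file-like objects.

```python
import io
import time


def copy_with_estimate(src, dst, total_size, report, chunk_size=64 * 1024):
    """Copy src to dst in chunks, calling report(copied, seconds_left) each time."""
    start = time.monotonic()  # plays the role of the Stopwatch
    copied = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(chunk)
        copied += len(chunk)
        elapsed = time.monotonic() - start
        # Same formula as above: time so far, scaled by remaining/copied.
        estimated = elapsed * ((total_size - copied) / copied)
        report(copied, estimated)
    return copied


if __name__ == "__main__":
    data = b"x" * 1_000_000
    copy_with_estimate(
        io.BytesIO(data),
        io.BytesIO(),
        len(data),
        lambda copied, left: print(f"{copied} bytes copied, {left:.3f} s left"),
    )
```

Since copied is only used as a divisor after at least one chunk has been written, the formula never divides by zero here.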

Particularity answered 13/10, 2013 at 1:5 Comment(0)

I never saw it the way you guys are explaining it, by transferred bytes and total bytes.

The "experience" always made a lot more sense (though not good or accurate) if you assume it instead uses the bytes of each file and the file count. That would explain why the estimate swings wildly.

If you are transferring large files first, the estimate goes long, even with a steady connection. It is as if it naively assumes that all files are the average size of those transferred so far, and then makes a guess assuming that this average will hold for the entire transfer.

This, and the other ways, all get worse when the connection 'speed' varies...
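To make the difference concrete, here is a small Python comparison of the two approaches. This is only my guess at what a file-count-based estimate would look like, not anything confirmed about a real implementation, and both function names are made up.

```python
def estimate_by_file_count(elapsed_seconds, files_done, files_total):
    """Assume every remaining file takes as long as the average file so far."""
    if files_done == 0:
        return None
    return (files_total - files_done) * (elapsed_seconds / files_done)


def estimate_by_bytes(elapsed_seconds, bytes_done, bytes_total):
    """Assume the remaining bytes move at the average byte rate so far."""
    if bytes_done == 0:
        return None
    return elapsed_seconds * (bytes_total - bytes_done) / bytes_done
```

Take six files of 1000, 1000, 10, 10, 10 and 10 bytes at a constant 100 bytes per second. After the two large files, 20 seconds have passed: the file-count estimate says 40 seconds remain, while the byte-based estimate says 0.4 seconds, which is the true figure.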

Keynesianism answered 5/4, 2021 at 16:26 Comment(0)
