Show progress when searching all files in a directory

I previously asked the question "Get all files and directories in specific path fast" in order to find files as fast as possible. I am using that solution to find the file names that match a regular expression.

I was hoping to show a progress bar, because on some really large and slow hard drives the search still takes about a minute to execute. The solution I posted at the other link does not tell me how many files remain to be traversed, so I cannot show a progress bar.

One solution I was considering was to obtain the size of the directory I am planning to traverse. For example, when I right-click the folder C:\Users, I can get an estimate of how big that directory is. If I know the size, I can show progress by adding up the size of every file I find. In other words, progress = (current sum of file sizes) / (directory size).
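The ratio described above can be sketched as a small helper. This is only an illustration of the arithmetic; the names ByteProgress and Percent are hypothetical and do not come from any library.

```csharp
using System;

static class ByteProgress
{
    // Hypothetical helper: percent complete (0-100) from the bytes processed
    // so far and the total size of the directory tree.
    // An unknown or zero total yields 0 rather than dividing by zero.
    public static int Percent(long bytesProcessed, long totalBytes)
    {
        if (totalBytes <= 0) return 0;
        double pct = (double)bytesProcessed / totalBytes * 100.0;
        // Clamp, because files may be added or grow while the scan is running.
        return (int)Math.Min(100.0, Math.Max(0.0, pct));
    }
}
```

The clamp matters in practice: the total is measured at one point in time, so the running sum can overshoot it slightly if the directory changes during the scan.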

For some reason I have not been able to efficiently get the size of that directory.

Some of the questions on Stack Overflow use the following approach:

[Screenshot: the attempted code and the exception it throws]

But note that I get an exception and am not able to enumerate the files. I am curious about trying that method on my C: drive.

In that picture I was trying to count the number of files in order to show progress. I will probably not be able to get the number of files efficiently using that approach. I was just trying some of the answers on Stack Overflow to questions about how to get the number of files in a directory and how to get the size of a directory.
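One common way around the exception is to walk the tree manually and skip any directory that cannot be read, instead of relying on a single recursive call. The following is a minimal sketch under that assumption; SafeEnumerator is a hypothetical name, and only standard System.IO calls are used.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

static class SafeEnumerator
{
    // Hypothetical helper: walks the tree under root iteratively, yielding
    // file paths and skipping any directory that cannot be read.
    public static IEnumerable<string> EnumerateFiles(string root)
    {
        var pending = new Stack<string>();
        pending.Push(root);
        while (pending.Count > 0)
        {
            string dir = pending.Pop();
            string[] files;
            string[] subdirs;
            try
            {
                files = Directory.GetFiles(dir);
                subdirs = Directory.GetDirectories(dir);
            }
            catch (UnauthorizedAccessException) { continue; } // access denied: skip
            catch (IOException) { continue; }                 // vanished or unreadable: skip

            // Yield outside the try block; C# does not allow yield inside try/catch.
            foreach (string file in files) yield return file;
            foreach (string sub in subdirs) pending.Push(sub);
        }
    }
}
```

Skipped directories are simply not counted, so totals obtained this way are a lower bound. The sketch also follows directory links, so a tree containing a link cycle would need an extra check.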

Infestation answered 12/9, 2012 at 0:44 Comment(4)
Your program does not have sufficient rights to access that directory. Run the program with admin rights, or require them in the app manifest. Right click->Run As Administrator. - Subduct
Even if I run as an admin, there are other files on the C drive that I will not be able to access. - Infestation
For example, try to access the folder c:\Documents and Settings as an administrator. I think the method Directory.EnumerateFiles is missing a way to deal with this problem. - Infestation
connect.microsoft.com/VisualStudio/feedback/details/512171/… - Retinitis

Solving this is going to leave you with one of a few possibilities...

  1. Not displaying progress
  2. Paying an up-front cost to compute the total (as Windows does)
  3. Performing the operation while computing the cost

If speed is that important and you expect large directory trees, I would lean toward the last of these options. I've added an answer on the linked question "Get all files and directories in specific path fast" that demonstrates a faster means of counting files and sizes than you are currently using. To combine this into a multi-threaded piece of code for option #3, the following can be performed...

using System;
using System.Threading;

static void Main()
{
    const string directory = @"C:\Program Files";
    // Create an enumeration of the files we will want to process that simply accumulates these values...
    long total = 0;
    var fcounter = new CSharpTest.Net.IO.FindFile(directory, "*", true, true, true);
    fcounter.RaiseOnAccessDenied = false;
    fcounter.FileFound +=
        (o, e) =>
            {
                if (!e.IsDirectory)
                {
                    Interlocked.Increment(ref total);
                }
            };

    // Start a high-priority thread to perform the accumulation
    Thread t = new Thread(fcounter.Find)
        {
            IsBackground = true, 
            Priority = ThreadPriority.AboveNormal, 
            Name = "file enum"
        };
    t.Start();

    // Allow the accumulator thread to get a head-start on us
    do { Thread.Sleep(100); }
    while (total < 100 && t.IsAlive);

    // Now we can process the files normally and update a percentage
    long count = 0, percentage = 0;
    var task = new CSharpTest.Net.IO.FindFile(directory, "*", true, true, true);
    task.RaiseOnAccessDenied = false;
    task.FileFound +=
        (o, e) =>
            {
                if (!e.IsDirectory)
                {
                    ProcessFile(e.FullPath);
                    // Update the percentage complete...
                    long progress = ++count * 100 / Interlocked.Read(ref total);
                    if (progress > percentage && progress <= 100)
                    {
                        percentage = progress;
                        Console.WriteLine("{0}% complete.", percentage);
                    }
                }
            };

    task.Find();
}

The FindFile class implementation can be found at FindFile.cs.

Depending on how expensive your file-processing task is (the ProcessFile function above), you should see a smooth progression of the percentage on large volumes of files. If your file processing is extremely fast, you may want to increase the lag between the start of enumeration and the start of processing.

The event argument is of type FindFile.FileFoundEventArgs and is a mutable class, so be sure you don't keep a reference to the event argument, as its values will change.

Ideally you will want to add error handling and probably the ability to abort both enumerations. Aborting the enumeration can be done by setting "CancelEnumeration" on the event argument.
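For readers who cannot use the FindFile class, the same pattern (count on a background thread while processing on the caller's thread, with the ability to abort both) can be sketched with standard library types only. This is not the FindFile API; ConcurrentProgress, Run and the delegate parameters are hypothetical names chosen for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

static class ConcurrentProgress
{
    // Hypothetical sketch: counts items on a background task while the caller
    // processes the same sequence, reporting a percentage against the count
    // seen so far. enumerate is invoked twice: once to count, once to process.
    public static void Run(
        Func<IEnumerable<string>> enumerate,
        Action<string> process,
        Action<int> report,
        CancellationToken token)
    {
        long total = 0;
        Task counter = Task.Run(() =>
        {
            foreach (string unused in enumerate())
            {
                if (token.IsCancellationRequested) return; // abort the counting pass
                Interlocked.Increment(ref total);
            }
        });

        // Give the counter a head start, as in the answer's code.
        while (Interlocked.Read(ref total) < 100 && !counter.IsCompleted)
            Thread.Sleep(10);

        long done = 0;
        int lastReported = -1;
        foreach (string path in enumerate())
        {
            token.ThrowIfCancellationRequested(); // abort the processing pass
            process(path);
            done++;
            long seen = Interlocked.Read(ref total);
            if (seen < done) continue; // counter is behind; no meaningful percentage yet
            int percent = (int)(done * 100 / seen);
            if (percent > lastReported) { lastReported = percent; report(percent); }
        }

        counter.Wait();
        if (lastReported < 100) report(100); // make sure completion is always reported
    }
}
```

A caller would pass something like `() => Directory.EnumerateFiles(path)` as the enumerate delegate and cancel through a CancellationTokenSource. Because the total is still growing while processing runs, the early percentages are estimates against a moving target.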

Tourmaline answered 17/9, 2012 at 18:28 Comment(8)
Wow, your solution is much, much faster. Why does the algorithm at #6062457 (the one I was using) return many more files? In fact it returns 401,000 files and your algorithm returns 207,000 when run against my C:\ drive. - Infestation
I forgot to interlock the counter, and I counted 327180 files. Still much fewer than the other algorithm. Why? - Infestation
Are you sure that the difference is not that the other routine is listing directories and files? If you are counting directories and files in both, then realize that the FindFile class does not report directories named '.' or '..' for obvious reasons. If you are curious, write all the paths to two text files and compare. I ran a few tests and found an exact match on file counts. - Tourmaline
Yeah, I am not counting the directories named .. and . in either. I only get fewer files when I run it against the path `C:`. I wonder why. I will soon compare the two methods and see which files are not being included... - Infestation
Let me know what you find... you can also email me using "roger@" my SO user name, "csharptest.net". - Tourmaline
I was not running as an administrator! That's why I found fewer files. Your algorithm is 10% faster! Thanks! - Infestation
Actually, this algorithm https://mcmap.net/q/138549/-is-there-a-faster-way-to-scan-through-a-directory-recursively-in-net is also as fast as the one you showed on my computer. Your algorithm works great for showing the progress. I still don't get how the progress works... I will have to look closer at the code. - Infestation
The code does nothing for me; it goes right through to the end. I changed to task.RaiseOnAccessDenied = true; and there were no exceptions. - Spiegleman

What you are asking may not be possible because of how the file system stores its data.

It is a file system limitation

There is no way to know the total size of a folder, or the total number of files inside it, without enumerating the files one by one. Neither piece of information is stored in the file system.

This is why Windows shows a message like "Calculating space" before copying folders with a lot of files: it is counting how many files there are inside the folder and summing their sizes, so that it can show a progress bar during the real copy operation. (It also uses this information to check whether the destination has enough space to hold all the data being copied.)

Likewise, when you right-click a folder and open Properties, note that it takes some time to count all the files and sum their sizes. That is caused by the same limitation.

To know how large a folder is, or how many files are there inside a folder, you must enumerate the files one-by-one.
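An up-front pass of this kind can be sketched with DirectoryInfo, which returns each file's size along with its name. The names DirectoryMeasure and Measure are hypothetical, and the sketch assumes that unreadable directories should simply be left out of the totals.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

static class DirectoryMeasure
{
    // Hypothetical helper: counts the files under root and sums their sizes
    // by visiting every directory once. Unreadable directories are skipped.
    public static void Measure(string root, out long fileCount, out long totalBytes)
    {
        fileCount = 0;
        totalBytes = 0;
        var pending = new Stack<DirectoryInfo>();
        pending.Push(new DirectoryInfo(root));
        while (pending.Count > 0)
        {
            DirectoryInfo dir = pending.Pop();
            FileInfo[] files;
            DirectoryInfo[] subdirs;
            try
            {
                files = dir.GetFiles();
                subdirs = dir.GetDirectories();
            }
            catch (UnauthorizedAccessException) { continue; }
            catch (IOException) { continue; }

            foreach (FileInfo file in files)
            {
                fileCount++;
                totalBytes += file.Length; // size in bytes
            }
            foreach (DirectoryInfo sub in subdirs) pending.Push(sub);
        }
    }
}
```

The cost of this pass is the same enumeration the real work has to do, which is exactly the trade-off described above: the totals are only available after the whole tree has been walked once.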

Fast file enumeration

Of course, as you already know, there are many ways of doing the enumeration itself, but none will be instantaneous. You could try using the USN Journal of the file system to do the scan. Take a look at this project on CodePlex: MFT Scanner in VB.NET (the code is actually in C#; I don't know why the author says it is VB.NET). It found all the files on my IDE SATA (not SSD) drive in less than 15 seconds: 311,000 files.

You will have to filter the files by path, so that only the files inside the path you are searching are returned. But that is the easy part of the job!
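The path filter can be as small as a prefix test on normalized paths. The sketch below is not taken from the MFT Scanner project; PathFilter and IsUnder are hypothetical names, and the case-insensitive comparison assumes a file system that ignores case, as NTFS does by default.

```csharp
using System;
using System.IO;

static class PathFilter
{
    // Hypothetical helper: true when candidate lies somewhere below root.
    // The trailing separator prevents "C:\Users2\x" from matching "C:\Users".
    public static bool IsUnder(string root, string candidate)
    {
        string normalizedRoot = Path.GetFullPath(root)
            .TrimEnd(Path.DirectorySeparatorChar, Path.AltDirectorySeparatorChar)
            + Path.DirectorySeparatorChar;
        string normalized = Path.GetFullPath(candidate);
        // Assumption: case-insensitive file system.
        return normalized.StartsWith(normalizedRoot, StringComparison.OrdinalIgnoreCase);
    }
}
```

Applied to the full list returned by a volume-wide scan, this keeps only the entries inside the folder of interest.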

Hope this helps in your project... good luck!

Conrado answered 17/9, 2012 at 16:58 Comment(2)
Also, there may well be permission issues with enumerating files (as mentioned above). - Kymberlykymograph
Yes, there are permission issues; I am the administrator of my computer. To use this method, the user must have administrative privileges, as stated in the documentation of the CreateFile API method about opening a physical disk drive or a volume: msdn.microsoft.com/en-us/library/windows/desktop/… - Conrado
