Get all files and directories in specific path fast
Asked Answered
K

7

67

I am creating a backup application where C# scans a directory. I used to have something like this in order to get all the files and subfiles in a directory:

DirectoryInfo di = new DirectoryInfo("A:\\");
var directories= di.GetFiles("*", SearchOption.AllDirectories);

foreach (FileInfo d in directories)
{
       //Add files to a list so that later they can be compared to see if each file
       // needs to be copied or not
}

The only problem with that is that sometimes a file cannot be accessed and I get several errors, such as the UnauthorizedAccessException shown below.

As a result I created a recursive method that scans all the files in the current directory. If there are directories in that directory, the method is called again with that directory. The nice thing about this approach is that I can place the file access inside a try/catch block, which gives me the option to add those files to a list if there were no errors and to add the directory to another list if there were.

try
{
    files = di.GetFiles(searchPattern, SearchOption.TopDirectoryOnly);               
}
catch
{
     //could not read the files of this folder
     lstFilesErrors.Add(sDir(di));
     return;
}

So this method works great. The only problem is that when I scan a large directory it takes too much time. How could I speed up this process? My actual method is below in case you need it.

private void startScan(DirectoryInfo di)
{
    //lstFilesErrors is a list of MyFile objects
    // I created that class because I wanted to store more specific information
    // about a file such as its comparePath name and other properties that I need 
    // in order to compare it with another list

    // lstFiles is a list of MyFile objects that stores all the files
    // contained in the path that I want to scan

    FileInfo[] files = null;
    DirectoryInfo[] directories = null;
    string searchPattern = "*.*";

    try
    {
        files = di.GetFiles(searchPattern, SearchOption.TopDirectoryOnly);               
    }
    catch
    {
        //could not read the files of this folder
        lstFilesErrors.Add(sDir(di));
        return;
    }

    // if there are files in the directory then add those files to the list
    if (files != null)
    {
        foreach (FileInfo f in files)
        {
            lstFiles.Add(sFile(f));
        }
    }


    try
    {
        directories = di.GetDirectories(searchPattern, SearchOption.TopDirectoryOnly);
    }
    catch
    {
        lstFilesErrors.Add(sDir(di));
        return;
    }

    // if that directory has subdirectories then add them to the list and
    // call this method again on each of them
    if (directories != null)
        foreach (DirectoryInfo d in directories)
        {
            FileInfo[] subFiles = null;
            DirectoryInfo[] subDir = null;

            bool isThereAnError = false;

            try
            {
                subFiles = d.GetFiles();
                subDir = d.GetDirectories();

            }
            catch
            {
                isThereAnError = true;                                                
            }

            if (isThereAnError)
                lstFilesErrors.Add(sDir(d));
            else
            {
                lstFiles.Add(sDir(d));
                startScan(d);
            }


        }

}

And the problem when I try to handle the exception with something like this:

DirectoryInfo di = new DirectoryInfo("A:\\");
FileInfo[] directories = null;
try
{
    directories = di.GetFiles("*", SearchOption.AllDirectories);
}
catch (UnauthorizedAccessException e)
{
    Console.WriteLine("There was an error with UnauthorizedAccessException");
}
catch
{
    Console.WriteLine("There was another error");
}

is that if an exception occurs then I get no files at all.

Kielce answered 19/5, 2011 at 16:41 Comment(3)
Rather than catching everything, you should just catch the specific exceptions (e.g. UnauthorizedAccessException); otherwise programming errors (e.g. NullReferenceException) and system errors (e.g. OutOfMemoryException) will get masked as application errors.Icarian
The time this takes is going to depend on the number of files in the hierarchy. If there are many files, it's going to take a long time. That's just the way it is.Henrieta
By the way, I showed a much simpler recursive directory listing method here: informit.com/guides/content.aspx?g=dotnet&seqNum=159. You can modify that code to handle the exceptions and to store things in your lists.Henrieta
K
47

This method is much faster. You can only tell the difference when a directory holds a lot of files. My A:\ external hard drive contains almost 1 terabyte, so it makes a big difference when dealing with a lot of files.

static void Main(string[] args)
{
    DirectoryInfo di = new DirectoryInfo("A:\\");
    FullDirList(di, "*");
    Console.WriteLine("Done");
    Console.Read();
}

static List<FileInfo> files = new List<FileInfo>();  // List that will hold the files and subfiles in path
static List<DirectoryInfo> folders = new List<DirectoryInfo>(); // List that holds the directories found in path
static void FullDirList(DirectoryInfo dir, string searchPattern)
{
    // Console.WriteLine("Directory {0}", dir.FullName);
    // list the files
    try
    {
        foreach (FileInfo f in dir.GetFiles(searchPattern))
        {
            //Console.WriteLine("File {0}", f.FullName);
            files.Add(f);                    
        }
    }
    catch
    {
        Console.WriteLine("Directory {0}  \n could not be accessed!!!!", dir.FullName);                
        return;  // We already got an error trying to access dir, so don't try to access it again
    }

    // process each directory
    // If I have been able to see the files in the directory I should also be able
    // to look at its directories, so I don't think I should place this in a try catch block
    foreach (DirectoryInfo d in dir.GetDirectories())
    {
        folders.Add(d);
        FullDirList(d, searchPattern);                    
    }

}

By the way, I got this thanks to your comment, Jim Mischel.

Kielce answered 19/5, 2011 at 17:54 Comment(2)
Thanks. This method is 100x faster than Directory.GetFileSystemEntriesMintz
This code takes over a minute to get all files on my C: drive. In comparison, this code, new DirectoryInfo(searchPath).GetFiles(searchPattern, searchOptions), only takes 20 seconds; searchOptions adds RecurseSubdirectories and hidden files.Backflow
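
For reference, the searchOptions overload mentioned in the comment above appears to be the EnumerationOptions overload that DirectoryInfo.GetFiles gained in .NET Core 2.1+/.NET 5+. A minimal sketch, assuming hypothetical searchPath and searchPattern variables:

// Requires .NET Core 2.1 or later; searchPath and searchPattern are placeholders.
var searchOptions = new EnumerationOptions
{
    RecurseSubdirectories = true,           // walk the whole tree
    IgnoreInaccessible = true,              // silently skip access-denied files/directories
    AttributesToSkip = FileAttributes.None  // include hidden and system files as well
};

FileInfo[] allFiles = new DirectoryInfo(searchPath).GetFiles(searchPattern, searchOptions);

With IgnoreInaccessible set, the framework itself skips entries that would otherwise throw UnauthorizedAccessException, so no per-directory try/catch is needed.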
J
19

In .NET 4.0 there's the Directory.EnumerateFiles method, which returns an IEnumerable<string> and does not load all the files into memory. It's only once you start iterating over the returned collection that files are returned and exceptions can be handled.
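
One caveat, also noted in the comment below: if the enumerator itself throws (for example an UnauthorizedAccessException on a protected directory), the whole query stops. A common workaround is to enumerate one directory level at a time and recurse manually, catching the exception per directory. A rough sketch, not part of the original answer, which trades the one-shot AllDirectories call for per-directory calls so the exception can be caught where it occurs (EnumerateFilesSafe is a made-up name):

// Requires: using System; using System.Collections.Generic; using System.IO;
static IEnumerable<string> EnumerateFilesSafe(string root)
{
    string[] files = null;
    string[] subDirs = null;

    try
    {
        // These calls throw here, per directory, where the exception can be caught
        files = Directory.GetFiles(root);
        subDirs = Directory.GetDirectories(root);
    }
    catch (UnauthorizedAccessException)
    {
        // Skip this directory instead of aborting the whole enumeration
    }

    if (files != null)
        foreach (string f in files)
            yield return f;

    if (subDirs != null)
        foreach (string d in subDirs)
            foreach (string f in EnumerateFilesSafe(d))
                yield return f;
}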

Joon answered 19/5, 2011 at 17:6 Comment(1)
That is great, but if I get an exception the query stops, unless I am doing something wrongKielce
C
14

There is a long history of the .NET file enumeration methods being slow. The issue is that there is no instantaneous way of enumerating large directory structures. Even the accepted answer here has its issues with GC allocations.

The best I've been able to do is wrapped up in my library and exposed as the FindFile (source) class in the CSharpTest.Net.IO namespace. This class can enumerate files and folders without unneeded GC allocations and string marshalling.

The usage is simple enough, and the RaiseOnAccessDenied property will skip the directories and files the user does not have access to:

    private static long SizeOf(string directory)
    {
        var fcounter = new CSharpTest.Net.IO.FindFile(directory, "*", true, true, true);
        fcounter.RaiseOnAccessDenied = false;

        long size = 0, total = 0;
        fcounter.FileFound +=
            (o, e) =>
            {
                if (!e.IsDirectory)
                {
                    Interlocked.Increment(ref total);
                    size += e.Length;
                }
            };

        Stopwatch sw = Stopwatch.StartNew();
        fcounter.Find();
        Console.WriteLine("Enumerated {0:n0} files totaling {1:n0} bytes in {2:n3} seconds.",
                          total, size, sw.Elapsed.TotalSeconds);
        return size;
    }

For my local C:\ drive this outputs the following:

Enumerated 810,046 files totaling 307,707,792,662 bytes in 232.876 seconds.

Your mileage may vary by drive speed, but this is the fastest method I've found of enumerating files in managed code. The event parameter is a mutating class of type FindFile.FileFoundEventArgs, so be sure you do not keep a reference to it, as its values will change for each event raised.
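
To illustrate that caveat: copy the values you need out of the event args inside the handler rather than storing the args object itself. A small sketch using only the members shown above (fileSizes is a made-up name; requires using System.Collections.Generic):

var fileSizes = new List<long>();
fcounter.FileFound +=
    (o, e) =>
    {
        if (!e.IsDirectory)
            fileSizes.Add(e.Length);  // copy the value out now; do not keep 'e' around,
                                      // because the same args instance is mutated on each event
    };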

Comte answered 17/9, 2012 at 17:57 Comment(7)
+1 because it is impressive how much faster it is compared to the other techniques. The only problem is that it finds fewer files than the other algorithms when using it against the C driveKielce
You are also missing a call to the Find() method. I placed fcounter.Find() after the lambda and it worked great.Kielce
Oh, blah... LOL yes the sample is bugged ;) thanks for pointing it outComte
I need to find specific files with multiple search patterns. How should I add them in place of "*" @csharptest.net... I tried "*.txt;*.exe" and "*.txt|*.exe".Triggerfish
@gotoVoid just enumerate * and filter the extensions yourself. It's actually faster that way.Comte
This is incredibly fast for listing large directories!Henley
If anyone's interested I wrapped this up in a nice little api with mime type filtering and a few unit tests: bitbucket.org/snippets/qes/ob9zHenley
E
3

(copied this piece from my other answer in your other question)

Show progress when searching all files in a directory

Fast files enumeration

Of course, as you already know, there are a lot of ways of doing the enumeration itself... but none will be instantaneous. You could try using the USN Journal of the file system to do the scan. Take a look at this project in CodePlex: MFT Scanner in VB.NET... it found all the files in my IDE SATA (not SSD) drive in less than 15 seconds, and found 311000 files.

You will have to filter the files by path, so that only the files inside the path you are looking in are returned. But that is the easy part of the job!

Epithalamium answered 17/9, 2012 at 17:10 Comment(2)
This seems to require administrator elevation, else passing new DriveInfo("c") causes an ACCESS_DENIED exception. It is also restricted to NTFS partitions that have the Journal enabled. Good solution otherwise, as it definitely is A LOT faster than using the normal APIs. Not sure why the core framework doesn't make use of this or why the file system doesn't offer a read-only version accessible by any access level.Fulminous
@SamuelJackson When using the Journal to enumerate MFT entries, all changes will be listed, even those made by other users, admins or the system itself. Everything! That is why the access level needed is that of a Backup Operator... which allows reading anything from the file system, but not executing, nor writing files that are not his/her own.Epithalamium
C
3

I know this is old, but... Another option may be to use the FileSystemWatcher like so:

// path, listLock and pathsToUpload are assumed to be fields of the containing class
void SomeMethod()
{
    System.IO.FileSystemWatcher m_Watcher = new System.IO.FileSystemWatcher();
    m_Watcher.Path = path;
    m_Watcher.Filter = "*.*";
    m_Watcher.NotifyFilter = NotifyFilters.LastAccess | NotifyFilters.LastWrite | NotifyFilters.FileName | NotifyFilters.DirectoryName;
    m_Watcher.Created += new FileSystemEventHandler(OnChanged);
    m_Watcher.EnableRaisingEvents = true;
}

private void OnChanged(object sender, FileSystemEventArgs e)
{
    string path = e.FullPath;

    lock (listLock)
    {
        pathsToUpload.Add(path);
    }
}

This would allow you to watch the directories for file changes with an extremely lightweight process, which you could then use to store the names of the files that changed so that you can back them up at the appropriate time.
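
If the backup also needs to pick up changes in nested folders, the watcher can be told to recurse; IncludeSubdirectories is a standard FileSystemWatcher property, so this is a one-line addition to the snippet above:

m_Watcher.IncludeSubdirectories = true;  // also raise events for files in subdirectories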

Centerboard answered 20/10, 2012 at 0:40 Comment(0)
M
3

Maybe it will be helpful for you. You could use the DirectoryInfo.EnumerateFiles method and handle UnauthorizedAccessException as you need.

using System;
using System.IO;

class Program
{
    static void Main(string[] args)
    {
        DirectoryInfo diTop = new DirectoryInfo(@"d:\");
        try
        {
            foreach (var fi in diTop.EnumerateFiles())
            {
                try
                {
                    // Display each file over 10 MB; 
                    if (fi.Length > 10000000)
                    {
                        Console.WriteLine("{0}\t\t{1}", fi.FullName, fi.Length.ToString("N0"));
                    }
                }
                catch (UnauthorizedAccessException UnAuthTop)
                {
                    Console.WriteLine("{0}", UnAuthTop.Message);
                }
            }

            foreach (var di in diTop.EnumerateDirectories("*"))
            {
                try
                {
                    foreach (var fi in di.EnumerateFiles("*", SearchOption.AllDirectories))
                    {
                        try
                        {
                            // Display each file over 10 MB; 
                            if (fi.Length > 10000000)
                            {
                                Console.WriteLine("{0}\t\t{1}",  fi.FullName, fi.Length.ToString("N0"));
                            }
                        }
                        catch (UnauthorizedAccessException UnAuthFile)
                        {
                            Console.WriteLine("UnAuthFile: {0}", UnAuthFile.Message);
                        }
                    }
                }
                catch (UnauthorizedAccessException UnAuthSubDir)
                {
                    Console.WriteLine("UnAuthSubDir: {0}", UnAuthSubDir.Message);
                }
            }
        }
        catch (DirectoryNotFoundException DirNotFound)
        {
            Console.WriteLine("{0}", DirNotFound.Message);
        }
        catch (UnauthorizedAccessException UnAuthDir)
        {
            Console.WriteLine("UnAuthDir: {0}", UnAuthDir.Message);
        }
        catch (PathTooLongException LongPath)
        {
            Console.WriteLine("{0}", LongPath.Message);
        }
    }
}
Mousy answered 3/11, 2014 at 13:18 Comment(0)
S
2

You can use this to get all directories and sub-directories. Then simply loop through them to process the files.

string[] folders = System.IO.Directory.GetDirectories(@"C:\My Sample Path\", "*", System.IO.SearchOption.AllDirectories);

foreach(string f in folders)
{
   //call some function to get all files in folder
}
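
A minimal sketch of what that per-folder call might look like, catching UnauthorizedAccessException so a single inaccessible folder does not abort the whole scan (allFiles is an assumed name; note that GetDirectories does not return the root path itself, so add it separately if you need its files too):

// Requires: using System; using System.Collections.Generic;
List<string> allFiles = new List<string>();

foreach (string f in folders)
{
    try
    {
        allFiles.AddRange(System.IO.Directory.GetFiles(f, "*", System.IO.SearchOption.TopDirectoryOnly));
    }
    catch (UnauthorizedAccessException)
    {
        // skip folders the current user cannot read
    }
}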
Smorgasbord answered 14/5, 2012 at 13:29 Comment(1)
It seems you did not understand the questionOctans
