C# File and Directory iteration, possible to do both at once?

This might be a confusing question, but I have written the directory crawler below. It starts at a root directory, finds all unique directories, then finds all files, counts them, and adds up their sizes. However, the way I have written it requires visiting each directory twice: once to find the subdirectories and a second time to count the files. Is it possible to get all of this information in one pass, and if so, how?

        Stopwatch stopwatch = new Stopwatch();
        stopwatch.Start();
        HashSet<string> DirectoryHolding = new HashSet<string>();
        DirectoryHolding.Add(rootDirectory);


        #region All Directory Region
        int DirectoryCount = 0;
        int DirectoryHop = 0;
        bool FindAllDirectoriesbool = true;
        while (FindAllDirectoriesbool == true)
        {
            string[] DirectoryHolder = Directory.GetDirectories(rootDirectory);
            if (DirectoryHolder.Length == 0)
            {
                if (DirectoryHop >= DirectoryHolding.Count())
                {
                    FindAllDirectoriesbool = false;
                }
                else
                {
                    rootDirectory = DirectoryHolding.ElementAt(DirectoryHop);
                }
                DirectoryHop++;
            }
            else
            {
                foreach (string DH in DirectoryHolder)
                {
                    DirectoryHolding.Add(DH);
                }
                if (DirectoryHop > DirectoryHolding.Count())
                {
                    FindAllDirectoriesbool = false;
                }
                rootDirectory = DirectoryHolding.ElementAt(DirectoryHop);
                DirectoryHop++;

            }

        }
        DirectoryCount = DirectoryHop - 2;
        #endregion



        #region File Count and Size Region
        int FileCount = 0;
        long FileSize = 0;
        for (int i = 0; i < DirectoryHolding.Count ; i++)
        {
            string[] DirectoryInfo = Directory.GetFiles(DirectoryHolding.ElementAt(i));
            for (int fi = 0; fi < DirectoryInfo.Length; fi++)
            {
                try
                {
                    FileInfo fInfo = new FileInfo(DirectoryInfo[fi]);
                    FileCount++;
                    FileSize = FileSize + fInfo.Length;
                }
                catch (Exception ex)
                {
                    Console.WriteLine(ex.Message.ToString());
                }
            }
        }

The stopwatch result for this is 1.38

        int FileCount = 0;
        long FileSize = 0;
        for (int i = 0; i < DirectoryHolding.Count; i++)
        {
            var entries = new DirectoryInfo(DirectoryHolding.ElementAt(i)).EnumerateFileSystemInfos();
            foreach (var entry in entries)
            {
                if ((entry.Attributes & FileAttributes.Directory) == FileAttributes.Directory)
                {
                    DirectoryHolding.Add(entry.FullName);
                }
                else
                {
                    FileCount++;
                    FileSize = FileSize + new FileInfo(entry.FullName).Length;
                }
            }
        }

The stopwatch result for this method is 2.01, which makes no sense to me.

        DirectoryInfo Dinfo = new DirectoryInfo(rootDirectory);
        DirectoryInfo[] directories = Dinfo.GetDirectories("*.*", SearchOption.AllDirectories);
        FileInfo[] finfo = Dinfo.GetFiles("*.*", SearchOption.AllDirectories);
        foreach (FileInfo f in finfo)
        {
            FileSize = FileSize + f.Length;
        }
        FileCount = finfo.Length;
        DirectoryCount = directories.Length;

0.26 seconds. I think this is the winner.

Logographic answered 28/3, 2012 at 18:14 Comment(4)
Tip: you don't need to write while (FindAllDirectoriesbool == true); while (FindAllDirectoriesbool) works just as well. - Stogy
Use Directory.EnumerateFiles or recursion. - Grandiose
@Grandiose How would that speed this up? It's certainly not a foreach loop, but isn't it the same speed? I think network connectivity is the bottleneck here, not so much the code. - Logographic
@Grandiose I still have to call GetDirectories to know which directories come next. - Logographic

You can use Directory.EnumerateFileSystemEntries():

var entries = Directory.EnumerateFileSystemEntries(rootDirectory);
foreach (var entry in entries)
{
    if(File.Exists(entry))
    {
        //file
    }
    else
    {
        //directory
    }
}

Or alternatively DirectoryInfo.EnumerateFileSystemInfos() (this might be more performant since FileSystemInfo already has most of the info you need and you can skip the File.Exists check):

var entries = new DirectoryInfo(rootDirectory).EnumerateFileSystemInfos();
foreach (var entry in entries)
{
    if ((entry.Attributes & FileAttributes.Directory) == FileAttributes.Directory)
    {
        //directory
    }
    else
    {
        //file
    }
}
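
To get everything in a single pass, the loop above can feed the directories it discovers back into a work queue. The following is only a minimal sketch under a few assumptions: the `DirectoryCrawler` class, the `Crawl` method, and the tuple it returns are hypothetical names chosen for illustration, and the code assumes C# 7 or later. Each entry returned by EnumerateFileSystemInfos() is already a FileInfo or a DirectoryInfo, so the size is read from the entry itself rather than from a freshly constructed FileInfo.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

static class DirectoryCrawler
{
    // Hypothetical helper: walks the tree breadth-first and lists each directory exactly once.
    public static (int Directories, int Files, long Bytes) Crawl(string rootDirectory)
    {
        int directoryCount = 0;
        int fileCount = 0;
        long totalBytes = 0;

        var pending = new Queue<DirectoryInfo>();
        pending.Enqueue(new DirectoryInfo(rootDirectory));

        while (pending.Count > 0)
        {
            DirectoryInfo current = pending.Dequeue();
            try
            {
                foreach (FileSystemInfo entry in current.EnumerateFileSystemInfos())
                {
                    if (entry is DirectoryInfo subdirectory)
                    {
                        // Count the directory now and list its contents later.
                        directoryCount++;
                        pending.Enqueue(subdirectory);
                    }
                    else if (entry is FileInfo file)
                    {
                        // Read the size from the entry instead of creating a new FileInfo.
                        fileCount++;
                        totalBytes += file.Length;
                    }
                }
            }
            catch (UnauthorizedAccessException ex)
            {
                // Skip directories that cannot be read and keep going.
                Console.WriteLine(ex.Message);
            }
        }

        return (directoryCount, fileCount, totalBytes);
    }
}
```

It could be called as `var totals = DirectoryCrawler.Crawl(rootDirectory);`. The root itself is not included in the directory count, and the sketch does not guard against junctions or symbolic links that point back up the tree, so a real crawler may want to skip entries with the ReparsePoint attribute.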
Bowshot answered 28/3, 2012 at 18:22 Comment(7)
I can certainly modify the code here, but how does that change the issue of still having to find all the directories and then go back to them? Not trying to be rude, I just don't understand: I have to find all the "entries" before I can foreach over them. - Logographic
Believe it or not, that method is 30% slower than what I have up top. Posted method: stopwatch.Elapsed.TotalMinutes = 1.35; method mentioned above: stopwatch.Elapsed.TotalMinutes = 1.88. - Logographic
How do you get the file size in the else statement? - Logographic
I tested the second method. If I have to use FileInfo it takes about 1.7; if I don't have to use FileInfo it takes 10 seconds. So if I could get the file size without calling FileInfo again, that would be awesome. - Logographic
You'd have to acquire it manually. That gives more accurate information, but is probably slower than going through DirectoryInfo.GetFiles (can't beat cached info): long fileSize = new FileInfo(entry.FullName).Length; - Bowshot
@Mike: Looking at your original code, you manually query for FileInfo right now. Instead, use DirectoryInfo.GetFiles or DirectoryInfo.EnumerateFiles, which use cached information and so should be much faster. - Bowshot
The last comment you made gave me the idea, so you get the answer. Thank you! - Logographic

The usual approach is to write a recursive method. Here it is in pseudocode:

void ProcessDirectory(Dir directory)
{
    foreach (var file in directory.Files)
        ProcessFile(file);

    foreach (var child in directory.Subdirectories)
        ProcessDirectory(child);
}

You can also reverse the order of the foreach loops. For example, to calculate the total size of all files with a recursive method, you could do this:

ulong GetTotalFileSize(Dir directory)
{
    ulong result = 0UL;

    foreach (var child in directory.Subdirectories)
        result += GetTotalFileSize(child);

    foreach (var file in directory.Files)
        result += file.Length;

    return result;
}
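
For reference, here is a rough sketch of how the pseudocode could map onto the real System.IO types. The mapping is an assumption made for illustration: `Dir` becomes DirectoryInfo, `Subdirectories` becomes EnumerateDirectories(), `Files` becomes EnumerateFiles(), and the `DirectorySize` class name is hypothetical. The result is a long rather than a ulong because FileInfo.Length is a long.

```csharp
using System.IO;

static class DirectorySize
{
    // Recursive total: the sizes of all files in this directory and everything below it.
    public static long GetTotalFileSize(DirectoryInfo directory)
    {
        long result = 0;

        // Recurse into each child directory, not into the current one again.
        foreach (DirectoryInfo child in directory.EnumerateDirectories())
            result += GetTotalFileSize(child);

        // Add the sizes of the files that sit directly in this directory.
        foreach (FileInfo file in directory.EnumerateFiles())
            result += file.Length;

        return result;
    }
}
```

A call would look like `long bytes = DirectorySize.GetTotalFileSize(new DirectoryInfo(rootDirectory));`. Error handling for unreadable directories is left out to keep the shape of the recursion visible.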
Melodiemelodion answered 28/3, 2012 at 18:32 Comment(0)
