I recently (2020) discovered this post because of a need to count files and directories across slow connections, and this was the fastest implementation I could come up with. The .NET enumeration methods (GetFiles(), GetDirectories()) perform a lot of under-the-hood work that slows them down tremendously by comparison.
This solution utilizes the Win32 API and .NET's Parallel.ForEach() to leverage the threadpool to maximize performance.
P/Invoke:
/// <summary>
/// https://learn.microsoft.com/en-us/windows/win32/api/fileapi/nf-fileapi-findfirstfilew
/// </summary>
[DllImport("kernel32.dll", SetLastError = true)]
public static extern IntPtr FindFirstFile(
string lpFileName,
ref WIN32_FIND_DATA lpFindFileData
);
/// <summary>
/// https://learn.microsoft.com/en-us/windows/win32/api/fileapi/nf-fileapi-findnextfilew
/// </summary>
[DllImport("kernel32.dll", SetLastError = true)]
public static extern bool FindNextFile(
IntPtr hFindFile,
ref WIN32_FIND_DATA lpFindFileData
);
/// <summary>
/// https://learn.microsoft.com/en-us/windows/win32/api/fileapi/nf-fileapi-findclose
/// </summary>
[DllImport("kernel32.dll", SetLastError = true)]
public static extern bool FindClose(
IntPtr hFindFile
);
Method:
public static Tuple<long, long> CountFilesDirectories(
string path,
CancellationToken token
)
{
if (String.IsNullOrWhiteSpace(path))
throw new ArgumentNullException("path", "The provided path is NULL or empty.");
// If the provided path doesn't end in a backslash, append one.
if (path.Last() != '\\')
path += '\\';
IntPtr hFile = IntPtr.Zero;
Win32.Kernel32.WIN32_FIND_DATA fd = new Win32.Kernel32.WIN32_FIND_DATA();
long files = 0;
long dirs = 0;
try
{
hFile = Win32.Kernel32.FindFirstFile(
path + "*", // Discover all files/folders by ending a directory with "*", e.g. "X:\*".
ref fd
);
// If we encounter an error, or there are no files/directories, we return no entries.
if (hFile.ToInt64() == -1)
return Tuple.Create<long, long>(0, 0);
//
// Find (and count) each file/directory, then iterate through each directory in parallel to maximize performance.
//
List<string> directories = new List<string>();
do
{
// If a directory (and not a Reparse Point), and the name is not "." or ".." which exist as concepts in the file system,
// count the directory and add it to a list so we can iterate over it in parallel later on to maximize performance.
if ((fd.dwFileAttributes & FileAttributes.Directory) != 0 &&
(fd.dwFileAttributes & FileAttributes.ReparsePoint) == 0 &&
fd.cFileName != "." && fd.cFileName != "..")
{
directories.Add(System.IO.Path.Combine(path, fd.cFileName));
dirs++;
}
// Otherwise, if this is a file ("archive"), increment the file count.
else if ((fd.dwFileAttributes & FileAttributes.Archive) != 0)
{
files++;
}
}
while (Win32.Kernel32.FindNextFile(hFile, ref fd));
// Iterate over each discovered directory in parallel to maximize file/directory counting performance,
// calling itself recursively to traverse each directory completely.
Parallel.ForEach(
directories,
new ParallelOptions()
{
CancellationToken = token
},
directory =>
{
var count = CountFilesDirectories(
directory,
token
);
lock (directories)
{
files += count.Item1;
dirs += count.Item2;
}
});
}
catch (Exception)
{
// Handle as desired.
}
finally
{
if (hFile.ToInt64() != 0)
Win32.Kernel32.FindClose(hFile);
}
return Tuple.Create<long, long>(files, dirs);
}
On my local system, the performance of GetFiles()/GetDirectories() can be close to this, but across slower connections (VPNs, etc.) I found that this is tremendously faster—45 minutes vs. 90 seconds to access a remote directory of ~40k files, ~40 GB in size.
This can also fairly easily be modified to include other data, like the total file size of all files counted, or rapidly recursing through and deleting empty directories, starting at the furthest branch.