What happens with Directory.EnumerateFiles if directory content changes during iteration?

I've read discussions about the difference between Directory.EnumerateFiles and Directory.GetFiles.

I understand that internally they both use System.IO.FileSystemEnumerableFactory.CreateFileNameIterator()

The difference is that EnumerateFiles might use deferred execution (lazy), while GetFiles() does a ToArray, so the function is already executed.

But what happens if files and folders are added to the directory during the iteration? Will the iteration only cover the items that were present when EnumerateFiles() was called?

Even worse: what happens if files are removed during the iteration? Will they still be returned?

Photocell answered 10/4, 2015 at 7:31 Comment(0)

Thanks, Michal Komorowski. However, when I tried his solution myself, I saw a remarkable difference between Directory.EnumerateFiles and Directory.GetFiles():

Directory.CreateDirectory(@"c:\MyTest");
// Create files: b c e
File.CreateText(@"c:\MyTest\b.txt").Dispose();
File.CreateText(@"c:\MyTest\c.txt").Dispose();
File.CreateText(@"c:\MyTest\e.txt").Dispose();

string[] files = Directory.GetFiles(@"c:\MyTest");
var fileEnumerator = Directory.EnumerateFiles(@"c:\MyTest");

// delete file c; create file a d f
File.Delete(@"c:\MyTest\c.txt");
File.CreateText(@"c:\MyTest\a.txt").Dispose();
File.CreateText(@"c:\MyTest\d.txt").Dispose();
File.CreateText(@"c:\MyTest\f.txt").Dispose();

Console.WriteLine("Result from Directory.GetFiles");
foreach (var file in files) Console.WriteLine(file);
Console.WriteLine("Result from Directory.EnumerateFiles");
foreach (var file in fileEnumerator) Console.WriteLine(file);

This will give different output.

Result from Directory.GetFiles
c:\MyTest\b.txt
c:\MyTest\c.txt
c:\MyTest\e.txt
Result from Directory.EnumerateFiles
c:\MyTest\b.txt
c:\MyTest\d.txt
c:\MyTest\e.txt
c:\MyTest\f.txt

Results:

  • GetFiles still saw the old files: B C E as expected
  • EnumerateFiles saw the new files D and F. It correctly skipped the deleted file C, but it missed the new file A.

So the difference in usage between EnumerateFiles and GetFiles is more than just performance.

  • GetFiles returns the files that were in the folder at the moment you called the function. That is to be expected, because the result is just an array of strings that you enumerate afterwards.
  • EnumerateFiles correctly skips deleted files, but doesn't see all added files. If the folder changes while you are enumerating, the result is effectively undefined.

So if you expect your folder to change while you enumerate it, choose the function carefully:

  • Expect GetFiles to see deleted files
  • Expect EnumerateFiles to miss some of the new files.
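
If neither behaviour fits, one option is to combine the two: fix the list at call time and skip entries that have disappeared by the time you reach them. The following is only a minimal sketch of that idea; the class and method names (SnapshotSketch, SnapshotOfExistingFiles) are made up for illustration and are not part of System.IO.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class SnapshotSketch
{
    // Hypothetical helper: fixes the file list at call time (like GetFiles),
    // then lazily drops entries that have been deleted in the meantime.
    public static IEnumerable<string> SnapshotOfExistingFiles(string folder)
    {
        // Eager: the array is complete before this method returns.
        string[] snapshot = Directory.GetFiles(folder);

        // Lazy: File.Exists runs only when the caller reaches each path.
        // A file can still vanish between this check and its actual use.
        return snapshot.Where(path => File.Exists(path));
    }
}
```

In the experiment above, calling this helper where GetFiles is called should yield only b.txt and e.txt: c.txt is filtered out because it no longer exists, and a, d and f were never part of the snapshot.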
Photocell answered 10/4, 2015 at 9:35 Comment(1)
Another thing to note is that if the MyTest folder is deleted during the foreach on the enumeration, you will get a System.IO.IOException, which is likely to happen if MyTest is on a network share. - Perez
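
To illustrate the point about exceptions, here is a hedged sketch of guarding the loop. The helper name CollectFiles is hypothetical, and whether you want to swallow the exception or rethrow it depends on your application; the sketch simply keeps what was found so far.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class GuardedEnumerationSketch
{
    // Hypothetical helper: collects as many paths as possible and stops
    // quietly if the folder becomes unavailable part-way through.
    public static List<string> CollectFiles(string folder)
    {
        var found = new List<string>();
        try
        {
            foreach (string path in Directory.EnumerateFiles(folder))
                found.Add(path);
        }
        catch (IOException ex)
        {
            // DirectoryNotFoundException derives from IOException,
            // so a missing folder is handled here as well.
            Console.Error.WriteLine($"Enumeration stopped early: {ex.Message}");
        }
        return found;
    }
}
```

Note that the try block has to wrap the whole foreach, not just the call to EnumerateFiles, because the directory is read while the loop runs.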

I made a different experiment, because I was interested in the case of slow enumeration of the files, while more files are created inside the enumerated directory. The scenario of slow enumeration could happen for example if there is a SemaphoreSlim.WaitAsync inside the enumeration loop (for throttling purposes). The experiment below starts by deleting all files from the target directory, then creating a specific number of initial files, and then starts enumerating the files with a 100 msec delay, while another asynchronous workflow creates more files at a rate of one file every 150 msec. Will the enumerator see the newly created files?

static async Task Main(string[] args)
{
    const string FOLDER_PATH = @"C:\DirectoryEnumerateFilesTest";
    const int FILES_COUNT = 10;
    Console.WriteLine($"Deleting files");
    DeleteAllFiles(FOLDER_PATH);
    Console.WriteLine($"Creating files");
    await CreateFiles(FOLDER_PATH, startIndex: 1, filesCount: FILES_COUNT, delay: 0);
    Console.WriteLine($"Enumerating files while creating more files");
    var filePaths = Directory.EnumerateFiles(FOLDER_PATH);
    var cts = new CancellationTokenSource();
    var producer = CreateFiles(FOLDER_PATH,
        startIndex: 501, filesCount: 100, delay: 150, cts.Token);
    var enumeratedCount = 0;
    foreach (var filePath in filePaths)
    {
        Console.WriteLine($"Enumerated:   {Path.GetFileName(filePath)}");
        await Task.Delay(100);
        enumeratedCount++;
    }
    Console.WriteLine($"Total files enumerated: {enumeratedCount:#,0}");
    cts.Cancel();
    await producer;
}

private static void DeleteAllFiles(string folderPath)
{
    int count = 0;
    foreach (var filePath in Directory.GetFiles(folderPath))
    {
        File.Delete(filePath);
        Console.WriteLine($"File deleted: {Path.GetFileName(filePath)}");
        count++;
    }
    Console.WriteLine($"Total files deleted: {count:#,0}");
}

private static async Task CreateFiles(string folderPath,
    int startIndex, int filesCount, int delay, CancellationToken token = default)
{
    int count = 0;
    foreach (var i in Enumerable.Range(startIndex, filesCount))
    {
        var delayTask = Task.Delay(delay, token);
        await Task.WhenAny(delayTask);
        if (delayTask.IsCanceled) break;
        var fileName = $"File-{i:000}.txt";
        var filePath = Path.Combine(folderPath, fileName);
        File.WriteAllText(filePath, "Content");
        count++;
        Console.WriteLine($"File created: {fileName}");
    }
    Console.WriteLine($"Total files created: {count:#,0}");
}

The answer is: it depends on the number of the initial files, and the length of the filenames. The threshold is at around 50 initial files, but it becomes smaller when the files have longer filenames. The enumeration will eventually stop, provided that the enumerator works faster than the files-producer, in which case a number of files will remain unobserved (typically around 20).

Here is the output of the above experiment for FILES_COUNT = 10 (meaning 10 existing files at the time the enumerator is created).

Deleting files
Total files deleted: 0
Creating files
File created: File-001.txt
File created: File-002.txt
File created: File-003.txt
File created: File-004.txt
File created: File-005.txt
File created: File-006.txt
File created: File-007.txt
File created: File-008.txt
File created: File-009.txt
File created: File-010.txt
Total files created: 10
Enumerating files while creating more files
Enumerated:   File-001.txt
Enumerated:   File-002.txt
File created: File-501.txt
Enumerated:   File-003.txt
File created: File-502.txt
Enumerated:   File-004.txt
Enumerated:   File-005.txt
File created: File-503.txt
Enumerated:   File-006.txt
File created: File-504.txt
Enumerated:   File-007.txt
Enumerated:   File-008.txt
File created: File-505.txt
Enumerated:   File-009.txt
File created: File-506.txt
Enumerated:   File-010.txt
Total files enumerated: 10
File created: File-507.txt
Total files created: 7

10 files are too few, so none of the files created afterwards were observed by the enumerator.
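
If you need to pick up files that appear while you are still working, a single enumeration is therefore not something to rely on. One possible workaround, shown here only as a sketch, is to re-enumerate the folder and remember which paths you have already handled. The names RescanSketch and ProcessUntilStable are assumptions made for this example, not an existing API.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class RescanSketch
{
    // Hypothetical helper: re-enumerates the folder until one complete pass
    // yields no unseen path, and returns how many files were handled.
    public static int ProcessUntilStable(string folder, Action<string> handle)
    {
        var seen = new HashSet<string>();
        bool foundNew;
        do
        {
            foundNew = false;
            foreach (string path in Directory.EnumerateFiles(folder))
            {
                if (seen.Add(path)) // true only the first time a path shows up
                {
                    foundNew = true;
                    handle(path);
                }
            }
        } while (foundNew);
        return seen.Count;
    }
}
```

The loop ends after the first pass that finds nothing new, so a file created after that final pass is still missed. For continuous monitoring something like FileSystemWatcher may be a better fit.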

Kristof answered 24/3, 2020 at 10:25 Comment(0)

There is only one way to check:

Directory.CreateDirectory(@"c:\Temp");
File.Create(@"c:\Temp\a.txt").Close();
File.Create(@"c:\Temp\b.txt").Close();
File.Create(@"c:\Temp\c.txt").Close();
foreach (var f in Directory.EnumerateFiles(@"c:\Temp"))
{
    Console.WriteLine(f);
    // Delete a file (does nothing once the file is already gone)
    File.Delete(@"c:\Temp\c.txt");
    // Create a new file (overwrites it on later passes)
    File.Create(@"c:\Temp\d.txt").Close();
}

Initially C:\Temp contains 3 files: a.txt, b.txt and c.txt. During the iteration one of these files is deleted and a new one is created. Afterwards, C:\Temp contains the following files: a.txt, b.txt and d.txt. However, in the console you will see the original content of the directory, i.e.:

c:\Temp\a.txt
c:\Temp\b.txt
c:\Temp\c.txt
Federation answered 10/4, 2015 at 7:56 Comment(3)
Thank you, easy and clear method. I tried to do this multi-threaded, but of course that is not necessary. - Photocell
Thank you, easy and clear method. I tried to do this multi-threaded, but now I see that is not necessary. I've adapted your program and saw a remarkable difference between EnumerateFiles and GetFiles(). Worth a separate answer. - Photocell
Michal Komorowski: The difference was too long to describe in a comment, hence I wrote it as a separate answer. - Photocell
