Delete file vs directory + recreate performance

6

6

Which method of deleting files has the best performance?

  • Deleting per file, or
  • Deleting whole directory with files at once and recreating the directory

Note that the root directory must still exist afterwards, so I can either do:

var photo_files = Directory.EnumerateFiles(item_path, "*.jpg", SearchOption.TopDirectoryOnly);

foreach (var photo in photo_files)
{
    File.Delete(photo);
}

Or delete the whole directory and then create it again.

How much performance difference would there be for 10000 or even 100000 files?

P.S. Just to clarify, .NET has no function that deletes all files in a folder at once while leaving the directory itself in place.

Miscellanea answered 13/4, 2011 at 14:16 Comment(0)
F
4

When you delete a directory it is a single write to the drive's master file table, whereas if you delete each file then there is a write operation per file. Thus it is more efficient to delete the directory and recreate it.

Following the exchange with @Mr Disappointment I would offer the following amendment to my answer:

If you need to do this "a lot" then you might build yourself an extension method that looks like this:

public static class IOExtension
{
    // Deletes the directory and everything in it, then recreates it empty.
    public static void PurgeDirectory(this DirectoryInfo d)
    {
        string path = d.FullName;

        Directory.Delete(path, true); // true = delete contents recursively
        Directory.CreateDirectory(path);
    }
}

so that you can just invoke this on the DirectoryInfo class like...

DirectoryInfo di = new DirectoryInfo(path);
di.PurgeDirectory();
Fireworm answered 13/4, 2011 at 14:24 Comment(7)
This is not correct - Directory.Delete moves recursively through files with Win32Native.FindNextFile. - Bellyband
Mr. D... Can you cite a reference for what you are claiming? Last time I was worrying about direct writes, all that needed to be done to delete a directory was a single write to the MFT, and all of the space that directory was using is freed. Now if you are trying to do secure deletes that is a different matter, but that requirement is not present here. - Fireworm
@Cos Callis: Cited reference - Reflector -> mscorlib -> System.IO.Directory.DeleteHelper. And this, which states a directory must be empty prior to deletion: msdn.microsoft.com/en-us/library/aa365488(v=vs.85).aspx - Bellyband
@Cos Callis: Also note that .NET doesn't just write to the MFT when working with the file system, it uses native Windows calls - what they do is a different matter, but in this case they don't do what you claim, and if they did, .NET doesn't utilise such a mechanism. - Bellyband
@MrD. Thanks for your reference and comments. It should be pointed out that it is not enough to call Directory.Delete(path); the call should be Directory.Delete(path, true); (where true triggers a recursive delete of the contents of the directory). I still support that deleting and recreating is better, but my reasoning was incorrect. Live and learn. - Fireworm
@Cos Callis: '...live and learn.' <- That's the idea, just wish I was more successful at it! Just another note on clearing the directory's MFT record: this'll also bork NTFS Change Journalling, and the reasons for handling files individually don't stop there - but I will. :) - Bellyband
As discussed in the following question, immediately recreating a deleted directory may cause issues. #32594052 - Brioche
O
3

From a performance point of view, you can delete the whole directory and recreate it. On recreation you only need to create the directory itself.

Oreopithecus answered 13/4, 2011 at 14:22 Comment(1)
If you delete the directory, performance will be better than iterating over all the files and deleting them one by one, and recreating the directory is not a big deal. - Oreopithecus
B
3

I would say deleting the directory directly would give the best performance, but this is speculation based on the single factor of permission demands being requested per each File.Delete call as opposed to an initial check when using Directory.Delete - there'll be even more devils in the details, to be sure.

Both loop through the files, and ultimately, both boil down to calling native Windows functions to get the job done - once per file.

Bear in mind that the biggest bottleneck when working with IO is really the hardware of the disk being read from or written to.

Have you tested this to see what results you actually get in your situation?

Bellyband answered 13/4, 2011 at 14:26 Comment(2)
I can't test it now for 10000 or more files, but when the site is ready it will deal with this number of files, so I need to know before it goes live. - Miscellanea
Well, you'd better damned test it before it goes live. It isn't ready until then. ;) - Bellyband
B
0

Something, somewhere has to delete all of the files. If you don't do it directly, then it must be done for you indirectly (such as shelling out and calling "RmDir /S"). So, the system as a whole will perform about the same. Your application's performance may vary depending on whether you have to wait for all of the files to be deleted first.

Bothersome answered 13/4, 2011 at 14:26 Comment(0)
H
0

Deleting all of the contents of the directory (files and subdirectories) is the most robust.
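A minimal sketch of that contents-only approach, assuming nothing in the tree is read-only or locked (either would make a delete throw). The class and method names are hypothetical, not part of .NET; only the DirectoryInfo and FileInfo calls are framework APIs.

```csharp
using System.IO;

// Hypothetical helper, not a framework type.
public static class DirectoryCleaner
{
    // Removes every file and subdirectory inside 'path' but keeps 'path' itself.
    public static void ClearContents(string path)
    {
        var root = new DirectoryInfo(path);

        // GetFiles/GetDirectories return arrays, so nothing is deleted
        // while an enumeration is still in progress.
        foreach (FileInfo file in root.GetFiles())
        {
            file.Delete();
        }

        foreach (DirectoryInfo sub in root.GetDirectories())
        {
            sub.Delete(true); // true = remove the subdirectory's contents as well
        }
    }
}
```

Called as DirectoryCleaner.ClearContents(item_path), this leaves the root directory in place the whole time, so there is no delete-then-recreate step to go wrong.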

Deleting and immediately recreating a directory may not work. I've personally had issues with this. There is a Stack Overflow question discussing the issue: "Using Directory.Delete() and Directory.CreateDirectory() to overwrite a folder".

I have had success in deleting a folder on application startup and recreating the folder when a button is clicked. Basically you have to spread out the deletion and creation.
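One possible way to "spread out" the two steps in code is to poll until the directory is really gone before creating it again. This is only a sketch: the helper name, the 5-second timeout and the 50 ms poll interval are all assumptions for illustration, and it presumes the failure comes from the delete not having fully completed yet.

```csharp
using System;
using System.IO;
using System.Threading;

// Hypothetical helper, not a framework type.
public static class DirectoryReset
{
    public static void DeleteAndRecreate(string path)
    {
        if (Directory.Exists(path))
        {
            Directory.Delete(path, true); // true = delete contents recursively
        }

        // Wait until the file system no longer reports the directory.
        // Timeout and poll interval are arbitrary, assumed values.
        DateTime deadline = DateTime.UtcNow.AddSeconds(5);
        while (Directory.Exists(path))
        {
            if (DateTime.UtcNow > deadline)
            {
                throw new IOException("Directory still exists after delete: " + path);
            }
            Thread.Sleep(50);
        }

        Directory.CreateDirectory(path);
    }
}
```

Whether this is enough depends on what is holding the directory open; test it against your own setup rather than relying on it.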

Husch answered 16/2 at 19:58 Comment(0)
G
-1

This question is like: "How hot is the center of the sun?"

The only way you will know for sure is to go there. So, create a test harness and go there.

Create a folder and put the same image in the folder 10000 times. Name each file with a GUID or something else unique. You can write a simple program for this.

Then run the deletion code and time it for both cases. Repeat as necessary to confirm results.

There's a .NET Stopwatch class (System.Diagnostics.Stopwatch) that you can use to time things. You can also use Environment.TickCount to get the timings.
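A rough sketch of such a harness, using Stopwatch. The class and method names are hypothetical, and a 1 KB dummy payload stands in for a real image (an assumption; use your actual file sizes for meaningful numbers).

```csharp
using System;
using System.Diagnostics;
using System.IO;

// Hypothetical test harness, not a framework type.
public static class DeleteBenchmark
{
    // Fills 'dir' with 'count' files, each named with a GUID.
    public static void Populate(string dir, int count)
    {
        Directory.CreateDirectory(dir);
        byte[] payload = new byte[1024]; // dummy content, assumed size
        for (int i = 0; i < count; i++)
        {
            string name = Guid.NewGuid().ToString("N") + ".jpg";
            File.WriteAllBytes(Path.Combine(dir, name), payload);
        }
    }

    // Times deleting the files one by one; the directory stays in place.
    public static TimeSpan TimePerFileDelete(string dir)
    {
        Stopwatch sw = Stopwatch.StartNew();
        foreach (string file in Directory.GetFiles(dir, "*.jpg", SearchOption.TopDirectoryOnly))
        {
            File.Delete(file);
        }
        sw.Stop();
        return sw.Elapsed;
    }

    // Times deleting the whole directory and creating it again.
    public static TimeSpan TimeDeleteAndRecreate(string dir)
    {
        Stopwatch sw = Stopwatch.StartNew();
        Directory.Delete(dir, true);
        Directory.CreateDirectory(dir);
        sw.Stop();
        return sw.Elapsed;
    }
}
```

Call Populate(dir, 10000) before each timed method, print the returned TimeSpan values, and repeat a few runs, since disk caching can make the first run differ from later ones.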

Googly answered 13/4, 2011 at 20:55 Comment(0)
