FileSystemWatcher vs polling to watch for file changes

I need to set up an application that watches for files being created in a directory, either locally or on a network drive.

Would the FileSystemWatcher or polling on a timer be the best option? I have used both methods in the past, but not extensively.

What issues (performance, reliability etc.) are there with either method?

Jetta asked 27/10, 2008 at 14:5
FileSystemWatcher is a leaky abstraction and cannot be relied upon for anything but the most basic cases. See here: https://mcmap.net/q/104525/-filesystemwatcher-events-raising-twice-despite-taking-measures-against-it (Strive)
I want to add a link, for reference, to this answer by Raymond Chen (Microsoft expert) on the topic of FileSystemWatcher's reliability, and to his blog, The Old New Thing (search for FileSystemWatcher, for example). (Strive)

I have seen the file system watcher fail in production and test environments. I now consider it a convenience, but I do not consider it reliable. My pattern has been to watch for changes with the file system watcher, but poll occasionally to catch missed file changes.

Edit: If you have a UI, you can also give your user the ability to "refresh" for changes instead of polling. I would combine this with a file system watcher.
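The pattern above has three parts: the watcher provides speed, an occasional poll acts as a safety net, and the client code sees a single event no matter which source found the file. It could look roughly like the following. This is a minimal sketch assuming a recent .NET (Core 3.0 or later). The class name HybridWatcher and its members are hypothetical and not part of any library. A manual "refresh" button could simply call Rescan().

```csharp
using System;
using System.Collections.Concurrent;
using System.IO;
using System.Threading;

// Hypothetical wrapper: FileSystemWatcher for fast notification plus a slow
// periodic directory scan as a safety net. Both feed the same callback.
public sealed class HybridWatcher : IDisposable
{
    private readonly string _path;
    private readonly string _filter;
    private readonly Action<string> _onFileFound;
    // Paths already reported (case-insensitive, as on Windows), so a file seen by
    // both sources is reported once. This set only grows; prune it in a real service.
    private readonly ConcurrentDictionary<string, byte> _seen =
        new ConcurrentDictionary<string, byte>(StringComparer.OrdinalIgnoreCase);
    private readonly FileSystemWatcher _watcher;
    private readonly Timer _timer;

    public HybridWatcher(string path, string filter, TimeSpan pollInterval, Action<string> onFileFound)
    {
        _path = path;
        _filter = filter;
        _onFileFound = onFileFound;

        _watcher = new FileSystemWatcher(path, filter)
        {
            NotifyFilter = NotifyFilters.FileName,
            IncludeSubdirectories = false
        };
        _watcher.Created += (s, e) => Report(e.FullPath);
        _watcher.Renamed += (s, e) => Report(e.FullPath);
        _watcher.EnableRaisingEvents = true;

        // First scan after one interval; call Rescan() yourself for an immediate one.
        _timer = new Timer(_ => SafeRescan(), null, pollInterval, pollInterval);
    }

    // Lists the directory and reports any file that has not been reported yet.
    public void Rescan()
    {
        foreach (string file in Directory.EnumerateFiles(_path, _filter))
            Report(file);
    }

    private void SafeRescan()
    {
        try { Rescan(); }
        catch (IOException) { /* e.g. share unreachable: try again on the next tick */ }
    }

    private void Report(string fullPath)
    {
        if (_seen.TryAdd(fullPath, 0))
            _onFileFound(fullPath);
    }

    public void Dispose()
    {
        _timer.Dispose();
        _watcher.Dispose();
    }
}
```

The poll interval can be long (minutes), because it only has to catch what the watcher missed. The callback may run on different thread-pool threads, so it should be thread-safe.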

Zacatecas answered 27/10, 2008 at 14:12
I've seen it fall down, too. The solution we've used is to wrap our own class around it, where the wrapper class ALSO uses a timer to check on occasion that the watcher is still going. (Marileemarilin)
We do something similar: once we've processed the file passed into the FileCreated event, we do a manual check for any other new files before returning. This seems to mitigate any problems occurring with lots of files arriving at once. (Cartesian)
Thanks for the tips. I have heard that the FileSystemWatcher can be unreliable at times, and it makes me a bit hesitant to depend on it in a production application. (Jetta)
I agree with writing an abstraction layer. The client code should receive the same event whether polling created it or the FileSystemWatcher did. Same with the manual "Refresh" command. (Zacatecas)
Could you be more specific about the production and test environments (.NET, Windows version, etc.) and what files and directories the application was monitoring? (Harri)
I believe we tested it on XP and Server 2003 on a local directory and a file share, and had XP machines in the field. We had problems with both the local directory and the file share. One of the probable causes we came up with was the copying/creation of a lot of files in a short amount of time in the directory. (Zacatecas)
It's neither constructive nor professional to just state "I've seen a ghost one day". People further down the thread mention the MSDN documentation about overruns of the buffer in nonpaged memory, which could explain your problems. Have you tried Brent's approach? (Pentstemon)
I just bought a gas sensor on Amazon and it amazed me how many people said it didn't work, when they obviously didn't calibrate it correctly or didn't even know about calibration. FileSystemWatcher has known limitations under high traffic because of its buffer size, and that is almost guaranteed to be the reason for it "failing". This is readily explained in the documentation, and there are workarounds that provide very reliable operation (as posted below). It isn't a good answer to just say "errr, something didn't work that one time, not sure why... nobody should rely on it". (Antarctic)
Could these failures be attributed to #1449216? (Trauma)
I have used FileSystemWatcher on corporate managed file shares for many years now without problems. It has failed a few times, but always because of my implementation, not because of the watcher itself. Besides the usual buffer-full problem, it can also fail because of a network outage or a file system remount. In both cases the watcher is notified through an event, and the fix is easy: just re-initialize the watcher with a delay, retries, and error handling. (Disturbed)
"Reduced reliability" is a general term that some people will take offense to, but it still fits the problem. Yes, there are workarounds, but that doesn't mean it is now a perfectly reliable tool. (Bessbessarabia)

The biggest problem I have had is missing files when the buffer gets full. Easy as pie to fix--just increase the buffer. Remember that it contains the file names and events, so increase it to the expected amount of files (trial and error). It does use memory that cannot be paged out, so it could force other processes to page if memory gets low.

Here is the MSDN article on the buffer: FileSystemWatcher.InternalBufferSize Property

Per MSDN:

Increasing buffer size is expensive, as it comes from non paged memory that cannot be swapped out to disk, so keep the buffer as small as possible. To avoid a buffer overflow, use the NotifyFilter and IncludeSubdirectories properties to filter out unwanted change notifications.

We use 16MB due to a large batch expected at one time. Works fine and never misses a file.
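Following the MSDN advice quoted above, a watcher can be kept narrow and given an explicit buffer size. This sketch assumes that the 64 KB ceiling quoted in the comments applies to your .NET version, so it uses 64 KB rather than 16 MB. The helper name CreateNarrowWatcher and the *.txt filter are illustrative. The sketch also subscribes to the Error event, which reports an InternalBufferOverflowException when notifications were lost. That lets the caller fall back to a full directory scan.

```csharp
using System;
using System.IO;

public static class WatcherFactory
{
    // Hypothetical helper: a narrowly scoped watcher with an explicit buffer size
    // and overflow detection. The caller sets EnableRaisingEvents when ready.
    public static FileSystemWatcher CreateNarrowWatcher(string path, Action onOverflow)
    {
        var watcher = new FileSystemWatcher(path)
        {
            Filter = "*.txt",                        // only the files we care about
            NotifyFilter = NotifyFilters.FileName,   // create/delete/rename, not every write
            IncludeSubdirectories = false,
            InternalBufferSize = 64 * 1024           // nonpaged memory: keep it modest
        };

        watcher.Error += (sender, e) =>
        {
            // An overflow means notifications were dropped: rescan the directory.
            if (e.GetException() is InternalBufferOverflowException)
                onOverflow();
        };

        return watcher;
    }
}
```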

We also read all the files before beginning to process even one: get the file names safely cached away (in our case, into a database table), then process them.

For file locking issues, I spawn a process that waits for the file to be unlocked, waiting one second, then two, then four, et cetera. We never poll. This has been in production without error for about two years.
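The doubling wait for locked files can be sketched as follows. This is an illustration under assumptions, not the answerer's code. It blocks the calling thread instead of spawning a separate process, and it treats "can be opened with FileShare.None" as meaning "unlocked". The name WaitUntilUnlocked and the default of six attempts are hypothetical.

```csharp
using System;
using System.IO;
using System.Threading;

public static class FileReady
{
    // Returns true once the file can be opened exclusively (i.e. the writer has
    // closed it), sleeping 1 s, 2 s, 4 s, ... between attempts. Returns false if
    // it is still unavailable after maxAttempts tries. A missing file also counts
    // as unavailable, because FileNotFoundException is an IOException.
    public static bool WaitUntilUnlocked(string path, int maxAttempts = 6)
    {
        TimeSpan delay = TimeSpan.FromSeconds(1);
        for (int attempt = 1; attempt <= maxAttempts; attempt++)
        {
            try
            {
                // FileShare.None fails while another handle to the file is still open.
                using (new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.None))
                    return true;
            }
            catch (IOException)
            {
                if (attempt == maxAttempts)
                    return false;
                Thread.Sleep(delay);
                delay = TimeSpan.FromTicks(delay.Ticks * 2);   // exponential backoff
            }
        }
        return false;
    }
}
```

Because this blocks a thread for up to about a minute with the defaults, it belongs on a worker thread, not inside the watcher's event handler.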

Viridi answered 5/6, 2009 at 15:29
Buffer overflow? Oh, you mean stack overflow. (Ouster)
As of .NET 3.5: "You can set the buffer to 4 KB or larger, but it must not exceed 64 KB" (Bergeron)
How are you using 16MB if the max internal buffer for FileSystemWatcher is 64KB? (Lectern)
@Jarvis, a buffer is a temporary storage location that holds information as it is transmitted until it can be processed. This usually means a FIFO or queue, since you want to deal with requests in the order they arrive, although in some processes, like recursion in programs, a FILO or stack structure is what's used. In this case we are definitely referring to the event queue buffer and not the program's call stack buffer. (Wafer)
I would also add that you can use IncludeSubdirectories = false and NotifyFilter, as well as filter files and limit the events you're handling, to further reduce the potential for buffer overload. Some events, such as Changed, are especially costly if used without NotifyFilter. In general, it's probably best to keep an individual FileSystemWatcher very specific and to broaden coverage with new instances, rather than cram too much into one. Also, I haven't tried this, but you can probably expand/contract the buffer size dynamically to optimize it. (Antarctic)
petermeinl.wordpress.com/2015/05/18/tamed-filesystemwatcher This post shares robust wrappers around the standard FileSystemWatcher (FSW) that fix problems commonly encountered when using it to monitor the file system in real-world applications. (Chirrup)

The FileSystemWatcher may also miss changes during busy times, if the number of queued changes overflows the buffer provided. This is not a limitation of the .NET class per se, but of the underlying Win32 infrastructure. In our experience, the best way to minimize this problem is to dequeue the notifications as quickly as possible and deal with them on another thread.
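Dequeuing notifications quickly and handling them on another thread is a classic producer/consumer setup. This minimal sketch assumes BlockingCollection&lt;T&gt; from System.Collections.Concurrent, which has been available since .NET Framework 4.0. WatcherQueue is a hypothetical name. The event handler only adds the path to the queue, and a single long-running task does the slow work.

```csharp
using System;
using System.Collections.Concurrent;
using System.IO;
using System.Threading.Tasks;

// Hypothetical pipeline: the watcher's handler only enqueues the path,
// and one background consumer processes paths in arrival order.
public sealed class WatcherQueue : IDisposable
{
    private readonly BlockingCollection<string> _queue = new BlockingCollection<string>();
    private readonly Task _consumer;

    public WatcherQueue(Action<string> process)
    {
        _consumer = Task.Factory.StartNew(() =>
        {
            // Blocks until items arrive; ends once CompleteAdding is called and the queue is empty.
            foreach (string path in _queue.GetConsumingEnumerable())
            {
                try { process(path); }
                catch (Exception ex) { Console.Error.WriteLine($"Failed on {path}: {ex.Message}"); }
            }
        }, TaskCreationOptions.LongRunning);
    }

    // Suitable as a FileSystemWatcher.Created handler: it returns almost immediately.
    public void OnCreated(object sender, FileSystemEventArgs e) => _queue.Add(e.FullPath);

    public void Enqueue(string path) => _queue.Add(path);

    // Stops accepting work and waits for everything already queued to be processed.
    public void Dispose()
    {
        _queue.CompleteAdding();
        _consumer.Wait();
        _queue.Dispose();
    }
}
```

Wiring it up would be `watcher.Created += queue.OnCreated;`. This does not enlarge the kernel buffer. It only keeps the handler from being the bottleneck that lets the buffer fill.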

As mentioned by @ChillTemp above, the watcher may not work on non-Windows shares. For example, it will not work at all on mounted Novell drives.

I agree that a good compromise is to do an occasional poll to pick up any missed changes.

Trope answered 27/10, 2008 at 14:23
The file system watcher can start raising a lot of events in quick succession. If you cannot execute your event handler at least as quickly as they are being fired, eventually the handler will start dropping events on the floor and you will miss things. (Trope)

Also note that the file system watcher is not reliable on file shares, particularly if the share is hosted on a non-Windows server. FSW should not be used for anything critical, or it should be used with an occasional poll to verify that it hasn't missed anything.

Mariken answered 27/10, 2008 at 14:17
Has Microsoft acknowledged that it isn't reliable on non-Windows file shares? We are certainly experiencing this first hand since switching from a Windows share to a Linux-based SMB share. (Wong)
Not that I'm aware of. And I'm sure it would simply be a blame game between the different vendors. (Mariken)
We've experienced problems with the file system watcher on mapped drives. If the mapping disconnects and then reconnects, the file watcher no longer raises changes. Easily resolved, but still a strike against the file system watcher IMHO. (Decorous)
The file system watcher is also not reliable on Windows shares. After a network hiccup, the share recovers without any issue, but the file system watcher does not. (Iaea)
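Several comments describe the watcher going silent after a network drop and recommend re-initializing it with a delay. The following wrapper assumes, as one comment states, that the disconnect surfaces through the Error event. Another comment reports that recovery does not always happen that cleanly, which is why a periodic poll is still worth keeping. RestartingWatcher, Restart, and onRecovered are hypothetical names. onRecovered is where a caller would rescan for files created while nothing was watching.

```csharp
using System;
using System.IO;
using System.Threading;

public sealed class RestartingWatcher : IDisposable
{
    private readonly string _path;
    private readonly FileSystemEventHandler _onCreated;
    private readonly Action _onRecovered;
    private readonly object _gate = new object();
    private FileSystemWatcher _watcher;
    private bool _disposed;

    public RestartingWatcher(string path, FileSystemEventHandler onCreated, Action onRecovered)
    {
        _path = path;
        _onCreated = onCreated;
        _onRecovered = onRecovered;
        _watcher = Create();
    }

    public bool IsWatching
    {
        get { lock (_gate) return _watcher != null && _watcher.EnableRaisingEvents; }
    }

    private FileSystemWatcher Create()
    {
        // The constructor throws ArgumentException if the directory is unreachable.
        var w = new FileSystemWatcher(_path) { NotifyFilter = NotifyFilters.FileName };
        w.Created += _onCreated;
        // Never tear the watcher down inside its own callback: restart on the thread pool.
        w.Error += (s, e) => ThreadPool.QueueUserWorkItem(_ => Restart());
        try { w.EnableRaisingEvents = true; }
        catch { w.Dispose(); throw; }
        return w;
    }

    // Recreates the watcher, retrying with a doubling delay (capped near one minute)
    // until the path is reachable again, then calls onRecovered.
    public void Restart()
    {
        TimeSpan delay = TimeSpan.FromSeconds(1);
        while (true)
        {
            bool recovered;
            lock (_gate)
            {
                if (_disposed) return;
                _watcher?.Dispose();
                try
                {
                    _watcher = Create();
                    recovered = true;
                }
                catch (Exception ex) when (ex is ArgumentException || ex is IOException)
                {
                    _watcher = null;   // share still unavailable
                    recovered = false;
                }
            }
            if (recovered)
            {
                _onRecovered();
                return;
            }
            Thread.Sleep(delay);
            if (delay < TimeSpan.FromMinutes(1))
                delay = TimeSpan.FromTicks(delay.Ticks * 2);
        }
    }

    public void Dispose()
    {
        lock (_gate)
        {
            _disposed = true;
            _watcher?.Dispose();
            _watcher = null;
        }
    }
}
```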

Personally, I've used the FileSystemWatcher on a production system, and it has worked fine. In the past 6 months, it hasn't had a single hiccup running 24x7. It is monitoring a single local folder (which is shared). We have a relatively small number of file operations that it has to handle (10 events fired per day). It's not something I've ever had to worry about. I'd use it again if I had to remake the decision.

Catena answered 27/10, 2008 at 14:32

I have run into trouble using FileSystemWatcher on network shares. If you're in a pure Windows environment, it might not be an issue, but I was watching an NFS share and since NFS is stateless, there was never a notification when the file I was watching changed.

Scrounge answered 27/10, 2008 at 14:24
I've hit the same problem, but it was unexpected to me, as the FileSystemWatcher was on the same Windows server that shares the folder over NFS. Sharing a folder over NFS causes the FileSystemWatcher not to see files created remotely through the share (i.e. from a Linux machine that mounts the share), while if I write a file directly into the monitored folder, the FileSystemWatcher is triggered. It looks like the NFS server writes files at a lower layer, so the API layer that triggers the FileSystemWatcher is never engaged. Does anyone have more info? (Mastat)
@Mosè I'm also facing the same issue. Have you found any solution? (Scheel)
Not really a solution to the problem, but as a workaround I (sadly) ended up comparing differences in the file system structure at regular intervals and generating the related events myself. With the right choice of data structure it's not that slow, just a little pressure on the file system for the listing. (Mastat)
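Mastat's workaround can be sketched as a snapshot diff: list the directory at regular intervals, compare the listing with the previous one, and synthesize the events. This hypothetical helper is keyed on full path and last-write time (UTC). It assumes that timestamps are a good enough change signal on the share in question, which is not guaranteed for every NFS or SMB server.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class DirectorySnapshot
{
    // Lists the files matching `pattern` and records each one's last write time (UTC).
    public static Dictionary<string, DateTime> Take(string path, string pattern = "*") =>
        new DirectoryInfo(path)
            .EnumerateFiles(pattern)
            .ToDictionary(f => f.FullName, f => f.LastWriteTimeUtc, StringComparer.Ordinal);

    // Compares two snapshots and synthesizes the events a watcher would have raised.
    public static (List<string> Created, List<string> Changed, List<string> Deleted) Diff(
        IReadOnlyDictionary<string, DateTime> before,
        IReadOnlyDictionary<string, DateTime> after)
    {
        var created = new List<string>();
        var changed = new List<string>();
        foreach (var entry in after)
        {
            if (!before.TryGetValue(entry.Key, out DateTime previous))
                created.Add(entry.Key);
            else if (previous != entry.Value)
                changed.Add(entry.Key);
        }
        var deleted = before.Keys.Where(key => !after.ContainsKey(key)).ToList();
        return (created, changed, deleted);
    }
}
```

A timer would call Take, Diff the result against the previous snapshot, raise events, and keep the new snapshot. A rename shows up as one deletion plus one creation.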

I currently use the FileSystemWatcher on an XML file being updated on average every 100 milliseconds.

I have found that as long as the FileSystemWatcher is properly configured you should never have problems with local files.

I have no experience on remote file watching and non-Windows shares.

I would consider polling the file to be redundant and not worth the overhead unless you inherently distrust the FileSystemWatcher or have directly experienced the limitations everyone else here has listed (non-Windows shares, and remote file watching).

Carriecarrier answered 27/10, 2008 at 14:33

I'd go with polling.

Network issues cause the FileSystemWatcher to be unreliable (even when overloading the error event).

Inhuman answered 27/10, 2008 at 14:29

Returning from the event method as quickly as possible, using another thread, solved the problem for me:

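private void Watcher_Created(object sender, FileSystemEventArgs e)
{
    // Hand the work to the thread pool so the handler returns immediately
    // and the watcher can keep draining its notification buffer.
    Task.Run(() => MySubmit(e.FullPath));
}

Pontifical answered 10/8, 2018 at 15:0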

I had some big problems with FSW on network drives: Deleting a file always threw the error event, never the deleted event. I did not find a solution, so I now avoid the FSW and use polling.

Creation events on the other hand worked fine, so if you only need to watch for file creation, you can go for the FSW.

Also, I had no problems at all on local folders, no matter if shared or not.

Frangible answered 27/10, 2008 at 14:35

Using both FSW and polling is a waste of time and resources, in my opinion, and I am surprised that experienced developers suggest it. If you need to use polling to check for any "FSW misses", then you can, naturally, discard FSW altogether and use only polling.

I am currently trying to decide whether to use FSW or polling for a project I am developing. Reading the answers, it is obvious that there are cases where FSW covers the needs perfectly, while at other times you need polling. Unfortunately, no answer has actually dealt with the performance difference (if there is any), only with the "reliability" issues. Can anyone answer that part of the question?

EDIT: nmclean's point about the validity of using both FSW and polling (you can read the discussion in the comments, if you are interested) is a very rational explanation of why there can be situations where using both an FSW and polling is efficient. Thank you for shedding light on that for me (and for anyone else of the same opinion), nmclean.

Azotemia answered 20/10, 2013 at 18:8
What if you want to respond to file changes as quickly as possible? If you poll once per minute, for example, there might be as much as a minute's delay between a file changing and your application picking up the change. The FSW event would presumably be triggered well before that. So by using both, you handle events with as little delay as you can, but also pick up any missed events. (Pontefract)
@Pontefract Exactly my point. If FSW is unreliable in cases where you need a quick response, there is no point using it, since you will have cases where there will be no quick response; thus, your application will be unreliable. Polling at shorter intervals, in a thread, is what you would need to do. Doing both means you have a tolerance in response time that the polling covers, so why not use only polling? (Azotemia)
Sometimes it is better to handle events straight away if you can. For example, say you have some processing to do on new files. If the processing takes some time, you'll probably want to start as soon as possible. Some events caught by FSW will need to be queued because previous events are still being processed. If some events are missed, they too will be queued when the polling mechanism catches them. Admittedly it's a pragmatic rather than a clean or predictable approach; for example, the order in which events are processed or queued is not guaranteed. (Pontefract)
@Azotemia "thus, your application will be unreliable." In many cases, speed is not a prerequisite for reliability. The work must get done, but it can wait a while. If we combine slow, reliable polling with fast, unreliable FSW, we get an application that is always reliable and sometimes fast, which is better than reliable and never fast. We can remove FSW and achieve the same maximum response time by constant polling, but this comes at the expense of the responsiveness of the rest of the application, so it should only be done if an immediate response is absolutely required. (Flosser)
@Flosser A background thread polling every half second does not affect the responsiveness of the application, and it is fast enough for the purposes of most applications. FSW is used when an immediate response is required, and it appears to be quite expensive in terms of resources. I understand your points, but "sometimes fast" is too unreliable to count on in any respect. So we go back to my opinion: if you need an immediate response and the FSW is unreliable, there is no use including it. (Azotemia)
@Azotemia I agree with that opinion when you qualify it with "if you need an immediate response", but the only requirement in your post was "to check for any FSW misses". Why do you think FSW is expensive? It merely hooks a message that the OS raises anyway, whether or not you're using FSW. By comparison, active polling is a great waste of resources when an immediate response is only a convenience. Imagine you need to check thousands of files: shall we iterate over all of them every half second, or shall we respond to a message when one of them changes and verify only periodically? (Flosser)
@Flosser As is apparent from the other answers and from MSDN, FSW uses non-swappable kernel memory, which is a very valuable and expensive OS resource. Polling for a fast response can hardly be applied to thousands of files, but if FSW is unreliable then you cannot use that either. Again, my point of 'using one or the other makes sense, both doesn't' seems logical. (Azotemia)
The key phrase is "when an immediate response is only a convenience". In such an application, the polling frequency should be low, but we can combine it with a watcher to achieve speed under good conditions, using the slow polling only as a reliable fallback. Choosing the more expensive option of increasing polling to milliseconds is not logical; it's bad prioritization. FSW is not expensive if you keep the buffer small. Note that just because we are watching thousands of files doesn't mean we need to store thousands of messages, if only a few of them are expected to change in any given period. (Flosser)
@Flosser This does not change the fact that you use more resources than just doing one of them. When an immediate response is only a convenience, simplifying the design by using only a reasonable polling strategy is not expensive at all. Polling at intervals of one minute or more, at the cost of a thread that sleeps for the duration (not even a timer implementation is required), is much less expensive than an FSW (which uses event polling on a separate thread and lots of other stuff). Polling is very lightweight if used correctly, while FSW is more resource-demanding, in my opinion. (Azotemia)
FSW uses hooks, not "event polling", so it doesn't have the overhead of polling. "you use more resources than just doing one of them." Yes, but consider the scenario and the options: 1) we need to ensure certain files are monitored, 2) we want an immediate response, but 3) performance is a higher priority. Our options are: A: Poll every minute. B: Poll constantly. C: Subscribe to FSW. D: Poll every minute and subscribe to FSW. A satisfies 1 and 3, but not 2. B satisfies 1 and 2, but not 3. C satisfies 2 and 3, but not 1. D satisfies 1, 2, and 3. (Flosser)
@Flosser C is supposed to satisfy all 3 if it is reliable. D does not satisfy 1 or 2, because if C cannot satisfy 1, then C cannot satisfy 2 either. FSW using hooks in its own thread does make sense, of course, but it is still more expensive than a poll in every respect other than consuming CPU time at every poll interval. (Azotemia)
@Azotemia Yes, C can satisfy 2. Just because it fails to ensure that 100% of files are checked does not mean that it won't give us speed where it can. "it is still more expensive": no, it's extremely efficient. Just to be clear: when no files are changing in the directory being watched, FSW isn't doing anything. It's waiting, not polling. When files do change, messages are stored in an 8KB buffer, which is insignificant on today's machines. If there are more than 8KB of messages, the rest are simply discarded, hence the need for occasional polling to pick up the misses. (Flosser)
@Flosser No, it cannot, if it is unreliable. Needing an immediate response means always needing it. In addition, an 8KB buffer for thousands of files (where polling at short intervals is unsuitable) on a network drive means big, big failure. And polling does nothing until the interval hits; it does not use any resources or hook handles, it does not create a class, etc. In any case, I do understand your points, but you cannot have it all. You are contradicting yourself according to which argument suits your position. You think using them both suits your design. OK. It does not suit mine. (Azotemia)
@Flosser Not to mention the fact that you need an additional FSW for each different directory or selected file type you want to watch, etc. Deciding to use both polling and a mess of FSWs to achieve your task, when one of them is enough, registers as a bad design decision in my book. (Azotemia)
"needing means always needing" ... Notice that 2 doesn't say "need". I very deliberately chose the word "want". This is the crux of my whole case in favor of combining both. I have been stating this repeatedly since my very first comment; it frankly baffles me that you continue to misinterpret the requirement as "needing, always needing". We combine both because we don't need it, but want it when possible. (Flosser)
"does nothing until the interval hits" ... Which is as soon as possible if we're trying to emulate C. B is the worst performing, hands down: the only way it could outperform C is if every single file changes every interval. D is the best of both worlds without the performance hit. "hook handles, create a class etc.": this is insignificant. It isn't done in a loop, it doesn't scale with the number of files, and it is hardly even a consideration. If such infinitesimal memory usage really matters, surely the overhead of using .NET at all would be a more pressing concern. (Flosser)
I'm sorry, but how is "a mess of FSWs" worse than a background thread with a "mess of" poll loops with various conditionals? You seem to be under the impression that just because something is implemented as a class, it must be some substantially bulky and intensive thing. It's not; it's just a simple wrapper for Win32 calls that gives you nice .NET events for Renamed, Deleted, etc. If you're unhappy with the code resulting from combining both techniques, you could easily design your own class that gives you a nicer interface for it. (Flosser)
@Flosser Your insistence on ignoring the primary points I make, while trying to prove the case for using both methods to do the same thing at the same time, is interesting. As I have said from the beginning, if you do not require an immediate response, you can just use polling, and this should be as self-evident as day and night. If you want to use both, it is your decision, not the "best" decision. The point is that when you do need an immediate response and FSW is unreliable, using both polling and FSW is pointless. (Azotemia)
I have never ignored the point that "if you need immediate response, then FSW is pointless", but please don't be dishonest and pretend this was your "primary point". Your primary point was given in your answer, and explicitly reinforced in your comment where you said, "my point of 'using one or the other makes sense, both doesn't'". It is this point that I disagree with. You're denying the existence of legitimate scenarios where using both does make sense, and I'm trying to show you that they do exist. (Flosser)
@Flosser I see. In my answer I have included my opinion. In addition, I keep repeating "if FSW is unreliable". So far, in my tests on a network drive with 3 clients, FSW appears to be reliable. FSW is my first choice if it proves reliable. As I progress in the project and FSW gets more pressed (I am about to have it check a few hundred thousand files), it will show whether it is as reliable as I need it to be, or I will have to revert to polling (not my preferred option, because I will need to implement other logic). But an immediate response is a requirement for me. (Azotemia)
Please be clear: are you saying that your conclusion only applies to your specific situation and that the accepted answer on this page is applicable to other situations? If so, why make such a blanket critical statement as "I am surprised that experienced developers suggest it"? Your answer strongly suggests that you accept one or the other, but never both. If this is not what you meant, you should be more explicit. Your answer also never stated that an immediate response is a requirement for you, and the OP didn't state it was for him either. (Flosser)
@Flosser Alright. The way I see it, either FSW is reliable for the specific application and it is used, or it isn't and it is not used. When it is not reliable and polling logic has to be implemented, I find it illogical to use it, needlessly complicating the design. You are saying that there are situations where an application can benefit from an unreliable FSW because, when it works, the files will be processed immediately, something that can be desirable even if not required. In my opinion, since the polling in these instances is reliable and covers the needs, FSW is not needed. (Azotemia)
@Flosser This is because the polling interval chosen is adequate for the application and because FSW has proven unreliable for the specific application. It is rare for experienced developers to suggest using two methods concurrently to solve the same problem, especially when different logic has to be applied to each. Your arguments, the way I see them, suggest that in the case you present, polling is adequate, but FSW adds something that is seen as "a treat" when it manages to work. As I said, not my way of handling problems. (Azotemia)
@Azotemia Which is perfectly reasonable as long as you assume everything else is equal, but it isn't: the implementation of FSW is vastly more efficient than polling. By the same logic, we could say that caching to RAM is a waste of time and resources, because we have no guarantee that the information we need will have been cached, and reading from disk "covers the needs". (Flosser)
@Flosser But it is not like you avoid polling. This is all I am saying. If you could avoid polling, then I would agree with you. If you could use another solution along with an unreliable FSW that would avoid polling, I would still agree with you. FSW is much better than polling, but it makes no sense to use it when you cannot avoid polling anyway. Caching to RAM is a very different issue, not related to the current subject at all. Polling is expensive on CPU time with FSW or not. Cache misses just force you to waste the time you would have spent anyway with no cache. Unrelated. (Azotemia)
Yes, the same principle is at play with caching solutions. Consider: "But it is not like you avoid reading from disk. This is all I am saying. If you could avoid reading from disk, then I would agree with you. If you could use another solution along with an unreliable cache that would avoid reading from disk, I would still agree with you. RAM is much better than disk, but it makes no sense to use it when you cannot avoid the disk anyway." (Flosser)
Now why is the above a poor argument? Because, although we still need disk access, we need it less. Similarly, you can poll less. Just because we still check all the files doesn't mean the workload is the same. Your statement, "polling is expensive on CPU time with FSW or not," is false. By offloading the "immediacy" concern to FSW, we can turn the polling into an idle, low-priority task, so that the busyness of the application at any given time is drastically reduced while still providing the "treat" of immediacy. You simply cannot achieve the same balance with polling alone. (Flosser)
@Flosser Thank you for taking the time and energy to clarify this the way you did. When you put it that way, it surely makes much more sense. Just as there are times when a cache is not suitable for your specific problem, the FSW (when it proves unreliable) may not be suitable. It turns out that you were right all along. I am sorry it took me so long to get it. (Azotemia)

A working solution that uses the Created event instead of Changed.

It fires even for copy, cut/paste, and move.

using System;
using System.IO;

class Program
{
    // Example paths from the original; adjust for your environment.
    const string SourceFolderPath = @"D:\SourcePath";
    const string DestinationFolderPath = @"D:\DestinationPath";

    static void Main(string[] args)
    {
        // "using" disposes the watcher when Main exits.
        using (var watcher = new FileSystemWatcher(SourceFolderPath))
        {
            watcher.IncludeSubdirectories = false;
            watcher.NotifyFilter = NotifyFilters.FileName;   // only name changes: create, delete, rename
            watcher.Filter = "*.txt";
            watcher.Created += FileSystemWatcher_Created;    // raised for files created by copy, cut/paste, move
            watcher.EnableRaisingEvents = true;

            Console.WriteLine("Watching " + SourceFolderPath + ". Press Enter to exit.");
            Console.ReadLine();
        }
    }

    static void FileSystemWatcher_Created(object sender, FileSystemEventArgs e)
    {
        try
        {
            // Do something like move, copy, etc.
            File.Copy(e.FullPath, Path.Combine(DestinationFolderPath, e.Name));
        }
        catch (IOException ex)
        {
            // The file may still be locked by the writer, or already exist at the destination.
            Console.Error.WriteLine("Could not copy " + e.FullPath + ": " + ex.Message);
        }
    }
}

A workaround for the Changed event (LastWrite filter) firing several times per file, using a static field to remember the last file handled:

using System;
using System.IO;

class Program
{
    // Example paths from the original; adjust for your environment.
    const string SourceFolderPath = @"D:\SourcePath";
    const string DestinationFolderPath = @"D:\DestinationPath";

    // Name of the last file handled, used to skip the duplicate Changed events
    // that a single write often produces. Static so it persists across events.
    static string lastHandledName = string.Empty;
    static readonly object gate = new object();   // events arrive on thread-pool threads

    static void Main(string[] args)
    {
        using (var watcher = new FileSystemWatcher(SourceFolderPath))
        {
            watcher.IncludeSubdirectories = false;
            watcher.NotifyFilter = NotifyFilters.LastWrite;
            watcher.Filter = "*.txt";
            watcher.Changed += FileSystemWatcher_Changed;
            watcher.EnableRaisingEvents = true;

            Console.ReadLine();
        }
    }

    static void FileSystemWatcher_Changed(object sender, FileSystemEventArgs e)
    {
        lock (gate)
        {
            // Skip repeated triggers for the same file. Caveat: this also skips a
            // genuine second change to the same file if no other file changed in between.
            if (e.Name == lastHandledName)
                return;
            lastHandledName = e.Name;
        }

        try
        {
            // Do something like move, copy, etc. Overwrite, since the same file
            // can be handled again later after another file has changed.
            File.Copy(e.FullPath, Path.Combine(DestinationFolderPath, e.Name), overwrite: true);
        }
        catch (IOException ex)
        {
            Console.Error.WriteLine("Could not copy " + e.FullPath + ": " + ex.Message);
        }
    }
}

This is a workaround for the problem of the event being triggered multiple times.
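A common alternative to remembering only the last file name is to debounce per path over a short time window. This is a swapped-in technique, not the answer's own. EventDebouncer and the window length are hypothetical. The clock is injectable so the behavior can be checked without sleeping.

```csharp
using System;
using System.Collections.Concurrent;

// Hypothetical debouncer: repeated events for the same path within `window` of the
// first one are treated as duplicates. Unlike a single "last file name" field, a
// later, genuine change to the same file is handled again once the window has passed.
public sealed class EventDebouncer
{
    private readonly TimeSpan _window;
    private readonly Func<DateTime> _now;   // injectable clock for testing
    private readonly ConcurrentDictionary<string, DateTime> _lastHandled =
        new ConcurrentDictionary<string, DateTime>(StringComparer.OrdinalIgnoreCase);

    public EventDebouncer(TimeSpan window, Func<DateTime> now = null)
    {
        _window = window;
        _now = now ?? (() => DateTime.UtcNow);
    }

    // Returns true if the caller should handle this event, false if it is a duplicate.
    public bool ShouldHandle(string fullPath)
    {
        DateTime now = _now();
        bool handle = true;
        _lastHandled.AddOrUpdate(fullPath, now, (_, previous) =>
        {
            handle = now - previous >= _window;
            return handle ? now : previous;   // keep the start of the current burst
        });
        return handle;
    }
}
```

Inside a Changed handler it would be used as `if (!debouncer.ShouldHandle(e.FullPath)) return;`. The window must be longer than the burst of duplicate events that a single save produces. That varies by application, so some tuning is needed.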

Isthmus answered 15/2, 2016 at 18:41

I would say use polling, especially in a TDD scenario, as it is much easier to mock/stub the presence or absence of files when the polling event is triggered than to rely on the more "uncontrolled" FSW event. On top of that, I have worked on a number of apps that were plagued by FSW errors.
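The testability argument can be made concrete by putting the directory listing behind an interface. A unit test can then feed the poller a fake list instead of real files. IFileSource, Poller, and FakeFileSource are hypothetical names for this sketch.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// The poller only sees this interface, never the file system directly.
public interface IFileSource
{
    IEnumerable<string> ListFiles();
}

public sealed class DirectoryFileSource : IFileSource
{
    private readonly string _path;
    public DirectoryFileSource(string path) => _path = path;
    public IEnumerable<string> ListFiles() => Directory.EnumerateFiles(_path);
}

public sealed class Poller
{
    private readonly IFileSource _source;
    private readonly HashSet<string> _known = new HashSet<string>();

    public Poller(IFileSource source) => _source = source;

    // One poll step: returns files not seen on any earlier call.
    // A timer or background loop would call this at the chosen interval.
    public IReadOnlyList<string> PollOnce()
    {
        var fresh = new List<string>();
        foreach (string file in _source.ListFiles())
            if (_known.Add(file))
                fresh.Add(file);
        return fresh;
    }
}

// Test double: the "directory" is an in-memory list the test controls.
public sealed class FakeFileSource : IFileSource
{
    public List<string> Files { get; } = new List<string>();
    public IEnumerable<string> ListFiles() => Files.ToList();
}
```

In production the poller gets a DirectoryFileSource. In a test it gets a FakeFileSource, so the test can decide exactly when a file "appears" without timing races.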

Taryn answered 3/7, 2014 at 9:19
