Apache Commons IO File Monitoring vs. JDK WatchService

I need to develop an application that will process CSV files as soon as they are created in a predefined directory. A huge number of incoming files is expected.

I have seen applications using Apache Commons IO File Monitoring in production. It works pretty well; I have seen it process as many as 21 million files in a day. It seems that Apache Commons IO File Monitoring polls the directory and calls listFiles to find the files to process.

My question: Is JDK WatchService as good an option as Apache Commons IO File Monitoring? Does anyone know of any pros and cons?

Commodity answered 1/10, 2015 at 12:14 Comment(1)
Commons IO is pure Java AFAIK so it may match WatchService but I doubt it can be more efficient...Pastiche

Since I asked this question, I have gained some more insight into the matter, so I am answering it for those who might have a similar question.

Apache Commons IO monitoring uses a polling mechanism with a configurable polling interval. On every poll, it calls the listFiles() method of the File class and compares the result with the listFiles() output from the previous iteration to identify file creation, modification, and deletion. The algorithm is robust, and I have never seen it miss an event. It works well even with a large volume of files. However, because it polls and invokes listFiles() on every iteration, it consumes CPU cycles unnecessarily when few files are arriving. It also works on network drives.
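To make the snapshot comparison concrete, here is a minimal sketch of the same idea using only the JDK. It is not the Commons IO source; the class name `PollingSketch` and the string event format are made up for illustration. In Commons IO itself, this role is played by `FileAlterationObserver`, driven by a `FileAlterationMonitor`.

```java
import java.io.File;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration of poll-and-compare; not the Commons IO implementation.
class PollingSketch {
    private final File dir;
    // File name -> last-modified timestamp, as seen by the previous poll.
    private Map<String, Long> previous;

    PollingSketch(File dir) {
        this.dir = dir;
        this.previous = snapshot(); // files already present are not reported
    }

    // Lists the directory once and records each entry's timestamp.
    private Map<String, Long> snapshot() {
        Map<String, Long> snap = new HashMap<>();
        File[] files = dir.listFiles(); // null if dir is missing or unreadable
        if (files != null) {
            for (File f : files) {
                snap.put(f.getName(), f.lastModified());
            }
        }
        return snap;
    }

    // Runs one poll and returns events such as "CREATE a.csv".
    Set<String> poll() {
        Map<String, Long> current = snapshot();
        Set<String> events = new TreeSet<>();
        for (Map.Entry<String, Long> e : current.entrySet()) {
            Long old = previous.get(e.getKey());
            if (old == null) {
                events.add("CREATE " + e.getKey());
            } else if (!old.equals(e.getValue())) {
                events.add("MODIFY " + e.getKey());
            }
        }
        for (String name : previous.keySet()) {
            if (!current.containsKey(name)) {
                events.add("DELETE " + name);
            }
        }
        previous = current; // this poll becomes the baseline for the next one
        return events;
    }
}
```

Calling `poll()` on a fixed schedule (for example from a `ScheduledExecutorService`) gives the behaviour described above: every iteration pays for a full directory listing, whether or not anything changed.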

JDK WatchService does not need polling; it is event-based. It is triggered only when an event occurs, so it needs less CPU when few files are arriving. If the inflow is heavy and events are processed more slowly than they occur, events may overflow. Additionally, it will not work with network drives.
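For comparison, a minimal WatchService sketch might look like the following. The class name `WatchSketch`, the `toProcess` helper, and the `handler` callback are assumptions made for this example. The one design choice worth noting is the `OVERFLOW` branch: when events have been lost, the sketch falls back to listing the directory once, which is one possible way to cope with the overflow case mentioned above.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardWatchEventKinds;
import java.nio.file.WatchEvent;
import java.nio.file.WatchKey;
import java.nio.file.WatchService;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.Consumer;

// Hypothetical illustration; names other than the java.nio.file API are made up.
class WatchSketch {

    // Turns one batch of watch events into the set of file names to process.
    static Set<String> toProcess(List<WatchEvent<?>> events, Path dir) throws IOException {
        Set<String> names = new TreeSet<>();
        for (WatchEvent<?> event : events) {
            if (event.kind() == StandardWatchEventKinds.OVERFLOW) {
                // Events were lost or discarded: rescan the directory instead.
                try (DirectoryStream<Path> listing = Files.newDirectoryStream(dir)) {
                    for (Path p : listing) {
                        names.add(p.getFileName().toString());
                    }
                }
            } else if (event.kind() == StandardWatchEventKinds.ENTRY_CREATE) {
                // For ENTRY_CREATE the context is the path relative to the watched directory.
                names.add(event.context().toString());
            }
        }
        return names;
    }

    // Blocks until the key becomes invalid, passing each batch to the handler.
    static void watch(Path dir, Consumer<Set<String>> handler)
            throws IOException, InterruptedException {
        try (WatchService ws = FileSystems.getDefault().newWatchService()) {
            dir.register(ws, StandardWatchEventKinds.ENTRY_CREATE);
            while (true) {
                WatchKey key = ws.take(); // sleeps until events are queued
                handler.accept(toProcess(key.pollEvents(), dir));
                if (!key.reset()) {
                    break; // directory no longer accessible
                }
            }
        }
    }
}
```

A call such as `WatchSketch.watch(Paths.get("/some/inbox"), names -> names.forEach(System.out::println))` would print new file names as they appear; the path here is only a placeholder.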

In conclusion, if the file inflow is continuous and huge, it is better to go with Apache Commons IO file monitoring. Otherwise, JDK WatchService is a good option.

Commodity answered 7/12, 2016 at 9:10 Comment(4)
Per docs.oracle.com/javase/8/docs/api/java/nio/file/… "The implementation that observes events from the file system is intended to map directly on to the native file event notification facility where available, or to use a primitive mechanism, such as polling, when a native facility is not available."Peltier
@ScottMarkwell: Do you happen to know if there is a way to force "primitive" polling instead of relying on inotify? I have a network attached disk so events aren't fired, however, I'd still like to use the watch service, and it seems it can fall back to polling when necessary. So can we force it?Coloration
I do not. How that would be handled is up to the JVM you are using. Ask that vendor.Peltier
Apache Commons IO monitoring not only works with network drives but also works from inside containerized apps (apps running in Docker).Syringe
