Avoid detecting incomplete files when watching a directory for changes in java
I am watching a directory for incoming files (using FileAlterationObserver from apache commons).

class Example implements FileAlterationListener {
    public void prepare() {
        File directory = new File("/tmp/incoming");
        FileAlterationObserver observer = new FileAlterationObserver(directory);
        observer.addListener(this);
        FileAlterationMonitor monitor = new FileAlterationMonitor(10);
        monitor.addObserver(observer);
        monitor.start();
        // ...
    }

    public void handleFile(File f) {
        // FIXME: this should be called when the writes that 
        // created the file have completed, not before
    }

    public void onFileCreate(File f) {
        handleFile(f);
    }

    public void onFileChange(File f) {
        handleFile(f);
    }
}

The files are written in place by processes that I have no control over.

The problem I have with that code is that my callback is triggered when the file is initially created. I need it to trigger only once the file has been changed and the write to it has completed (perhaps by detecting when the file has stopped changing).

What's the best way to do that?

Exalt answered 19/1, 2011 at 23:33 Comment(4)
IOW, you want to know when the writing process closes the file? – Deliverance
I think I basically need a debounce implementation for Java. – Exalt
Re closing the file: yes, if that works it would solve it. (Is there a way to get that information without any control over the writer? Could I just try to get an exclusive lock on the file? Does that work across platforms?) – Exalt
Check out Java 7's WatchService API; maybe it has some of the functionality you need. – Maplemaples
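The exclusive-lock idea from the comments can be sketched with `java.nio.channels.FileChannel.tryLock` (a sketch, not a guaranteed solution: locks are advisory on most Unix filesystems, so a writer that never takes a lock will not be detected; Windows is more likely to report the file as busy). The class name is illustrative:

```java
import java.io.File;
import java.io.RandomAccessFile;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;

public class LockProbe {
    // Try to take an exclusive lock on the whole file. Returns false if
    // another process holds a lock, or the file cannot be opened at all.
    // Note: on most Unix filesystems locks are advisory, so this only
    // detects writers that also use file locks.
    public static boolean canLockExclusively(File f) {
        try (RandomAccessFile raf = new RandomAccessFile(f, "rw");
             FileChannel channel = raf.getChannel();
             FileLock lock = channel.tryLock()) {
            return lock != null; // tryLock returns null if someone else holds it
        } catch (Exception e) {
            return false; // could not open or lock the file
        }
    }
}
```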

I had a similar problem. At first I thought I could use the FileWatcher service, but it doesn't work on remote volumes, and I had to monitor incoming files via a network mounted drive.

Then I thought I could simply monitor the change in file size over a period of time and consider the file done once the file size had stabilized (as fmucar suggested). But I found that in some instances on large files, the hosting system would report the full size of the file it was copying, rather than the number of bytes it had written to disk. This of course made the file appear stable, and my detector would catch the file while it was still in the process of being written.

I eventually got the monitor to work by attempting to open a FileInputStream on the file and treating the resulting exception as a signal that the file was still being written. This worked wonderfully in detecting whether a file was open for writing, even when the file was on a network mounted drive.

      long oldSize = 0L;
      long newSize = 1L;
      boolean fileIsOpen = true;

      while ((newSize > oldSize) || fileIsOpen) {
          oldSize = this.thread_currentFile.length();
          try {
              Thread.sleep(2000);
          } catch (InterruptedException e) {
              e.printStackTrace();
          }
          newSize = this.thread_currentFile.length();

          // If the writer still has the file open, opening it for reading
          // fails (at least on Windows). Use try-with-resources so the
          // stream is closed instead of leaked.
          try (FileInputStream in = new FileInputStream(this.thread_currentFile)) {
              fileIsOpen = false;
          } catch (IOException e) {
              // still being written; keep polling
          }
      }

      System.out.println("New file: " + this.thread_currentFile.toString());
Misfeasor answered 10/5, 2012 at 13:27 Comment(2)
So why still check the size? Why not just check if it's open? – Maddie
I think it'd be better not to use an anonymous FileInputStream, so you can close it after the loop. Otherwise, though, this worked great for me! – Maddie

A generic solution to this problem seems impossible from the "consumer" end. The "producer" may temporarily close the file and then resume appending to it. Or the "producer" may crash, leaving an incomplete file in the file system.

A reasonable pattern is to have the "producer" write to a temp file that's not monitored by the "consumer". When it's done writing, rename the file to something that's actually monitored by the "consumer", at which point the "consumer" will pick up the complete file.
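The producer side of that pattern can be sketched with `java.nio.file` (file and directory names here are illustrative): write to a `.part` file that the consumer's filter ignores, then rename it into place. A same-volume `Files.move` with `ATOMIC_MOVE` is atomic on POSIX filesystems, so the consumer never observes a half-written file.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class AtomicPublish {
    // Write the payload to "<name>.part", then atomically rename it to
    // "<name>". The consumer should only react to names without ".part".
    public static Path publish(Path dir, String name, byte[] payload) throws IOException {
        Path temp = dir.resolve(name + ".part");
        Path target = dir.resolve(name);
        Files.write(temp, payload);
        return Files.move(temp, target, StandardCopyOption.ATOMIC_MOVE);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("incoming");
        publish(dir, "data.csv", "a,b,c\n".getBytes(StandardCharsets.UTF_8));
        System.out.println(Files.exists(dir.resolve("data.csv")));      // true
        System.out.println(Files.exists(dir.resolve("data.csv.part"))); // false
    }
}
```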

Campion answered 30/11, 2015 at 20:51 Comment(0)

I don't think you can achieve what you want unless you have some file system constraints and guarantees. For example, what if you have the following scenario :

  • File X created
  • A bunch of change events are triggered that correspond with writing out of file X
  • A lot of time passes with no updates to file X
  • File X is updated.

If file X cannot be updated after it's written out, you can have a thread of execution that calculates the elapsed time from the last update to now, and after some interval decides that the file write is complete. But even this has issues. If the file system is hung, and the write does not occur for some time, you could erroneously conclude that the file is finished writing out.
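The elapsed-time heuristic described above amounts to a debouncer. A minimal sketch (not from the answer; class and method names are mine) using a `ScheduledExecutorService`: every create/change event reschedules a per-file timer, and the real handler only fires once no event has arrived for the quiet period.

```java
import java.io.File;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

public class Debouncer {
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();
    private final Map<File, ScheduledFuture<?>> pending = new ConcurrentHashMap<>();
    private final long quietMillis;
    private final Consumer<File> onStable;

    public Debouncer(long quietMillis, Consumer<File> onStable) {
        this.quietMillis = quietMillis;
        this.onStable = onStable;
    }

    // Call from onFileCreate/onFileChange. Each call cancels the previous
    // timer for this file, so onStable only runs after quietMillis of silence.
    public void fileTouched(File f) {
        ScheduledFuture<?> prev = pending.put(f,
                scheduler.schedule(() -> {
                    pending.remove(f);
                    onStable.accept(f);
                }, quietMillis, TimeUnit.MILLISECONDS));
        if (prev != null) {
            prev.cancel(false);
        }
    }

    public void shutdown() {
        scheduler.shutdownNow();
    }
}
```

As noted in the answer, the quiet period is still a heuristic: a slow or hung filesystem can pause long enough to fire the handler early.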

Epaminondas answered 19/1, 2011 at 23:41 Comment(3)
You're right, but fortunately that doesn't happen in my case (steps 1 to 3 do happen, step 4 doesn't). The file is written to reasonably quickly. – Exalt
In that case, you can do what I said. There are still drawbacks to this approach: for one, you'd need a thread per file; another is choosing a heuristic value for the elapsed time. – Epaminondas
Yeah. I made a follow-up question, Debounce in Java, for the thread solution. ("Reasonably quickly" is 100 ms in my case.) I'm really interested in whether the locking approach would work, because that would be a lot more elegant. – Exalt

You can check the size of the file 2 or more times in a couple of seconds and if the size is not changing, then you can decide the file change has completed and proceed with your own execution.
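That polling check might look like the following sketch (interval and repeat count are heuristics you have to tune; as another answer notes, reported sizes can be misleading on some network mounts, so this is not bulletproof):

```java
import java.io.File;

public class SizeStability {
    // Poll the file length until it has stayed the same for `checks`
    // consecutive intervals, then assume the write has completed.
    public static void waitUntilStable(File f, long intervalMillis, int checks)
            throws InterruptedException {
        long last = -1L;  // -1 never matches a real length, forcing one full pass
        int stable = 0;
        while (stable < checks) {
            long size = f.length();
            stable = (size == last) ? stable + 1 : 0;
            last = size;
            Thread.sleep(intervalMillis);
        }
    }
}
```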

Nada answered 19/1, 2011 at 23:58 Comment(0)

If you use a FileAlterationListenerAdaptor as your FileAlterationListener, you can override just the methods you need and monitor the files with a FileAlterationMonitor ...

public static void main( String[] args ) throws Exception {

    // dir was undefined in the original snippet; using the question's directory
    File dir = new File( "/tmp/incoming" );
    FileAlterationObserver fao = new FileAlterationObserver( dir );
    final long interval = 500;
    FileAlterationMonitor monitor = new FileAlterationMonitor( interval );
    FileAlterationListener listener = new FileAlterationListenerAdaptor() {

        @Override
        public void onFileCreate( File file ) {
            try {
                System.out.println( "File created: " + file.getCanonicalPath() );
            } catch( IOException e ) {
                e.printStackTrace( System.err );
            }
        }

        @Override
        public void onFileDelete( File file ) {
            try {
                System.out.println( "File removed: " + file.getCanonicalPath() );
            } catch( IOException e ) {
                e.printStackTrace( System.err );
            }
        }

        @Override
        public void onFileChange( File file ) {
            // getName() does not throw, so no try/catch is needed here
            System.out.println( file.getName() + " changed" );
        }
    };
    // Add listeners...
    fao.addListener( listener );
    monitor.addObserver( fao );
    monitor.start();
}
Bemba answered 5/12, 2014 at 20:8 Comment(1)
Hi, this shows how to detect file changes, but it does not handle the case where the polling interval is shorter than the time needed to copy the whole file from one network location to another. I'm already seeing this issue in our code, so some extra check or wait loop is needed: after the filesystem triggers the event, the adaptor needs to wait until the file is complete. – Orangery

© 2022 - 2024 — McMap. All rights reserved.