Workaround for Java bug which causes crash dump

A program that I've developed is crashing the JVM occasionally due to this bug: http://bugs.java.com/bugdatabase/view_bug.do?bug_id=8029516. Unfortunately the bug has not been resolved by Oracle and the bug report says that there are no known workarounds.

I've tried modifying the example code from the bug report so that .register(sWatchService, eventKinds) is called in the KeyWatcher thread instead: I add all pending register requests to a list, which the KeyWatcher thread then loops through. It still crashes. I'm guessing this just had the same effect as synchronizing on sWatchService (which the submitter of the bug report tried).
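A minimal sketch of the kind of change described above (not the asker's actual code): other threads only enqueue directories, and the watcher thread performs every register() call itself. The names SingleThreadRegistrar, requestRegister, and runOnce are hypothetical. As the question says, this pattern did not prevent the crash, so it illustrates the attempted approach rather than a fix.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.StandardWatchEventKinds;
import java.nio.file.WatchEvent;
import java.nio.file.WatchKey;
import java.nio.file.WatchService;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of "register only from the watcher thread".
// Per the question, this alone does NOT avoid the JDK-8029516 crash.
public class SingleThreadRegistrar {
    private final WatchService ws;
    // Directories other threads want registered; drained only by the watcher thread.
    private final ConcurrentLinkedQueue<Path> pending = new ConcurrentLinkedQueue<Path>();

    public SingleThreadRegistrar(WatchService ws) {
        this.ws = ws;
    }

    // Safe to call from any thread: it never touches the WatchService directly.
    public void requestRegister(Path dir) {
        pending.add(dir);
    }

    // One iteration of the watcher loop; call it only from the watcher thread.
    // poll() with a timeout is used instead of take() so queued registrations
    // are not stuck behind a blocking wait. Returns the number of events handled.
    public int runOnce(long timeoutMillis) throws IOException, InterruptedException {
        Path dir;
        while ((dir = pending.poll()) != null) {
            dir.register(ws, StandardWatchEventKinds.ENTRY_CREATE,
                    StandardWatchEventKinds.ENTRY_DELETE, StandardWatchEventKinds.ENTRY_MODIFY);
        }
        WatchKey key = ws.poll(timeoutMillis, TimeUnit.MILLISECONDS);
        if (key == null) {
            return 0;
        }
        int handled = 0;
        for (WatchEvent<?> event : key.pollEvents()) {
            handled++; // a real watcher would dispatch the event here
        }
        key.reset(); // re-arm the key so it can be signalled again
        return handled;
    }
}
```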

Can you think of any way to get around this?

Superdominant answered 25/4, 2014 at 16:15 Comment(6)
Superdominant: Odd. It works for me; I just double-checked.
Wysocki: The link sometimes worked and sometimes didn't for me; probably an Oracle problem. Without seeing code it's hard to answer, but rearrange your code so that one and only one watch thread is responsible for the WatchService and WatchKey classes. Other threads would use those classes or services through your watcher class.
Tropism: This is an issue with freeing native memory. I suspect the Windows library that implements malloc/free is to blame. I would check that you have the latest version of that DLL.
Superdominant: @GilbertLeBlanc I tried doing everything in the watcher thread, but it still crashes sometimes.
Superdominant: @PeterLawrey What makes you think free is to blame? It might just as well be that Java tries to free something it hasn't malloc'd.
Tropism: @Superdominant Having read the code, it is pretty simple. If anything, it's more likely that it tries to free the same memory more than once.

I've managed to create a workaround though it's somewhat ugly.

The bug is in the JDK method WindowsWatchKey.invalidate(), which releases the native buffer while subsequent calls may still access it. A one-line change fixes the problem by delaying the buffer clean-up until GC.

Here is a compiled patch to the JDK. To apply it, add the following Java command-line flag (this option exists only up to JDK 8; it was removed in JDK 9):
-Xbootclasspath/p:jdk-8029516-patch.jar

If patching the JDK is not an option in your case, there is still a workaround at the application level. It relies on knowledge of the internal implementation of the Windows WatchService.

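import java.lang.reflect.Field;
import java.nio.file.WatchKey;
import sun.misc.Cleaner; // JDK 8 internal API

// Windows-only: the sun.nio.fs classes below exist only in the Windows JDK,
// so class initialization fails (ExceptionInInitializerError) on other platforms.
public class JDK_8029516 {
    // WindowsWatchKey.buffer: the NativeBuffer the key's pending read uses
    private static final Field bufferField = getField("sun.nio.fs.WindowsWatchService$WindowsWatchKey", "buffer");
    // NativeBuffer.cleaner: the Cleaner that frees the buffer's native memory
    private static final Field cleanerField = getField("sun.nio.fs.NativeBuffer", "cleaner");
    // A Cleaner that never fires on its own (its referent, Thread.class, is never collected)
    // and whose action (run() of a Thread without a target) does nothing when clean() is called.
    private static final Cleaner dummyCleaner = Cleaner.create(Thread.class, new Thread());

    private static Field getField(String className, String fieldName) {
        try {
            Field f = Class.forName(className).getDeclaredField(fieldName);
            f.setAccessible(true);
            return f;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Swaps the buffer's cleaner for the dummy, so invalidating the key no longer frees the
    // native memory; the original Cleaner still frees it once the buffer is garbage collected.
    public static void patch(WatchKey key) {
        try {
            cleanerField.set(bufferField.get(key), dummyCleaner);
        } catch (IllegalAccessException e) {
            throw new IllegalStateException(e);
        }
    }
}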

Call JDK_8029516.patch(watchKey) right after the key is registered; it will prevent watchKey.cancel() from releasing the native buffer prematurely.
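The JDK_8029516 class above fails hard outside its target environment: on a non-Windows JDK its static initializer throws, sun.misc.Cleaner no longer exists from JDK 9 on, and JDKs that strongly encapsulate internal packages (the default since JDK 16) reject setAccessible. A defensive variant, sketched here under the assumption that you would rather have the patch degrade to a no-op than crash, performs the same field swap but looks up the cleaner type reflectively and returns false whenever the internals are missing or inaccessible. SafeJdk8029516 and tryPatch are hypothetical names, and whether it returns true on a given JDK depends entirely on that JDK's internal classes.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Method;
import java.nio.file.WatchKey;

// Hypothetical defensive variant of the reflective JDK-8029516 workaround.
// Never throws; returns false when the assumed JDK internals are absent or inaccessible.
public class SafeJdk8029516 {
    public static boolean tryPatch(WatchKey key) {
        try {
            // Exists only in the Windows JDK; ClassNotFoundException elsewhere.
            Class<?> keyClass = Class.forName("sun.nio.fs.WindowsWatchService$WindowsWatchKey");
            if (!keyClass.isInstance(key)) {
                return false; // not a Windows watch key: nothing to patch
            }
            Field bufferField = keyClass.getDeclaredField("buffer");
            bufferField.setAccessible(true); // throws on JDKs with strong encapsulation
            Object buffer = bufferField.get(key);
            Field cleanerField = buffer.getClass().getDeclaredField("cleaner");
            cleanerField.setAccessible(true);
            // Same trick as the original: a cleaner of the field's own type whose referent
            // (Thread.class) is never collected and whose action does nothing.
            Method create = cleanerField.getType().getMethod("create", Object.class, Runnable.class);
            Object dummyCleaner = create.invoke(null, Thread.class, new Thread());
            cleanerField.set(buffer, dummyCleaner);
            return true;
        } catch (ReflectiveOperationException | RuntimeException e) {
            return false; // internals differ from what this sketch assumes
        }
    }
}
```

Used the same way as the original: call SafeJdk8029516.tryPatch(key) right after register(), and optionally log when it returns false.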

Almund answered 3/5, 2014 at 22:14 Comment(2)
Superdominant: Awesome! It worked perfectly. I'm very impressed and grateful!
Hadst: Nice! @Almund, have you considered posting your proposed fix to the mail.openjdk.java.net/mailman/listinfo/core-libs-dev mailing list?

From the comments on the bug report:

It appears that we have an issue with I/O cancellation when there is a pending ReadDirectoryChangesW outstanding.

The statement and example code indicate that the bug is triggered when:

  1. There is a pending event that has not been consumed (it may or may not be visible to WatchService.poll() or WatchService.take())
  2. WatchKey.cancel() is called on the key

This is a nasty bug with no universal workaround; the right approach depends on the specifics of your application. Consider pooling watches in a single place so that you never need to call WatchKey.cancel(). If the pool eventually grows too large, close the entire WatchService and start over. Something like this:

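import java.io.IOException;
import java.nio.file.ClosedWatchServiceException;
import java.nio.file.FileSystems;
import java.nio.file.Paths;
import java.nio.file.StandardWatchEventKinds;
import java.nio.file.WatchEvent.Kind;
import java.nio.file.WatchKey;
import java.nio.file.WatchService;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

public class FileWatcherService {
    static final Kind<?>[] allEvents = new Kind<?>[] {
        StandardWatchEventKinds.ENTRY_CREATE,
        StandardWatchEventKinds.ENTRY_DELETE,
        StandardWatchEventKinds.ENTRY_MODIFY
    };

    private volatile WatchService ws;

    // Keep track of paths and registered listeners. Keys are never cancelled
    // individually; they all go away together when the WatchService is closed.
    private final Map<String, List<FileChangeListener>> listeners = new ConcurrentHashMap<String, List<FileChangeListener>>();
    private final Map<WatchKey, String> keys = new ConcurrentHashMap<WatchKey, String>();

    private volatile boolean toStop = false;

    public interface FileChangeListener {
        void onChange();
    }

    // start() must have been called first, because it creates the WatchService.
    public synchronized void addFileChangeListener(String path, FileChangeListener l) throws IOException {
        if (!listeners.containsKey(path)) {
            WatchKey key = Paths.get(path).register(ws, allEvents);
            listeners.put(path, new CopyOnWriteArrayList<FileChangeListener>());
            keys.put(key, path);
        }
        listeners.get(path).add(l);
    }

    public void removeFileChangeListener(String path, FileChangeListener l) {
        List<FileChangeListener> list = listeners.get(path);
        if (list != null)
            list.remove(l); // deliberately no WatchKey.cancel() here
    }

    public void start() throws IOException {
        final WatchService service = FileSystems.getDefault().newWatchService();
        ws = service;
        new Thread(new Runnable() {
            public void run() {
                try {
                    while (!toStop) {
                        WatchKey key = service.take();
                        key.pollEvents(); // consume the events; listeners only learn "something changed"
                        String path = keys.get(key);
                        List<FileChangeListener> list = (path == null) ? null : listeners.get(path);
                        if (list != null)
                            for (FileChangeListener l : list)
                                l.onChange();
                        key.reset(); // re-arm the key so it is signalled again
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } catch (ClosedWatchServiceException e) {
                    // stop() closed the service: leave the loop
                }
            }
        }).start();
    }

    public void stop() throws IOException {
        toStop = true;
        ws.close(); // also wakes up a thread blocked in take()
    }
}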
Hadst answered 2/5, 2014 at 19:12 Comment(0)

You might not be able to work around the problem itself, but you could deal with the error and handle it. I don't know your specific situation, but I imagine the biggest issue is that the whole JVM crashes. Putting everything in a try block does not help, because you cannot catch a JVM crash.

Without knowing more about your project it's difficult to suggest a good solution, but this could be an option: do all the file watching in a separate JVM process. From your main process, start a new JVM (e.g. using ProcessBuilder.start()). When that process terminates (i.e. the newly started JVM crashes), restart it. Obviously you need to be able to recover: you need to keep track of which files to watch, and you need to keep this data in your main process too.
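A minimal sketch of the supervising side, assuming the watcher runs as its own main class in a child JVM. WatcherSupervisor, javaCommand, and the one-path-per-line stdin protocol are hypothetical; only ProcessBuilder and Process are standard API. The loop is bounded by maxLaunches so it can be tried out; a real supervisor would keep restarting until shutdown.

```java
import java.io.File;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical supervisor: runs the watcher in a child JVM and starts a new one whenever it exits.
public class WatcherSupervisor {
    private final List<String> command;
    private final List<String> watchedPaths; // kept in the main process so a new child can be told again
    private final int maxLaunches;

    public WatcherSupervisor(List<String> command, List<String> watchedPaths, int maxLaunches) {
        this.command = new ArrayList<String>(command);
        this.watchedPaths = new ArrayList<String>(watchedPaths);
        this.maxLaunches = maxLaunches;
    }

    // Command line that starts mainClass in a fresh JVM, reusing this JVM's binary and classpath.
    public static List<String> javaCommand(String mainClass) {
        List<String> cmd = new ArrayList<String>();
        cmd.add(System.getProperty("java.home") + File.separator + "bin" + File.separator + "java");
        cmd.add("-cp");
        cmd.add(System.getProperty("java.class.path"));
        cmd.add(mainClass);
        return cmd;
    }

    // Starts the child and starts a new one each time it exits; returns the number of launches.
    public int run() throws IOException, InterruptedException {
        int launches = 0;
        while (launches < maxLaunches) {
            ProcessBuilder pb = new ProcessBuilder(command);
            pb.redirectOutput(ProcessBuilder.Redirect.INHERIT);
            pb.redirectError(ProcessBuilder.Redirect.INHERIT);
            Process child = pb.start();
            launches++;
            sendPaths(child);
            int exitCode = child.waitFor(); // a JVM crash shows up here as an abnormal exit code
            System.err.println("Watcher process exited with code " + exitCode);
            try {
                child.getOutputStream().close();
            } catch (IOException ignored) {
                // the pipe is already broken; nothing useful to do
            }
        }
        return launches;
    }

    // Hypothetical protocol: one path per line on the child's standard input.
    private void sendPaths(Process child) {
        try {
            Writer w = new OutputStreamWriter(child.getOutputStream(), StandardCharsets.UTF_8);
            for (String p : watchedPaths) {
                w.write(p);
                w.write('\n');
            }
            w.flush(); // the stream stays open so more paths could be sent later
        } catch (IOException e) {
            // the child already died; waitFor() returns and the loop starts a new one
        }
    }
}
```

For example, new WatcherSupervisor(WatcherSupervisor.javaCommand("com.example.WatcherMain"), paths, Integer.MAX_VALUE).run() would keep a (hypothetical) com.example.WatcherMain process alive, re-sending the watched paths after every restart.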

Now the biggest remaining part is to implement some communication between the main process and the file watching process. This could be done using standard input/output of the file watching process or using a Socket/ServerSocket or some other mechanism.
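The child side of such a stdin/stdout channel could look like the sketch below: it reads directories from standard input, one per line, and reports each event on standard output as a KIND<TAB>path line that the main process can parse. WatcherChild, registerPaths, drain, and the line format are assumptions for illustration. It never calls WatchKey.cancel(), and if the JVM does crash, only this process is lost.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.*;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical child process: watches directories named on stdin, reports events on stdout.
public class WatcherChild {
    static final Map<WatchKey, Path> dirs = new ConcurrentHashMap<WatchKey, Path>();

    // Registers every directory named in `in` until end of input; returns how many were registered.
    static int registerPaths(BufferedReader in, WatchService ws) throws IOException {
        int count = 0;
        String line;
        while ((line = in.readLine()) != null) {
            if (line.trim().isEmpty()) continue;
            Path dir = Paths.get(line.trim());
            dirs.put(dir.register(ws, StandardWatchEventKinds.ENTRY_CREATE,
                    StandardWatchEventKinds.ENTRY_DELETE, StandardWatchEventKinds.ENTRY_MODIFY), dir);
            count++;
        }
        return count;
    }

    // Formats the key's pending events as "KIND<TAB>path" lines and re-arms the key.
    static List<String> drain(WatchKey key) {
        List<String> lines = new ArrayList<String>();
        Path dir = dirs.get(key);
        for (WatchEvent<?> event : key.pollEvents()) {
            if (event.kind() == StandardWatchEventKinds.OVERFLOW) continue;
            Path name = (Path) event.context();
            lines.add(event.kind().name() + "\t" + (dir == null ? name : dir.resolve(name)));
        }
        key.reset();
        return lines;
    }

    public static void main(String[] args) throws Exception {
        final WatchService ws = FileSystems.getDefault().newWatchService();
        final BufferedReader stdin = new BufferedReader(new InputStreamReader(System.in, StandardCharsets.UTF_8));
        Thread reader = new Thread(new Runnable() {
            public void run() {
                try {
                    registerPaths(stdin, ws);
                } catch (IOException e) {
                    System.exit(1); // let the supervisor start a fresh process
                }
            }
        });
        reader.setDaemon(true);
        reader.start();
        while (true) {
            for (String out : drain(ws.take())) {
                System.out.println(out);
            }
            System.out.flush();
        }
    }
}
```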

Oleander answered 30/4, 2014 at 20:45 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.