What is the difference between Files.walk.filter and Files.find?

This code searches for a specific file:

Stream<Path> findMyFile = Files.find(Paths.get("c:\\temp\\pathtest"), Integer.MAX_VALUE,(p, a) -> p.endsWith("test.txt") && a.isRegularFile());

Stream<Path> findMyFileSecond = Files.walk(Paths.get("c:\\temp\\pathtest"),Integer.MAX_VALUE).filter(p -> p.endsWith("test.txt"));

findMyFile.forEach(System.out::println);
findMyFileSecond.forEach(System.out::println);

Both results contain the same files and both methods complete in almost the same time. JavaDoc says the following:

This method walks the file tree in exactly the manner specified by the walk method. Compare to calling filter on the Stream returned by walk method, this method may be more efficient by avoiding redundant retrieval of the BasicFileAttributes.

When should I use walk in combination with filter and when find? What is considered best practice?

Selfwill answered 15/2, 2017 at 15:36 Comment(1)
The documentation is pretty clear. find is better than walk if you’re only planning to apply a filter to the Stream returned by walk. - Heikeheil

TL;DR: if you need to filter files/dirs by their attributes, use Files.find(); if you don't need to filter by file attributes, use Files.walk().

Details

There is a slight difference, which is actually explained in the documentation, but in a way that is easy to misread. Reading the source code makes it clear:

  • Files.find:

    return StreamSupport.stream(...)
                            .onClose(iterator::close)
                            .filter(entry -> matcher.test(entry.file(), entry.attributes()))
                            .map(entry -> entry.file());
    
  • Files.walk:

    return StreamSupport.stream(...)
                            .onClose(iterator::close)
                            .map(entry -> entry.file());
    

This means that if your eventual filter needs to read and check file attributes, Files.find will probably be faster. With Files.walk, the filter callback needs an extra call such as Files.readAttributes(file, BasicFileAttributes.class), whereas with Files.find the attributes have already been retrieved and are passed to the filter callback.
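A minimal sketch of that difference, assuming a made-up class name (RegularFileSearch) and made-up method names. Both methods return the regular files under a root directory; the walk variant has to fetch the attributes itself inside the filter, while the find variant receives them in the matcher:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical class and method names, for illustration only.
class RegularFileSearch {

    // Files.walk: the stream carries only Paths, so the filter has to
    // read the attributes itself for every entry.
    static List<Path> viaWalk(Path root) throws IOException {
        try (Stream<Path> stream = Files.walk(root)) {
            return stream.filter(p -> {
                try {
                    BasicFileAttributes a = Files.readAttributes(p, BasicFileAttributes.class);
                    return a.isRegularFile();
                } catch (IOException e) {
                    // Lambdas in filter cannot throw checked exceptions.
                    throw new UncheckedIOException(e);
                }
            }).sorted().collect(Collectors.toList());
        }
    }

    // Files.find: the attributes obtained during the walk are handed to the matcher.
    static List<Path> viaFind(Path root) throws IOException {
        try (Stream<Path> stream =
                 Files.find(root, Integer.MAX_VALUE, (p, a) -> a.isRegularFile())) {
            return stream.sorted().collect(Collectors.toList());
        }
    }
}
```

Both methods should return the same list for the same tree; only the number of attribute lookups differs.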

I just tested it with my sample 10K-files-in-many-folders structure on Windows, by searching files only (i.e. excluding folders):

// pre-Java7/8 way via recursive listFiles (8037 files returned): 1521.657 msec.
for (File f : new File(dir).listFiles()) {
    if (f.isDirectory()) {
        _getFiles(files, f.getPath(), pattern); // recurse into the subdirectory itself
    } else {
        ...
    }
}

// Files.walk(8037 files returned): 1575.766823 msec.
try (Stream<Path> stream = Files.walk(path, Integer.MAX_VALUE)) {
    files = stream.filter(p -> {
        if (Files.isDirectory(p)) { return false; } // this extra check makes it much slower than Files.find
        ... 
    }).map(p -> p.toString()).collect(Collectors.toList());
}

// Files.find(8037 files returned): 27.606675 msec.
try (Stream<Path> stream = Files.find(path, Integer.MAX_VALUE, (p, a) -> !a.isDirectory())) {
    files = stream.filter(p -> { ... }).map(p -> p.toString()).collect(Collectors.toList());
}

// Files.walkFileTree(8037 files returned): 27.443974 msec.
Files.walkFileTree(new File(path).toPath(), new SimpleFileVisitor<Path>() { 
    @Override
    public FileVisitResult visitFile(Path p, BasicFileAttributes attrs) throws IOException {
        ...
        return FileVisitResult.CONTINUE;
    }
});
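
For reference, a self-contained version of the walkFileTree variant might look like the sketch below. The class name (VisitorFileSearch) and the choice to collect every regular file are assumptions; the filter body elided above is not reproduced here.

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.ArrayList;
import java.util.List;

// Hypothetical class name, for illustration only.
class VisitorFileSearch {

    // Collects all regular files below root. As with Files.find, the
    // attributes are passed to the callback, so no extra lookup is needed.
    static List<Path> regularFiles(Path root) throws IOException {
        List<Path> files = new ArrayList<>();
        Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult visitFile(Path p, BasicFileAttributes attrs) {
                if (attrs.isRegularFile()) {
                    files.add(p);
                }
                return FileVisitResult.CONTINUE;
            }
        });
        return files;
    }
}
```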
Rideout answered 10/9, 2018 at 15:53 Comment(1)
Excellent, BasicFileAttributes have their use-cases, isRegularFile, lastModifiedTime and such. - Batholith

I believe walk() would be advantageous if you need to apply some intermediate operation to the directory listing before applying a filter, or if you want to parallelize the stream.
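
One possible reading of this, as a hedged sketch: the paths are transformed first (here, made relative to the root) and only then filtered by name, so no file attributes are involved. The class and method names are made up for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical class and method names, for illustration only.
class RelativeListing {

    // Maps every entry to its path relative to root, then filters on the
    // resulting string. This matches by name only, so a directory whose
    // name ends in ".txt" would also be included.
    static List<String> relativeTxtPaths(Path root) throws IOException {
        try (Stream<Path> stream = Files.walk(root)) {
            return stream
                    .map(root::relativize)   // intermediate operation before the filter
                    .map(Path::toString)
                    .filter(s -> s.endsWith(".txt"))
                    .sorted()
                    .collect(Collectors.toList());
        }
    }
}
```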

Applecart answered 15/2, 2017 at 16:18
