Java code to search for all .doc and .docx files on the local system
A

5

5

I am working on a Windows desktop application written in Java. The application needs to search for all .doc and .docx files in the user's My Documents/Documents folder (the name depends on the OS version) on the local system and display their names and file sizes.

I can't find a way to list all the *.doc, *.docx, *.xls, *.xlsx, *.csv, *.txt, *.pdf, *.ppt, *.pptx files present in Documents/My Documents.

Please give me suggestions, or point me to a link, that will help me write code for a fast search that lists each file with its name, size, and type.

Arola answered 10/11, 2010 at 11:17 Comment(0)
L
8

You can use Apache Commons IO, in particular the FileUtils class. That would give something like:

import java.io.File;
import java.util.Collection;

import org.apache.commons.io.*;
import org.apache.commons.io.filefilter.*;

public class SearchDocFiles {
    public static String[] EXTENSIONS = { "doc", "docx" };

    public Collection<File> searchFilesWithExtensions(final File directory, final String[] extensions) {
        return FileUtils.listFiles(directory,
                extensions,
                true);
    }

    public Collection<File> searchFilesWithCaseInsensitiveExtensions(final File directory, final String[] extensions) {
        IOFileFilter fileFilter = new SuffixFileFilter(extensions, IOCase.INSENSITIVE);
        return FileUtils.listFiles(directory,
                fileFilter,
                DirectoryFileFilter.INSTANCE);
    }


    public static void main(String... args) {
        // Case sensitive
        Collection<File> documents = new SearchDocFiles().searchFilesWithExtensions(
                new File("/tmp"),
                SearchDocFiles.EXTENSIONS);
        for (File document: documents) {
            System.out.println(document.getName() + " - " + document.length());
        }

        // Case insensitive
        Collection<File> caseInsensitiveDocs = new SearchDocFiles().searchFilesWithCaseInsensitiveExtensions(
                new File("/tmp"),
                SearchDocFiles.EXTENSIONS);
        for (File document: caseInsensitiveDocs) {
            System.out.println(document.getName() + " - " + document.length());
        }
    }
}
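The example above searches /tmp and prints only the name and size. As a rough sketch of how the remaining pieces of the question (the Documents folder and the file type) might be handled with the JDK alone: the names DocumentInfo, documentsDir, extensionOf and describe are hypothetical, and treating the folder as "Documents" under user.home is an assumption that may not hold on older Windows versions (where it was "My Documents") or when the folder has been redirected.

```java
import java.io.File;
import java.util.Locale;

class DocumentInfo {
    // Assumption: the documents folder is "<user.home>/Documents".
    static File documentsDir() {
        return new File(System.getProperty("user.home"), "Documents");
    }

    // Returns the lower-cased text after the last dot, or "" if there is
    // no dot or nothing follows it.
    static String extensionOf(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return "";
        }
        return fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
    }

    // Formats one output line: name, size in bytes, and type (the extension).
    static String describe(File file) {
        return file.getName() + " - " + file.length() + " bytes - "
                + extensionOf(file.getName());
    }

    public static void main(String[] args) {
        System.out.println("Searching in: " + documentsDir());
        // File.length() is 0 for a file that does not exist.
        System.out.println(describe(new File("Report.DOCX")));
    }
}
```

The File returned by documentsDir() could be passed as the directory argument in place of new File("/tmp"), and describe(document) used inside the loops.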
Lucky answered 10/11, 2010 at 11:33 Comment(1)
@khachik You can ignoreCase or upper/lower case as you need. – Austronesian
T
2

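Try this method. It walks the directory tree recursively and prints each .doc and .docx file it finds. Note that endsWith is case-sensitive, so a name such as REPORT.DOC is not matched.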

public void getFiles(String path) {
    File dir = new File(path);
    String[] children = dir.list();
    if (children != null) {
        for (int i = 0; i < children.length; i++) {
            // Get filename of file or directory
            String filename = children[i];
            File file = new File(path + File.separator + filename);
            if (!file.isDirectory()) {
                if (file.getName().endsWith(".doc") || file.getName().endsWith(".docx")) {
                    System.out.println("File Name " + filename + "(" + file.length()+"  bytes)");
                }
            } else {
                getFiles(path + File.separator + filename);
            }
        }
    }
}
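The endsWith test in the method could be swapped for a small helper that ignores case and covers every extension listed in the question. This is only a sketch: ExtensionMatcher, WANTED and matches are hypothetical names, not part of the answer.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

class ExtensionMatcher {
    // The extensions listed in the question, stored in lower case.
    static final Set<String> WANTED = new HashSet<String>(Arrays.asList(
            "doc", "docx", "xls", "xlsx", "csv", "txt", "pdf", "ppt", "pptx"));

    // True if the text after the last dot is one of the wanted extensions,
    // compared without regard to case.
    static boolean matches(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0) {
            return false;
        }
        String ext = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        return WANTED.contains(ext);
    }

    public static void main(String[] args) {
        System.out.println(matches("Budget.XLSX"));    // true
        System.out.println(matches("notes.docx.bak")); // false
    }
}
```

In getFiles, the condition would then become if (ExtensionMatcher.matches(file.getName())).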
Thundersquall answered 10/11, 2010 at 11:56 Comment(0)
H
1

If you want to find all the files with .doc(x) extensions, you can use the java.io.File.listFiles(FileFilter) method, for example:

public java.util.List<java.io.File> mswordFiles(java.io.File dir) {
   java.util.List<java.io.File> res = new java.util.ArrayList<java.io.File>();
   _mswordFiles(dir, res);
   return res;
}
protected void _mswordFiles(java.io.File dir, java.util.List<java.io.File> res) {
   // Collect the Word documents directly inside dir (case-insensitive match).
   java.io.File[] files = dir.listFiles(new java.io.FileFilter() {
        public boolean accept(java.io.File f) {
           String name = f.getName().toLowerCase();
           return !f.isDirectory() && (name.endsWith(".doc") || name.endsWith(".docx"));
        }
     });
   // listFiles returns null if dir is not a directory or cannot be read.
   if (files == null) { return; }
   for (java.io.File f : files) { res.add(f); }
   // Recurse into each subdirectory.
   java.io.File[] dirs = dir.listFiles(new java.io.FileFilter() {
        public boolean accept(java.io.File f) {
            return f.isDirectory();
        }
      });
   if (dirs == null) { return; }
   for (java.io.File d : dirs) { _mswordFiles(d, res); }
}
Hairtail answered 10/11, 2010 at 11:46 Comment(0)
D
1

I don't have enough reputation to comment, so I have to submit this as an 'answer':

@khachik You can ignoreCase or upper/lower case as you need. – Martijn Verburg Nov 10 '10 at 12:02

This took me a while to figure out, but I finally found how to ignore case with this solution:

Add

public static final IOFileFilter filter = new SuffixFileFilter(EXTENSIONS, IOCase.INSENSITIVE);

Then modify the searchFilesWithExtensions method to return FileUtils.listFiles(directory, filter, DirectoryFileFilter.DIRECTORY);

Disbursement answered 15/1, 2014 at 2:9 Comment(0)
R
0

You may want to look into extracting the text of the MS Word files using Apache POI and indexing it with Lucene (for accuracy, flexibility, and speed of searching). Nutch and Solr both have helper libraries for Lucene that you can use to speed things up (that is, if Lucene core is not sufficient).

[update] I misunderstood the original question (before it was updated). You just need to search the filesystem using Java? The Java API can do that. Apache also has a library (Commons IO) that includes a file utility for listing all files under a directory, including its subdirectories, given a filter. I've used it before, e.g. FileUtils.listFiles(dir, fileFilter, dirFilter) or FileUtils.listFiles(dir, extensions, recursive). You can then run your search over that list.
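For the plain Java API route, a minimal sketch using java.nio.file.Files.walk might look like the following. It assumes Java 8 or later (which postdates this answer), and WalkSearch and find are hypothetical names chosen for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class WalkSearch {
    // Returns all regular files under root whose name ends with one of the
    // suffixes. Names are lower-cased first, so ".DOC" matches ".doc";
    // suffixes are expected in lower case.
    static List<Path> find(Path root, String... suffixes) throws IOException {
        try (Stream<Path> paths = Files.walk(root)) {
            return paths
                    .filter(Files::isRegularFile)
                    .filter(p -> {
                        String name = p.getFileName().toString().toLowerCase(Locale.ROOT);
                        for (String suffix : suffixes) {
                            if (name.endsWith(suffix)) {
                                return true;
                            }
                        }
                        return false;
                    })
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // Directory to search: first argument, or the current directory.
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        for (Path p : find(root, ".doc", ".docx")) {
            System.out.println(p.getFileName() + " - " + Files.size(p) + " bytes");
        }
    }
}
```

One caveat: Files.walk throws an UncheckedIOException if it reaches a directory it cannot read, so Files.walkFileTree with a visitor that skips failures may be a better fit for folders that contain protected subdirectories.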

Rigid answered 10/11, 2010 at 11:24 Comment(0)
