Search in XLSX and XLS files using Java

I have a large XLSX file containing a huge amount of data, and I have to implement a search option on it. I have used both the Apache POI and jxl JARs to search across rows and columns, but traversing such a large file takes a very long time. Can someone suggest a JAR or another approach that makes searching Excel files faster?

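    import jxl.Cell;
    import jxl.Sheet;

    // Scan every cell of a jxl Sheet for an exact text match,
    // e.g. search(sheet, "my_value_to_search");
    static void search(Sheet sheet, String searchValue) {
        for (int i = 0; i < sheet.getColumns(); i++) {
            for (int j = 0; j < sheet.getRows(); j++) {
                Cell cell = sheet.getCell(i, j);   // jxl takes (column, row)
                String val = cell.getContents();   // cell value rendered as text
                // Use equals(): == only checks whether both refer to the same String object
                if (val != null && val.equals(searchValue)) {
                    //   To do manipulation.
                }
            }
        }
    }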
Asked by Squamous on 23/12/2013 at 9:48
I used multithreading for such a task once. My main thread parsed the XLSX file and built the workbook, then spawned 5 other threads, which the main thread fed with a batch of records at a time. This way performance increased considerably. - Nonpartisan
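A rough sketch of the producer/consumer layout this comment describes, assuming the rows have already been read from the workbook into memory: the calling thread splits them into batches and puts them on a bounded queue, and a fixed pool of worker threads searches each batch. The `Row` record, the class name, and the batch size are hypothetical; the worker count of 5 is just the number the commenter used.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical sketch: the calling thread hands out batches of parsed rows, worker threads search them. */
public class ParallelRowSearch {

    /** Hypothetical row type: row number plus the cell values already read from the workbook. */
    public record Row(int index, List<String> cells) {}

    // Unique marker object telling a worker that there is no more input
    private static final List<Row> POISON = new ArrayList<>();

    public static List<Integer> search(List<Row> rows, String searchValue, int workers, int batchSize)
            throws InterruptedException, ExecutionException {
        // Bounded queue: the producer waits if the workers fall behind
        BlockingQueue<List<Row>> queue = new ArrayBlockingQueue<>(workers * 2);
        Queue<Integer> matches = new ConcurrentLinkedQueue<>();
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        List<Future<Void>> futures = new ArrayList<>();

        Callable<Void> worker = () -> {
            while (true) {
                List<Row> batch = queue.take();
                if (batch == POISON) return null; // identity check on the marker
                for (Row row : batch) {
                    if (row.cells().contains(searchValue)) matches.add(row.index());
                }
            }
        };
        for (int w = 0; w < workers; w++) futures.add(pool.submit(worker));

        try {
            // Producer: feed the rows to the workers in fixed-size batches
            List<Row> batch = new ArrayList<>(batchSize);
            for (Row row : rows) {
                batch.add(row);
                if (batch.size() == batchSize) {
                    queue.put(batch);
                    batch = new ArrayList<>(batchSize);
                }
            }
            if (!batch.isEmpty()) queue.put(batch);
            for (int w = 0; w < workers; w++) queue.put(POISON); // one stop signal per worker
            for (Future<Void> f : futures) f.get(); // wait, and rethrow any worker failure
        } finally {
            pool.shutdownNow();
        }
        List<Integer> result = new ArrayList<>(matches);
        Collections.sort(result); // workers finish in arbitrary order
        return result;
    }
}
```

For example, `ParallelRowSearch.search(rows, "my_value_to_search", 5, 500)` returns the sorted indices of rows containing an exact match. In this layout the parsing itself still runs on a single thread, so the speed-up applies only to the per-row comparison work.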

The bottleneck is usually the huge amount of memory required to hold a large XLSX file in memory all at once. (XLS files can't get that big by design, so this is usually not a problem.) To search a really huge XLSX file without running into memory problems, you could do this:

  • An XLSX file is actually a ZIP archive, so you can open it and read its contents like any other ZIP file.
  • Inside the ZIP there is a folder "xl/worksheets" containing sheet1.xml (and sheet2.xml, and so on). Text cells usually store only an index into xl/sharedStrings.xml, so a text search has to look there as well.
  • You can parse these XML files with a streaming XML parser such as SAX (callback-based) for maximum performance and minimum memory consumption.
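The steps above can be sketched with only the JDK: `java.util.zip.ZipFile` opens the archive and the built-in SAX parser streams the XML, so a sheet is never held in memory as a whole. This is a minimal sketch under several assumptions: the class and method names are hypothetical, the sheet is addressed by its ZIP entry name rather than its display name (mapping names to entries would also mean reading xl/workbook.xml and its relationships), only exact whole-cell matches are reported, and edge cases such as phonetic runs inside shared strings are not handled. Cells marked `t="s"` hold an index into xl/sharedStrings.xml, so that table is scanned first.

```java
import java.io.InputStream;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical sketch: stream one worksheet of an .xlsx (a ZIP of XML files) with SAX. */
public class XlsxStreamSearch {

    /** Returns references (e.g. "B7") of cells in sheetEntry whose text equals searchValue. */
    public static List<String> search(String xlsxPath, String sheetEntry, String searchValue) throws Exception {
        try (ZipFile zip = new ZipFile(xlsxPath)) {
            Set<String> sharedIdx = sharedStringIndexes(zip, searchValue);
            ZipEntry sheet = zip.getEntry(sheetEntry); // e.g. "xl/worksheets/sheet1.xml"
            if (sheet == null) throw new IllegalArgumentException("No entry " + sheetEntry);
            List<String> hits = new ArrayList<>();
            try (InputStream in = zip.getInputStream(sheet)) {
                newParser().parse(in, new DefaultHandler() {
                    String ref, type;     // attributes r and t of the current <c>
                    StringBuilder text;   // non-null while inside <v> or <t>

                    @Override
                    public void startElement(String uri, String local, String qName, Attributes atts) {
                        if (qName.equals("c")) {
                            ref = atts.getValue("r");
                            type = atts.getValue("t");
                        } else if (qName.equals("v") || qName.equals("t")) {
                            text = new StringBuilder();
                        }
                    }

                    @Override
                    public void characters(char[] ch, int start, int length) {
                        if (text != null) text.append(ch, start, length);
                    }

                    @Override
                    public void endElement(String uri, String local, String qName) {
                        if (text == null || !(qName.equals("v") || qName.equals("t"))) return;
                        String value = text.toString();
                        text = null;
                        // t="s": <v> is an index into sharedStrings.xml; otherwise compare the text itself
                        boolean match = "s".equals(type) ? sharedIdx.contains(value) : value.equals(searchValue);
                        if (match) hits.add(ref);
                    }
                });
            }
            return hits;
        }
    }

    /** Scans xl/sharedStrings.xml; returns indexes (as strings) of entries equal to searchValue. */
    private static Set<String> sharedStringIndexes(ZipFile zip, String searchValue) throws Exception {
        Set<String> indexes = new HashSet<>();
        ZipEntry entry = zip.getEntry("xl/sharedStrings.xml"); // absent if there are no shared strings
        if (entry == null) return indexes;
        try (InputStream in = zip.getInputStream(entry)) {
            newParser().parse(in, new DefaultHandler() {
                int index = -1;
                StringBuilder si; // text of the current <si>, joined across rich-text runs
                boolean inT;

                @Override
                public void startElement(String uri, String local, String qName, Attributes atts) {
                    if (qName.equals("si")) { index++; si = new StringBuilder(); }
                    else if (qName.equals("t")) inT = true;
                }

                @Override
                public void characters(char[] ch, int start, int length) {
                    if (inT && si != null) si.append(ch, start, length);
                }

                @Override
                public void endElement(String uri, String local, String qName) {
                    if (qName.equals("t")) inT = false;
                    else if (qName.equals("si") && si.toString().equals(searchValue)) indexes.add(Integer.toString(index));
                }
            });
        }
        return indexes;
    }

    private static SAXParser newParser() throws Exception {
        return SAXParserFactory.newInstance().newSAXParser(); // not namespace-aware: qName is the raw tag name
    }
}
```

Apache POI's own event API (`XSSFReader` in `org.apache.poi.xssf.eventusermodel`) is built on the same idea and takes care of more of the format details.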

Hope that helps.

Answered by Eogene on 23/12/2013 at 10:58
Thanks for your reply. My other question is: what about my large XLS file? - Squamous
XLS files are limited to 65,536 rows per sheet, which usually fits in memory nicely. Unfortunately, I don't know of a workaround for XLS similar to the one for XLSX. - Eogene
In my scenario I have many sheets, and every cell in every sheet is filled with a unique value, so parsing the XLS files takes a long time too. - Squamous
I've never used this, but have a look at poi.apache.org/poifs/how-to.html. It seems similar to the XML technique described above, but it also works for XLS: you parse the file on the fly and take only what you need, so the whole file is never loaded into memory. I'm not sure how much effort it takes to actually find the XLS-related data in the stream, though. Good luck! - Eogene
