Easiest way to compare two Excel files in Java?
Asked Answered
P

12

20

I'm writing a JUnit test for some code that produces an Excel file (which is binary). I have another Excel file that contains my expected output. What's the easiest way to compare the actual file to the expected file?

Sure I could write the code myself, but I was wondering if there's an existing method in a trusted third-party library (e.g. Spring or Apache Commons) that already does this.

Pembrook answered 14/5, 2009 at 23:16 Comment(0)
P
9

Here's what I ended up doing (with the heavy lifting being done by DBUnit):

/**
 * Compares the data in the two Excel files represented by the given input
 * streams, closing them on completion
 * 
 * @param expected can't be <code>null</code>
 * @param actual can't be <code>null</code>
 * @throws Exception
 */
private void compareExcelFiles(InputStream expected, InputStream actual)
  throws Exception
{
  try {
    Assertion.assertEquals(new XlsDataSet(expected), new XlsDataSet(actual));
  }
  finally {
    IOUtils.closeQuietly(expected);
    IOUtils.closeQuietly(actual);
  }
}

This compares the data in the two files, with no risk of false negatives from any irrelevant metadata that might be different. Hope this helps someone.

Pembrook answered 15/5, 2009 at 7:26 Comment(7)
Hey using XlsDataSet from DBUnit is a really clever idea; didn't think of that :-).Emory
The latest DbUnit version 2.5 does not work with .xlsx file, only with .xls files. Do you have any idea how to make it work for ".xlsx" files?Shutin
No idea sorry, unless you have the option of first exporting the .xlsx files to .xls format.Pembrook
I tried it but it does not identify differences between cell values, is this normal?Auriol
@Auriol no, it worked for me, but this was six years ago! :-)Pembrook
I also had the problem that 2.5.4 did not work with xlsx files. I changed to use the ExcelComparator from the apache poi examples: svn.apache.org/repos/asf/poi/trunk/src/examples/src/org/apache/…Cruce
Doesn't work with DBUnit 2.7 with xlsx filesSubtraction
P
19

You might consider using my project simple-excel which provides a bunch of Hamcrest Matchers to do the job.

When you do something like the following,

assertThat(actual, WorkbookMatcher.sameWorkbook(expected));

You'd see, for example,

java.lang.AssertionError:
Expected: entire workbook to be equal
     but: cell at "C14" contained <"bananas"> expected <nothing>,
          cell at "C15" contained <"1,850,000 EUR"> expected <"1,850,000.00 EUR">,
          cell at "D16" contained <nothing> expected <"Tue Sep 04 06:30:00">
    at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20)

That way, you can run it from your automatted tests and get meaningful feedback whilst you're developing.

You can read more about it at this article on my site

Paulie answered 10/9, 2012 at 7:21 Comment(8)
If you're the developer of this project, you should probably add a disclaimer to that effect.Pembrook
Because apart from being good form, it's a rule of this site, see stackoverflow.com/faq#promotion.Pembrook
sure, but i'd say its not so much a rule from SO; it suggests that the community may get peeved with you. I'd have thought this could happen in the case of flagrant self promotion. My reply is genuine and fits into modern java testing strategies, ie hamcrest, well. Its a good answer to the question, whomever offered it IMO.Paulie
@Paulie .. instead of assertion error, can I get simply the result of comparison in some easy to read format? I want the differences to be feed to another function for further processing.Parasynthesis
Not by default but you could do something around a custom runner to catch the assertion error and process it. For something more elegant, you could try calling the matchers directly which should return a boolean. You can build up the errors in a customer Description class and process it that way...Paulie
Why you don't put that in maven central? Also what is the license? Is there any limitation using it? Do i have to give my code if i use your library?Auriol
If you look in the source, you'll see it's an Apache 2 licence. See the licence for details (apache.org/licenses/LICENSE-2.0). You don't have to give your code away if you use it. As to maven central, it's already available via a Maven repository so can be easily downloaded from within a pom.xml (see github.com/tobyweston/simple-excel/issues/6 for details)Paulie
Not available on Maven Central or jcenter. Meh.Roccoroch
P
9

Here's what I ended up doing (with the heavy lifting being done by DBUnit):

/**
 * Compares the data in the two Excel files represented by the given input
 * streams, closing them on completion
 * 
 * @param expected can't be <code>null</code>
 * @param actual can't be <code>null</code>
 * @throws Exception
 */
private void compareExcelFiles(InputStream expected, InputStream actual)
  throws Exception
{
  try {
    Assertion.assertEquals(new XlsDataSet(expected), new XlsDataSet(actual));
  }
  finally {
    IOUtils.closeQuietly(expected);
    IOUtils.closeQuietly(actual);
  }
}

This compares the data in the two files, with no risk of false negatives from any irrelevant metadata that might be different. Hope this helps someone.

Pembrook answered 15/5, 2009 at 7:26 Comment(7)
Hey using XlsDataSet from DBUnit is a really clever idea; didn't think of that :-).Emory
The latest DbUnit version 2.5 does not work with .xlsx file, only with .xls files. Do you have any idea how to make it work for ".xlsx" files?Shutin
No idea sorry, unless you have the option of first exporting the .xlsx files to .xls format.Pembrook
I tried it but it does not identify differences between cell values, is this normal?Auriol
@Auriol no, it worked for me, but this was six years ago! :-)Pembrook
I also had the problem that 2.5.4 did not work with xlsx files. I changed to use the ExcelComparator from the apache poi examples: svn.apache.org/repos/asf/poi/trunk/src/examples/src/org/apache/…Cruce
Doesn't work with DBUnit 2.7 with xlsx filesSubtraction
E
6

A simple file comparison can easily be done using some checksumming (like MD5) or just reading both files.

However, as Excel files contain loads of metadata, the files will probably never be identical byte-for-byte, as James Burgess pointed out. So you'll need another kind of comparison for your test.

I'd recommend somehow generating a "canonical" form from the Excel file, i.e. reading the generated Excel file and converting it to a simpler format (CSV or something similar), which will only retain the information you want to check. Then you can use the "canonical form" to compare with your expected result (also in canonical form, of course).

Apache POI might be useful for reading the file.

BTW: Reading a whole file to check its correctnes would generally not be considere a Unit test. That's an integration test...

Emory answered 14/5, 2009 at 23:51 Comment(5)
You're right, I was using the term unit test loosely; in fact it's an integration test run by JUnit. I'll fix that now.Pembrook
Thanks for putting me on the right track; see my DBUnit solution below (or above, depending on votes!)Pembrook
For .xlsx files: MD5 sums will definitely be different, but the directories resulting of unzipping both .xlsx files should be identical (that would be a better way of getting the canonical form)Desalinate
@golimar: Yes, it should, but it is not :-) (just tested with Excel 2016). There is still metadata that may change. For example, .xlsx files contain the author and time of last modification, and they contain the cell that was active when the document was changed. You still need some canonical form for comparison...Emory
@Emory I tried with Excel 2016 too... the directories were identical when compared with diff -r as it compares the contents of files recursively and not author and times. All depends on what you consider identical, for me it's perfect as I wanted to compare contents and ignore things like file saving timeDesalinate
V
6

I needed to do something similar and was already using the Apache POI library in my project to create Excel files. So I opted to use the included ExcelExtractor interface to export both workbooks as a string of text and asserted that the strings were equal. There are implementations for both HSSF for .xls as well as XSSF for .xlsx.

Dump to string:

XSSFWorkbook xssfWorkbookA = ...;
String workbookA = new XSSFExcelExtractor(xssfWorkbookA).getText();

ExcelExtractor has some options for what all should be included in the string dump. I found it to have useful defaults of including sheet names. In addition it includes the text contents of the cells.

Volitant answered 5/8, 2015 at 19:7 Comment(0)
H
4

The easiest way I find is to use Tika. I use it like this:

private void compareXlsx(File expected, File result) throws IOException, TikaException {
     Tika tika = new Tika();
     String expectedText = tika.parseToString(expected);
     String resultText = tika.parseToString(result);
     assertEquals(expectedText, resultText);
}


<dependency>
    <groupId>org.apache.tika</groupId>
    <artifactId>tika-parsers</artifactId>
    <version>1.13</version>
    <scope>test</scope>
</dependency>
Homemaker answered 20/5, 2016 at 13:15 Comment(0)
C
2

You could use javaxdelta to check whether the two files are the same. It's available from here:

http://javaxdelta.sourceforge.net/

Chesterchesterfield answered 14/5, 2009 at 23:26 Comment(1)
I eventually got javaxdelta working after mucking around with its dependency on the "trove" library, but although it works as advertised, sleske is right that I need a canonical comparison, not a byte-by-byte comparison. Thanks anyway for the suggestion, which I've voted up.Pembrook
P
0

Just found out there's something in commons-io's FileUtils. Thanks for the other answers.

Pembrook answered 14/5, 2009 at 23:48 Comment(1)
This actually doesn't solve my problem, as there seem to be differences between the Excel files that aren't due to geniune content differences. I'll try sleske's suggestion of parsing the files' contents and doing a canonical comparison.Pembrook
C
0

Please, take a look at the site to compare the binary files, http://www.velocityreviews.com/forums/t123770-re-java-code-for-determining-binary-file-equality.html

Tiger

Caseinogen answered 15/5, 2009 at 2:1 Comment(1)
Two issues: that code is just doing a byte-for-byte comparison, which as sleske points out, isn't ideal where Excel is concerned. Also, I was after a third-party utility method, not a block of code on some forum that may or may not work. I'll update the question to make this clear.Pembrook
C
0

You may use Beyond Compare 3 which can be started from command-line and supports different ways to compare Excel files, including:

  • Comparing Excel sheets as database tables
  • Checking all textual content
  • Checking textual content with some formating
Crosstree answered 26/4, 2010 at 12:5 Comment(1)
Command-line tools are ugly to invoke from Java (in my case, JUnit).Pembrook
V
0

To test only content of the first sheets in Kotlin (easily can be converted to java).

private fun checkEqualityExcelDocs(doc : XSSFWorkbook, doc1 : XSSFWorkbook) : Boolean{
        val mapOfCellDoc = doc.toList().first().toList().flatMap { row -> row.map { Pair(PivotExcelCreator.IndexInThePivotTable(it.rowIndex,it.columnIndex),it.stringCellValue) }}.toMap()
        val mapOfCellDoc1 = doc1.toList().first().toList().flatMap { row -> row.map { Pair(PivotExcelCreator.IndexInThePivotTable(it.rowIndex,it.columnIndex),it.stringCellValue) }}.toMap()
        if(mapOfCellDoc.size == mapOfCellDoc1.size){
            return mapOfCellDoc.entries.all { mapOfCellDoc1.containsKey(it.key) && mapOfCellDoc[it.key] == mapOfCellDoc1[it.key]}
        }
        return false
    }

data class IndexInThePivotTable(val row: Int, val col: Int)

and in your code add assert

    assertTrue(checkEqualityExcelDocs(expected, actual), "Docs aren't equal!")

as you can see doc.toList().first() will take only the first sheet of document, if you need to compare each sheet respectively change code a little.

Also it is quite good idea to not take into account "" empty strings cells, I didn't need this functionality (As well, simply add this part, if you need).


also it can be useful information

//first doc I've got from outputstream such way
val out = ByteArrayOutputStream()
//some method which writes excel to outputstream
val firstDoc = XSSFWorkbook(ByteArrayInputStream(out.toByteArray()))

and second doc from file to compare with

val secondDoc = XSSFWorkbook(Test::class.java.getClassLoader().getResource("yourfile.xlsx").path)
Verniavernice answered 31/5, 2019 at 16:26 Comment(0)
R
0

Best way for me after a long research: Using Apachi POI as external library Using XSSF for xlsx files. If you want to compare xls files have to use HSSF

Good luck

public static void compareTwoExcellFiles04(String file01, String file02) throws IOException {
    try
    {
        File file1 = new File(file01);
        File file2 = new File(file02);
        FileInputStream fis1 = new FileInputStream(file1);
        FileInputStream fis2 = new FileInputStream(file2);
        XSSFWorkbook wb1 = new XSSFWorkbook(fis1);
        XSSFWorkbook wb2 = new XSSFWorkbook(fis2);
        XSSFSheet sheet1 = wb1.getSheetAt(0);
        XSSFSheet sheet2 = wb2.getSheetAt(0);
        ArrayList<String> arrayList1 = new ArrayList<>();
        ArrayList<String> arrayList2 = new ArrayList<>();
        for (Row row1 : sheet1) {
            Iterator<Cell> cellIterator = row1.cellIterator();
            while (cellIterator.hasNext()) {
                Cell cell1 = cellIterator.next();
                arrayList1.add(cell1.getStringCellValue());
                }
            }
        for (Row row2 : sheet2) {
            Iterator<Cell> cellIterator = row2.cellIterator();
            while (cellIterator.hasNext()) {
                Cell cell2 = cellIterator.next();
                arrayList2.add(cell2.getStringCellValue());
            }
        }
    if (arrayList1.equals(arrayList2)){
        System.out.println("the files are equal");
    }else{
        System.out.println("NOT EQUAL");
    }
    }
    catch(Exception e)
    {
        e.printStackTrace();
    }
}
Romp answered 6/11, 2023 at 7:30 Comment(1)
Your answer could be improved with additional supporting information. Please edit to add further details, such as citations or documentation, so that others can confirm that your answer is correct. You can find more information on how to write good answers in the help center.Embellishment
I
-1

Maybe... compare MD5 digests of each file? I'm sure there are a lot of ways to do it. You could just open both files and compare each byte.

EDIT: James stated how the XLS format might have differences in the metadata. Perhaps you should use the same interface you used to generate the xls files to open them and compare the values from cell to cell?

Izmir answered 14/5, 2009 at 23:18 Comment(1)
This is a good idea, except you do have to consider if the files will be identical (i.e. that there's not any metadata involved, such as in the XLSX format, for example. This differing metadata will, obviously, produce different hashes).Unfavorable

© 2022 - 2024 — McMap. All rights reserved.