Different results with Java's digest versus external utilities

I have written a simple Java class to generate the hash values of the Windows Calculator file. I am using Windows 7 Professional with SP1. I have tried Java 6.0.29 and Java 7.0.03. Can someone tell me why I am getting different hash values from Java than from (many!) external utilities and/or websites? All the external results match each other; only Java returns different results.

import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.CRC32;

public class Checksum 
{
    private static int size = 65536;
    private static File calc = new File("C:/Windows/system32/calc.exe");

    /*
        C:\Windows\System32\calc.exe (verified via several different utilities)
        ----------------------------
        CRC-32b = 8D8F5F8E
        MD5     = 60B7C0FEAD45F2066E5B805A91F4F0FC
        SHA-1   = 9018A7D6CDBE859A430E8794E73381F77C840BE0
        SHA-256 = 80C10EE5F21F92F89CBC293A59D2FD4C01C7958AACAD15642558DB700943FA22
        SHA-384 = 551186C804C17B4CCDA07FD5FE83A32B48B4D173DAC3262F16489029894FC008A501B50AB9B53158B429031B043043D2
        SHA-512 = 68B9F9C00FC64DF946684CE81A72A2624F0FC07E07C0C8B3DB2FAE8C9C0415BD1B4A03AD7FFA96985AF0CC5E0410F6C5E29A30200EFFF21AB4B01369A3C59B58


        Results from this class
        -----------------------
        CRC-32  = 967E5DDE
        MD5     = 10E4A1D2132CCB5C6759F038CDB6F3C9
        SHA-1   = 42D36EEB2140441B48287B7CD30B38105986D68F
        SHA-256 = C6A91CBA00BF87CDB064C49ADAAC82255CBEC6FDD48FD21F9B3B96ABF019916B    
    */    

    public static void main(String[] args)throws Exception {
        Map<String, String> hashes = getFileHash(calc);
        for (Map.Entry<String, String> entry : hashes.entrySet()) {
            System.out.println(String.format("%-7s = %s", entry.getKey(), entry.getValue()));
        }
    }

    private static Map<String, String> getFileHash(File file) throws NoSuchAlgorithmException, IOException {
        Map<String, String> results = new LinkedHashMap<String, String>();

        if (file != null && file.exists()) {
            CRC32 crc32 = new CRC32();
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
            MessageDigest sha256 = MessageDigest.getInstance("SHA-256");

            FileInputStream fis = new FileInputStream(file);
            byte data[] = new byte[size];
            int len = 0;
            while ((len = fis.read(data)) != -1) {
                crc32.update(data, 0, len);
                md5.update(data, 0, len);
                sha1.update(data, 0, len);
                sha256.update(data, 0, len);
            }
            fis.close();

            results.put("CRC-32", toHex(crc32.getValue()));
            results.put(md5.getAlgorithm(), toHex(md5.digest()));
            results.put(sha1.getAlgorithm(), toHex(sha1.digest()));
            results.put(sha256.getAlgorithm(), toHex(sha256.digest()));
        }
        return results;
    }

    private static String toHex(byte[] bytes) {
        String result = "";
        if (bytes != null) {
            StringBuilder sb = new StringBuilder(bytes.length * 2);
            for (byte element : bytes) {
                if ((element & 0xff) < 0x10) {
                    sb.append("0");
                }
                sb.append(Long.toString(element & 0xff, 16));
            }
            result = sb.toString().toUpperCase();
        }
        return result;
    }

    private static String toHex(long value) {
        // Pad to 8 hex digits so a CRC with leading zeros is not shortened.
        return String.format("%08X", value);
    }

}
Bertilla answered 15/3, 2012 at 20:57 Comment(12)
I guess your toHex is wrong. If you do int newElement = ((int) element) & 0xff and use that instead, would that solve your problem? - Pinard
@zapl: That wouldn't change anything. - Concenter
In parallel with calculating the checksum, copy the file to some temp file, so that you can compare what Java gets with what you get when you use other tools. Windows may be weird like that... I never saw Java make a mistake calculating hashes... - Melanite
Which external tools did you use? I tested it at home with md5sum and sha256sum and all values are equal, for several files (I tested 3 or 4)... I don't have a Windows PC, but that shouldn't change anything since those algorithms are platform independent. - Ics
All programmers should program like this! The code is very clean and neat. - Concenter
I've verified that I get the same result, and that it's not toHex, and that my calc.exe has the same genuine MD5 sum. Weird. Looking into it. - Somite
@user567496: for what it is worth, your code gives the correct SHA-1 hashes compared to other Java SHA-1 implementations and compared to the command-line sha1sum util... (tested with files on Linux, not with calc.exe) - Dollfuss
Take a look at this post #3077696 - Ics
@Fido: what would a post about using a salt have to do with a regular sha1 checksum? - Dollfuss
@Dollfuss Sorry for not clarifying. What I was trying to point out is that MessageDigest doesn't always give the right value due to misinterpretation of the input stream (in the post's case, because it lacked the proper charset). In this case, as others pointed out, it might be because the file is in use. (The gist of it all is that it is very unlikely that MessageDigest is calculating a wrong sum.) - Ics
@Fido: in this case it couldn't be a charset issue because the OP is reading raw bytes: he's not decoding characters. - Dollfuss
The odds of the OP choosing to use a binary file in the Windows folder are staggering. And enlightening. I knew about registry redirection, but had overlooked filesystem redirection. Nice. - Scheer

Got it. The Windows file system is behaving differently depending on the architecture of your process. This article explains it all - in particular:

But what about 32-bit applications that have the system path hard coded and is running in a 64-bit Windows? How can they find the new SysWOW64 folder without changes in the program code, you might think. The answer is that the emulator redirects calls to System32 folder to the SysWOW64 folder transparently so even if the folder is hard coded to the System32 folder (like C:\Windows\System32), the emulator will make sure that the SysWOW64 folder is used instead. So same source code, that uses the System32 folder, can be compiled to both 32-bit and 64-bit program code without any changes.
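Since the redirection depends on the bitness of the calling process, it can help to check what the JVM itself is before comparing hashes. The following is only a rough sketch: the class and method names are made up for illustration, and it assumes that `os.arch` reflects the JVM's architecture and that Windows sets the `PROCESSOR_ARCHITEW6432` environment variable for 32-bit processes running under WOW64 (neither is mentioned in the quoted article).

```java
class JvmBitness {
    // Heuristic: architecture strings such as "amd64" or "x86_64" contain "64".
    static boolean looksLike64Bit(String arch) {
        return arch != null && arch.contains("64");
    }

    // A 32-bit process on 64-bit Windows (WOW64) is the case in which
    // System32 is redirected. Assumption: PROCESSOR_ARCHITEW6432 is only
    // set for such processes.
    static boolean isWow64(String osArch, String archW6432) {
        return !looksLike64Bit(osArch) && archW6432 != null;
    }

    public static void main(String[] args) {
        String osArch = System.getProperty("os.arch");
        String w6432 = System.getenv("PROCESSOR_ARCHITEW6432");
        System.out.println("os.arch = " + osArch);
        System.out.println("64-bit JVM = " + looksLike64Bit(osArch));
        System.out.println("System32 likely redirected = " + isWow64(osArch, w6432));
    }
}
```

If this reports a 64-bit JVM while the external utilities are 32-bit programs, the two sides are probably reading different files under the same path.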

Try copying calc.exe to somewhere else... then run the same tools again. You'll get the same results as Java. Something about the Windows file system is giving different data to the tools than it's giving to Java... I'm sure it's something to do with it being in the Windows directory, and thus probably handled "differently".
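The copy experiment can also be done from Java, so that the bytes Java sees end up in a file outside the Windows directory where other tools can be pointed at it. This is a minimal sketch assuming Java 7 or later (for `java.nio.file`); the class name, helper names and temp-file prefix are illustrative only.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class CopyAndHash {
    // Streams the file through MD5 and returns the digest as upper-case hex.
    static String md5Hex(Path file) throws IOException, NoSuchAlgorithmException {
        MessageDigest md5 = MessageDigest.getInstance("MD5");
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buf = new byte[65536];
            int len;
            while ((len = in.read(buf)) != -1) {
                md5.update(buf, 0, len);
            }
        }
        StringBuilder sb = new StringBuilder();
        for (byte b : md5.digest()) {
            sb.append(String.format("%02X", b));
        }
        return sb.toString();
    }

    // Copies the file, as this process sees it, to a fresh temp file.
    static Path copyToTemp(Path source) throws IOException {
        Path copy = Files.createTempFile("hash-copy-", ".bin");
        Files.copy(source, copy, StandardCopyOption.REPLACE_EXISTING);
        return copy;
    }

    public static void main(String[] args) throws Exception {
        Path original = Paths.get(args.length > 0 ? args[0] : "C:/Windows/System32/calc.exe");
        Path copy = copyToTemp(original);
        System.out.println("original MD5 = " + md5Hex(original));
        System.out.println("copy MD5     = " + md5Hex(copy));
        System.out.println("copy written to " + copy);
    }
}
```

Within one process the two hashes should agree; the interesting part is running the external utilities against the printed copy path, which is not subject to the System32 handling.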

Furthermore, I've reproduced it in C#... and found out that it depends on the architecture of the process you're running. So here's a sample program:

using System;
using System.IO;
using System.Security.Cryptography;

class Test
{
    static void Main()
    {
        using (var md5 = MD5.Create())
        {
            string path = "c:/Windows/System32/Calc.exe";
            var bytes = md5.ComputeHash(File.ReadAllBytes(path));
            Console.WriteLine(BitConverter.ToString(bytes));
        }
    }
}

And here's a console session (minus chatter from the compiler):

c:\users\jon\Test>csc /platform:x86 Test.cs    

c:\users\jon\Test>test
60-B7-C0-FE-AD-45-F2-06-6E-5B-80-5A-91-F4-F0-FC

c:\users\jon\Test>csc /platform:x64 Test.cs

c:\users\jon\Test>test
10-E4-A1-D2-13-2C-CB-5C-67-59-F0-38-CD-B6-F3-C9
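
A rough Java counterpart of the experiment above, for anyone who wants to repeat it with 32-bit and 64-bit JVMs instead of `csc /platform`. It is a sketch, not a tested reproduction: the class and method names are invented, and the `Sysnative` entry is an addition not used in the session above (it relies on the Windows alias that lets a 32-bit process reach the real System32, and is expected to be invisible to a 64-bit process).

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class RedirectionProbe {
    // Formats an MD5 digest like BitConverter.ToString does (AA-BB-...).
    static String md5Dashed(byte[] data) throws NoSuchAlgorithmException {
        byte[] digest = MessageDigest.getInstance("MD5").digest(data);
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < digest.length; i++) {
            if (i > 0) {
                sb.append('-');
            }
            sb.append(String.format("%02X", digest[i]));
        }
        return sb.toString();
    }

    // Returns the dashed MD5 of the file, or a note if this process cannot see it.
    static String describe(Path file) throws NoSuchAlgorithmException {
        if (!Files.isRegularFile(file)) {
            return "not visible to this process";
        }
        try {
            return md5Dashed(Files.readAllBytes(file));
        } catch (IOException e) {
            return "unreadable: " + e.getMessage();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("os.arch = " + System.getProperty("os.arch"));
        // Sysnative is an assumption of this sketch, see the note above.
        String[] dirs = {"System32", "SysWOW64", "Sysnative"};
        for (String dir : dirs) {
            Path p = Paths.get("C:/Windows", dir, "calc.exe");
            System.out.println(p + " -> " + describe(p));
        }
    }
}
```

Going by the session above, a 32-bit JVM would be expected to print the 60-B7-... value for System32 and a 64-bit JVM the 10-E4-... value, with SysWOW64 matching the 32-bit result in both cases.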
Somite answered 15/3, 2012 at 21:17 Comment(10)
@TacticalCoder: Yup, looks like it. - Somite
There are two versions of calc.exe: 64-bit in C:\Windows\system32 and 32-bit in C:\Windows\SysWOW64. For compatibility, in a 32-bit process C:\Windows\system32 is mapped to C:\Windows\SysWOW64. 64-bit processes will launch the 64-bit calc, 32-bit processes the 32-bit calc. Not surprising that their checksums are different. If you hold the file open and look with handle.exe or Process Explorer you'll see the different path. - Rebel
Don't forget you get the same thing with the registry too. - Margie
@Jon That something is known as the File System Redirector. - Punctilious
@sehe What on earth are you talking about?! The registry and file system redirectors are fabulous. They make WOW64 viable. - Punctilious
@DavidHeffernan Opinions vary, perhaps along with the definition of 'viable'. All this virtualization does violate the principle of least surprise and adds costs (allocation and runtime). Other operating systems manage to provide both better 32-on-64 support and better application virtualization with fewer snags/leaky abstractions (try running garbage collecting programs on Wow64, or try comparing md5 sums like the OP, and a few other niche cases). - Hyams
A vaguely related Raymond Chen post about duplicated utilities (the comments go into the syswow64 stuff: blogs.msdn.com/b/oldnewthing/archive/2006/03/28/… ) - Appointment
Sometimes I wonder if people upvote you because you are jon skeet, not solely because of the answer. I'm not saying the answer isn't good or anything, but 145 upvotes when the answer is "Something is happening in windows" (to be fair you do provide a link, but still) seems like people are considering more than just your answer when they upvote. I'm not hating on you, but this just means it's going to be a while before I catch up to you :P - Baranowski
@ExitMusic: If it's any consolation, it won't have provided much rep due to the cap. But the reason this has had so much visibility is that I tweeted and blogged about it, as an interesting problem. - Somite
The blog is how I found it. I was hoping for some Jon Skeet magic but I felt like "Hey, I could have done that". Probably not nearly as quickly but there you go. Ok maybe I couldn't have, but still. As for the cap, there is little consolation in it because that just means that any given day you will reach it, and I can therefore never catch up to you. Oh well... - Baranowski
