How to read a binary file with FileReader in order to hash it with SHA-256 in CryptoJS?

How do I convert a UTF-8 string to a Latin1-encoded string using JavaScript?

Here is what I am trying to do:

  1. I get a file and split it into chunks, reading each one as an ArrayBuffer
  2. then I parse the ArrayBuffer as a string
  3. and pass it to CryptoJS for hash computation using the following code:

    cryptosha256 = CryptoJS.algo.SHA256.create();
    cryptosha256.update(text);
    hash = cryptosha256.finalize();
    

It all works well for a text file. I run into problems when I use the code to hash non-text files (images, .wmv files). I read in another blog that the CryptoJS author requires the bytes to be passed in Latin1 format instead of UTF-8, and that's where I am stuck.

I'm not sure how to generate the bytes (or strings) in Latin1 format from an ArrayBuffer in JavaScript.

$('#btnHash').click(function () {
    var fr = new FileReader(), 
        file = document.getElementById("fileName").files[0];
    fr.onload = function (e) {
        calcHash(e.target.result, file);
    };
    fr.readAsArrayBuffer(file);
});
function calcHash(dataArray, file) {
    cryptosha256 = CryptoJS.algo.SHA256.create();
    text = CryptoJS.enc.Latin1.parse(dataArray);
    cryptosha256.update(text);
    hash = cryptosha256.finalize();
}
Marvelous answered 25/11, 2015 at 11:0 Comment(12)
'bytes' are not in Latin1 or any other format. And for binary files like (most) images and sounds, character encoding doesn't really apply. If you convert text from one encoding to another, you just have text in another encoding (with possibly the loss of some characters). If you convert a binary file to another text encoding, you will most likely have a corrupt file. - Superclass
I'm pretty sure that CryptoJS does directly take an arraybuffer. No need to care about text encodings. - Disposure
thanks GolezTrol... here is what crypto author writes: "When you pass a string to a hasher, it's converted to bytes using UTF-8. That's to ensure foreign characters are not clipped. Since you're working with binary data, you'll want to convert the string to bytes using Latin1." sha256.update(CryptoJS.enc.Latin1.parse(evt.target.result)); - Marvelous
the link for above statement: code.google.com/p/crypto-js/issues/… - Marvelous
when I tried using the crypto method sha256.update(CryptoJS.enc.Latin1.parse(evt.target.result)); It returned 'undefined' as hash value :( - Marvelous
Are you sure evt.target.result contains the correct value? Please update your question with the whole code snippet. - Flosi
@Disposure No, CryptoJS doesn't work on an ArrayBuffer. It has an internal binary format that stores the data in an array of words (32 bit ints). It would be necessary to convert an ArrayBuffer to WordArray. - Drier
@Flosi .... updated the detailed code in original question. - Marvelous
@ArtjomB. for me, it's not working for small images either (I tried with 200KB png file as well). - Marvelous
@ArtjomB. I tried using readAsBinaryString too. That still gets me undefined as `hash` value... - Marvelous
Please don't post a solution to your question. I rolled back your edit. You can add an additional answer to your question. - Drier
Just came to post that after spending hours debugging this online the solution in the comment above, using CryptoJS.enc.Latin1.parse(evt.target.result) to get the proper SHA1 hash finally worked for me. It seems when reading binary data, the Latin1 parsing is needed. - Groyne

CryptoJS doesn't understand what an ArrayBuffer is, and if you go through a text encoding like Latin1 or UTF-8, you will inevitably lose some bytes. Not every possible byte value has a valid encoding in one of those text encodings.

You will have to convert the ArrayBuffer to CryptoJS' internal WordArray, which holds the bytes as an array of words (32-bit integers). We can view the ArrayBuffer as an array of unsigned 8-bit integers and put them together, four bytes per word, to build the WordArray (see arrayBufferToWordArray).
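
To make the packing concrete, here is a minimal sketch of how bytes end up in big-endian 32-bit words. It does not need CryptoJS at all; `bytesToWords` is a hypothetical helper name used only for illustration, and the returned plain object merely mimics the `words`/`sigBytes` pair that a WordArray carries.

```javascript
// Hypothetical helper (not part of CryptoJS): packs bytes into
// big-endian 32-bit words, four bytes per word.
function bytesToWords(bytes) {
  var words = [];
  for (var i = 0; i < bytes.length; i++) {
    // i >>> 2 selects the word; 24 - (i % 4) * 8 is the bit offset
    // of this byte inside that word (first byte is most significant).
    words[i >>> 2] |= bytes[i] << (24 - (i % 4) * 8);
  }
  // sigBytes records how many bytes are meaningful, so a trailing
  // partial word is not mistaken for four full bytes.
  return { words: words, sigBytes: bytes.length };
}

// Five bytes: one full word plus one byte in a second word.
var packed = bytesToWords(new Uint8Array([0x01, 0x02, 0x03, 0x04, 0xff]));
console.log(packed.words[0] === 0x01020304); // first four bytes form one word
console.log(packed.sigBytes);                // number of significant bytes
```

The second word holds only one significant byte, which is why the byte count has to be passed along with the word list.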

The following code shows a full example:

function arrayBufferToWordArray(ab) {
  var i8a = new Uint8Array(ab);
  var a = [];
  for (var i = 0; i < i8a.length; i += 4) {
    a.push(i8a[i] << 24 | i8a[i + 1] << 16 | i8a[i + 2] << 8 | i8a[i + 3]);
  }
  return CryptoJS.lib.WordArray.create(a, i8a.length);
}

function handleFileSelect(evt) {
  var files = evt.target.files; // FileList object

  // Loop through the FileList and hash each selected file.
  for (var i = 0, f; f = files[i]; i++) {
    var reader = new FileReader();

    // Closure to capture the file information.
    reader.onloadend = (function(theFile) {
      return function(e) {
        var arrayBuffer = e.target.result;

        var hash = CryptoJS.SHA256(arrayBufferToWordArray(arrayBuffer));
        var elem = document.getElementById("hashValue");
        elem.value = hash;
      };

    })(f);
    reader.onerror = function(e) {
      console.error(e);
    };

    // Read the file as an ArrayBuffer.
    reader.readAsArrayBuffer(f);
  }
}

document.getElementById('upload').addEventListener('change', handleFileSelect, false);
<script src="https://cdn.rawgit.com/CryptoStore/crypto-js/3.1.2/build/rollups/sha256.js"></script>
<form method="post" enctype="multipart/form-data">
  Select image to upload:
  <input type="file" name="upload" id="upload">
  <input type="text" name="hashValue" id="hashValue">
</form>

You can extend this code with the techniques in my other answer in order to hash files of arbitrary size without freezing the browser.
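
As a rough illustration of what chunked hashing could look like (this is a sketch under assumptions, not necessarily the approach from the linked answer): the file is sliced into ranges, each slice is read as an ArrayBuffer, and the progressive hasher from the question (`CryptoJS.algo.SHA256.create()`, `update`, `finalize`) is fed one chunk at a time. `chunkRanges` and `hashFileInChunks` are hypothetical names, and the second function assumes that CryptoJS and `arrayBufferToWordArray` from above are already loaded in the page.

```javascript
// Pure helper (hypothetical name): [start, end) byte ranges that
// cover a file of `size` bytes in pieces of at most `chunkSize`.
function chunkRanges(size, chunkSize) {
  var ranges = [];
  for (var start = 0; start < size; start += chunkSize) {
    ranges.push([start, Math.min(start + chunkSize, size)]);
  }
  return ranges;
}

// Browser-only sketch: assumes CryptoJS and arrayBufferToWordArray
// (defined in the answer above) are available as globals.
function hashFileInChunks(file, chunkSize, onDone) {
  var hasher = CryptoJS.algo.SHA256.create();
  var ranges = chunkRanges(file.size, chunkSize);
  var index = 0;
  var reader = new FileReader();

  reader.onload = function (e) {
    // Feed this chunk to the progressive hasher, then read the next one.
    hasher.update(arrayBufferToWordArray(e.target.result));
    index++;
    readNext();
  };
  reader.onerror = function (e) {
    console.error(e);
  };

  function readNext() {
    if (index >= ranges.length) {
      // All chunks consumed: finalize and hand back the hex string.
      onDone(hasher.finalize().toString());
      return;
    }
    reader.readAsArrayBuffer(file.slice(ranges[index][0], ranges[index][1]));
  }

  readNext();
}
```

A call might look like `hashFileInChunks(f, 1024 * 1024, function (hex) { elem.value = hex; });`. Because each chunk is read asynchronously, control returns to the browser between chunks instead of blocking on the whole file at once.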

Drier answered 25/11, 2015 at 14:6 Comment(8)
thanks! Your answer gave me more than I had expected. just to let you know, the 'undefined' hash value that I was getting was due to weird reason of having two other references of <script src="http://crypto-js.googlecode.com/svn/tags/3.0.2/build/rollups/hmac-sha256.js"></script> <script src="http://crypto-js.googlecode.com/svn/tags/3.0.2/build/components/enc-base64-min.js"></script> I took them out, and undefined issue was gone. thanks guys! - Marvelous
arrayBufferToWordArray did the magic for me. thanks so much! - Marvelous
slap a .toString() on the var hash to get an actual string hash output! - Riflery
@MichealCWallas Do you mean that line elem.value = hash; should be changed to elem.value = hash.toString();? That shouldn't hurt, but it also shouldn't be necessary, because assigning a WordArray object to a string property should result in automatic stringification. I've tested this with Firefox and Vivaldi and didn't see an issue. Maybe this is a bug in the browser you're using. - Drier
@ArtjomB. sorry I meant on the final output, var hash = CryptoJS.SHA256(arrayBufferToWordArray(arrayBuffer)).toString(). I thought the code wasn't working but the toString() was all it took to get a string hash output :) - Riflery
That's the same value that I meant. Did you try to print it like this console.log(hash)? If not which browser did you use? - Drier
This code crashes the browser for big files. Can anyone help with this? - Bolshevik
@Bolshevik Did you read my answer to the end? There is a link to a link to probably working code. - Drier
