I'm downloading a zip file with axios. For further processing, I need the "raw" data that was downloaded. As far as I can see, JavaScript has two types for this: Blob and ArrayBuffer. Both can be specified as the responseType in the request options.
In the next step, the zip file needs to be uncompressed. I've tried two libraries for this: js-zip and adm-zip. Both want the data to be an ArrayBuffer. So far so good, I can convert the blob to a buffer. After this conversion, adm-zip always happily extracts the zip file. However, js-zip complains about a corrupted file unless the zip has been downloaded with 'arraybuffer' as the axios responseType. js-zip does not work on a buffer that has been taken from a blob.
This was very confusing to me. I thought both ArrayBuffer and Blob are essentially just views on the underlying memory. There might be a difference in performance between downloading something as a blob vs. a buffer, but the resulting data should be the same, right?
Well, I decided to experiment and found this:
If you specify responseType: 'blob', axios converts the response.data to a string. Let's say you hash this string and get hash A. Then you convert it to a buffer. For this conversion, you need to specify an encoding. Depending on the encoding, you get a variety of new hashes; let's call them B1, B2, B3, ... When I specify 'utf8' as the encoding, I get back the original hash A.
So I guess when downloading data as a 'blob', axios implicitly converts it to a string encoded with utf8. This seems very reasonable.
Now you specify responseType: 'arraybuffer'. Axios provides you with a buffer as response.data. Hash the buffer and you get a hash C. This hash does not match any of A, B1, B2, ...
So when downloading data as an 'arraybuffer', you get entirely different data?
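To probe this without the network, here is a minimal sketch of one possible explanation. It assumes the string I get with 'blob' was produced by decoding the raw bytes as UTF-8 (this is my guess, not something I have confirmed in the axios source). The six bytes below are made up for illustration; they are not taken from folder.zip.

```typescript
import { createHash } from 'crypto';

// Hypothetical stand-in for downloaded zip data: the "PK\x03\x04" signature
// followed by two bytes (0xff, 0xfe) that are never valid in UTF-8.
const original = Buffer.from([0x50, 0x4b, 0x03, 0x04, 0xff, 0xfe]);

// Decoding as UTF-8 replaces each invalid byte with U+FFFD.
const asString = original.toString('utf8');

// Re-encoding the string as UTF-8 does not restore the original bytes:
// each U+FFFD becomes three bytes.
const roundTripped = Buffer.from(asString, 'utf8');

// Re-encoding with 'binary' (latin1) cannot restore them either, because
// 0xff and 0xfe were both mapped to the same character.
const viaBinary = Buffer.from(asString, 'binary');

const md5 = (data: Buffer | string): string =>
    createHash('md5').update(data).digest('hex');

console.log(original.length, roundTripped.length);    // 6 10
console.log(md5(original) === md5(roundTripped));     // false (C vs. the 'utf8' hash)
console.log(md5(asString) === md5(roundTripped));     // true  (A equals the 'utf8' hash)
console.log(viaBinary.equals(original));              // false
```

If the assumption holds, hash A matches the 'utf8' hash simply because hash.update() encodes a string argument as UTF-8 by default, and hash C differs because the decoding step already lost information.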
It now makes sense to me that the unzipping library js-zip complains if the data is downloaded as a 'blob'. It probably actually is corrupted somehow. But then how is adm-zip able to extract it? I checked the extracted data, and it is correct. This might only be the case for this specific zip archive, but it nevertheless surprises me.
Here is the sample code I used for my experiments:
// TypeScript import syntax; this is executed in Node.js
import axios from 'axios';
import * as crypto from 'crypto';

axios.get(
    "http://localhost:5000/folder.zip", // hosted with serve
    { responseType: 'blob' }) // replace this with 'arraybuffer' and response.data will be a buffer
    .then((response) => {
        console.log(typeof (response.data));

        // first hash the response itself
        console.log(crypto.createHash('md5').update(response.data).digest('hex'));

        // then convert to a buffer and hash again
        // replace 'binary' with any valid encoding name
        let buffer = Buffer.from(response.data, 'binary');
        console.log(crypto.createHash('md5').update(buffer).digest('hex'));
        //...
    });
What creates the difference here, and how do I get the 'true' downloaded data?