Issues with compression in JavaScript
I have an object I am trying to compress. It is of the form

[
  {
    array
    string
  },
  {
    array
    string
  },
  ...
]

The arrays are no more than 10-15 in length, extremely small compared to the strings (they are HTML, roughly 170k characters each). The strings, though, are usually repeated or have huge amounts of overlap, so my intuition tells me the compressed value should be the compressed value of one string, plus a little extra.

I JSON.stringify this object and try to compress.

Most compression libraries did a bad job of compressing the strings. Since the server sends me a gzip-compressed version at 77 kB, I know it can be at least that small.

Out of the maybe 15 libraries I tried, only two did a good job:

- gzip-js
- lzma-js

The issue is that gzip-js's output grows linearly with the number of strings. lzma-js handles this correctly, and its output only increases slightly in size.

Unfortunately, lzma-js (level 2) is very slow (20 s vs 1 s for gzip) when compressing 7 MB (about 30 strings).

Is there a compression library out there that is roughly as quick as gzip but doesn't scale linearly on repeated strings?

Depurate answered 3/7, 2015 at 17:2 Comment(4)
Can you list the ones that did a bad job? It will save the rest of us time to not have to do the same work you already went through. — Verify
Did you look at the answers to #4570833? The top-rated answer links to this page, pieroxy.net/blog/pages/lz-string/index.html, which references a few LZ compression libraries. — Buckhound
If you have some time, you can convert the bits to an image (every 3 bits as a pixel) and save it as a lossless PNG image; best performance with very good compression. — Uttasta
github.com/tcorral/JSONC — Barreto
Pako was useful for me; give it a try.

Instead of using strings, use byte arrays, as is done here.

Get pako.js and you can decompress a byte array like so:

<html>
<head>
<title>Gunzipping binary gzipped string</title>
<script type="text/javascript" src="pako.js"></script>
<script type="text/javascript">

// Get datastream as Array, for example:
var charData    = [31,139,8,0,0,0,0,0,0,3,5,193,219,13,0,16,16,4,192,86,214,151,102,52,33,110,35,66,108,226,60,218,55,147,164,238,24,173,19,143,241,18,85,27,58,203,57,46,29,25,198,34,163,193,247,106,179,134,15,50,167,173,148,48,0,0,0];

// Turn number array into byte-array
var binData     = new Uint8Array(charData);

// Pako magic
var data        = pako.inflate(binData);

// Convert gunzipped byteArray back to ascii string:
var strData     = String.fromCharCode.apply(null, new Uint16Array(data));

// Output to console
console.log(strData);

</script>
</head>
<body>
Open up the developer console.
</body>
</html>

Running example: http://jsfiddle.net/9yH7M/

Alternatively, you can base64-encode the array before you send it over, as the raw array takes up a lot of overhead when sent as JSON or XML. Decode likewise:

// Get some base64 encoded binary data from the server. Imagine we got this:
var b64Data     = 'H4sIAAAAAAAAAwXB2w0AEBAEwFbWl2Y0IW4jQmziPNo3k6TuGK0Tj/ESVRs6yzkuHRnGIqPB92qzhg8yp62UMAAAAA==';

// Decode base64 (convert ascii to binary)
var strData     = atob(b64Data);

// Convert binary string to character-number array
var charData    = strData.split('').map(function(x){return x.charCodeAt(0);});

// Turn number array into byte-array
var binData     = new Uint8Array(charData);

// Pako magic
var data        = pako.inflate(binData);

// Convert gunzipped byteArray back to ascii string:
var strData     = String.fromCharCode.apply(null, new Uint16Array(data));

// Output to console
console.log(strData);

Running example: http://jsfiddle.net/9yH7M/1/

For more advanced features, read the pako API documentation.

Dissuasive answered 14/7, 2015 at 8:26 Comment(0)
Use the gzip-js lib with a high compression level:
https://github.com/beatgammit/gzip-js

var gzip = require('gzip-js'),
    options = {
        level: 9,
        name: 'hello-world.txt',
        timestamp: parseInt(Date.now() / 1000, 10)
    };

// out will be a JavaScript Array of bytes
var out = gzip.zip('Hello world', options);

I found this gave close to the minimum possible size in a normal amount of time.

And for an LZ-based compression algorithm, I think lz-string is faster. Check it on your data sample:
https://github.com/pieroxy/lz-string
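What separates these suggestions for the asker's data is the match window. A toy LZ77-style sketch (not lz-string's or gzip-js's actual format, just an illustration) shows that when the window covers the repeat distance, a repeated block costs only one extra (offset, length) token instead of being re-encoded in full:

```javascript
// Toy LZ77: emit literals, or (offset, length) tokens for repeats found
// anywhere in the already-seen text (an unbounded window).
function lzTokens(s) {
  const tokens = [];
  let i = 0;
  while (i < s.length) {
    let bestLen = 0, bestOff = 0;
    for (let j = 0; j < i; j++) {
      let l = 0;
      while (i + l < s.length && s[j + l] === s[i + l]) l++;
      if (l > bestLen) { bestLen = l; bestOff = i - j; }
    }
    if (bestLen >= 4) {             // long enough to be worth a match token
      tokens.push([bestOff, bestLen]);
      i += bestLen;
    } else {
      tokens.push(s[i]);            // literal
      i += 1;
    }
  }
  return tokens;
}

// A 200-char chunk with no short-range repeats, then that chunk tripled.
const chunk = Array.from({ length: 200 },
  (_, i) => String.fromCharCode(33 + (i * 7) % 90)).join('');

const once = lzTokens(chunk).length;
const thrice = lzTokens(chunk.repeat(3)).length;
console.log(once, thrice); // tripling the input adds only one token
```

A window smaller than the repeat distance (gzip's is 32 kB) simply never finds these matches, which is exactly the linear-growth behavior the question describes.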

Uttasta answered 13/7, 2015 at 22:9 Comment(0)