Embedding binary data in web page?

I have a data structure with 6000 elements and for each element I need to store 7 bits of info. If I naively store it as an array of 6000 elements filled with numbers, it takes up around 22 KB. I am trying to reduce the size of the page - what is the best way to store 6000*7 bits of info (should be around 5 KB)? I want a "bitstream"-like data structure. I thought about encoding it into a string or even an image, but I'm not exactly sure. The reason I did not encode it as a string is that I cannot mathematically guarantee that none of the characters will be unprintable ASCII characters (e.g. ASCII 1-25).

Aldridge answered 21/6, 2013 at 5:4 Comment(10)
String with bits shifted to some range that does not include 0 is probably easiest solution (something like charFromCode[value +32] to get all values in safe range) - easy access to each element...Counsellor
May see thisKurtiskurtosis
@Kurtiskurtosis - I actually tried the library pointed at that page (pieroxy.net/blog/pages/lz-string/index.html) and funnily it is a really bad compression library (the author himself admits, it increases the size of the output if you have more than 750 characters which in my case is true). That solution is good for compressing small strings to be stored in localstorage. Any other client side compression scheme might work but I am looking for something purely bithacks based.Aldridge
Why do you want to reduce it? To save network bandwidth or memory footprint in browser, or for storage such as localStorage?Fillian
@rib network bandwidth - these 17KB, if I manage to shave them off, will reduce my page size (including all the CSS and scripts it loads) by 30%Aldridge
Do you have server compression enabled? I would have thought gzip would have compressed a naive array pretty well? Is the 22k the observed network effect, or just the increase in server file size?Fillian
@rib - I was watching the network panel in the Chrome debuggerAldridge
Great! But did the response header part say that the response was 'gzip' compressed? Sorry to labour the point, but methods of trying to put data on the page are likely to increase raw uncompressed size, although the image idea might be good. (will put full answer in time)Fillian
let us continue this discussion in chatFillian
do you have enough "runs" of equal characters such that run-length encoding would be useful?Header

Let's consider two solutions.

Base 32

For fun, let's consider using base-32 numbers. Yes, you can do that in JavaScript.

First pack four 7-bit values into one integer:

function pack(a1,a2,a3,a4){
    // pack four 7-bit values into one 28-bit integer
    return ((a1 << 7 | a2) << 7 | a3) << 7 | a4;
}

Now, convert to base 32.

function encode(n){
    var str = "000000" + n.toString(32);
    str = str.slice(-6); // keep the last six digits
    return str;
}

A 28-bit value is never more than six base-32 digits; the padding above makes sure it's always exactly six.

Going the other direction:

function decode(s){
    return parseInt(s, 32);
}

function unpack(x){
    var a1 = (x >> 21) & 0x7f, a2 = (x >> 14) & 0x7f, a3 = (x >> 7) & 0x7f, a4 = x & 0x7f;
    return [a1, a2, a3, a4];
}

All that remains is to wrap the logic around this to handle the 6000 elements. To compress:

function compress(elts){
    var str = '';
    for(var i = 0; i < elts.length; i+=4){
        str += encode(pack(elts[i], elts[i+1], elts[i+2], elts[i+3]));
    }
    return str;
}

And to uncompress:

function uncompress(str){
    var elts = [];
    for(var i = 0; i < str.length; i+=6){
        elts = elts.concat(unpack(decode(str.slice(i, i+6))));
    }
    return elts;
}

If you concatenate the results for all 6,000 elements, you'll have 1,500 packed numbers, which at six characters each will turn into about 9K. That is about 1.5 bytes per 7-bit value. It's by no means the information-theoretic maximum compression, but it's not that bad.
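
A quick round-trip check of the functions above (the sample values are arbitrary):

compress([1, 2, 3, 4]);   // "0210c4"
uncompress("0210c4");     // [1, 2, 3, 4]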

Unicode

First we'll pack two 7-bit values into one integer:

function pack(a1,a2){
    return a1 << 8 | a2; // two 7-bit values in one character code
}

We'll do this for all 6,000 inputs, then use our friend String.fromCharCode to turn all 3,000 values into a 3,000-character Unicode string:

function compress(elts){
    var packeds = [];
    for (var i = 0; i < elts.length; i+=2) {
        packeds.push(pack(elts[i], elts[i+1]));
    }
    return String.fromCharCode.apply(0, packeds);
}

Coming back the other way, it's quite easy:

function uncompress(str) {
    var elts = [], code;
    for (var i = 0; i < str.length; i++) {
        code=str.charCodeAt(i);
        elts.push(code>>8, code & 0xff);
    }
    return elts;
}

This will take up two bytes per two 7-bit values, so about 33% more efficient than the base 32 approach. (Note that this assumes a two-byte encoding; if the page is served as UTF-8, many of these characters will cost three bytes on the wire.)
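
For illustration, a round trip with a few arbitrary values:

var s = compress([65, 66, 67, 68]);   // a 2-character string (char codes 0x4142 and 0x4344)
uncompress(s);                        // [65, 66, 67, 68]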

If the above string is going to be written out into a script tag as a JavaScript assignment such as var data="HUGE UNICODE STRING";, then backslashes and quotation marks in the string will need to be escaped (backslashes first):

javascript_assignment = 'var data = "' + compress(elts).replace(/\\/g,'\\\\').replace(/"/g,'\\"') + '";';

The above code is not meant to be production-ready, and in particular does not handle edge cases where the number of inputs is not a multiple of four or two.
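
One way to handle that (a rough sketch only; compressPadded and uncompressPadded are made-up wrapper names, written here against the Unicode variant, so the modulus would be 4 for the base-32 one) is to pad the input with zeros and prefix the original length:

function compressPadded(elts){
    var padded = elts.slice();
    while (padded.length % 2 !== 0) padded.push(0);  // pad with zero values
    return elts.length + ":" + compress(padded);     // keep the real length as a prefix
}

function uncompressPadded(str){
    var sep = str.indexOf(":");
    var len = parseInt(str.slice(0, sep), 10);
    return uncompress(str.slice(sep + 1)).slice(0, len);  // drop the padding again
}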

Rockyrococo answered 24/6, 2013 at 22:54 Comment(1)
brilliant; this would be a nice example to share with people if you could modify the question into a use case that people could relate to.Charisecharisma

Actually, strings work fine if you use JSON to encode any potential nasties into JS escape codes:

var codes=",Ñkqëgdß\u001f", // (10 chars JSON encoded to store all chars ranges)
mySet=codes[4].charCodeAt().toString(2).split("").map(Number).map(Boolean).reverse();

alert(mySet); // shows: [true,false,false,false,true,true,true] 


/*  broken down into bite-sized steps: (pseudo code)
char == "g" (codes[4])
"g".charCodeAt() == 103
(103).toString(2) == "1100111"
.split().map(Number) ==  [1,1,0,0,1,1,1]
.map(Boolean).reverse() == [true,true,true,false,false,true,true]  */

and to fill the array, reverse the process:

var toStore= [true, false, true, false, true, false, true];
var char= String.fromCharCode(parseInt(toStore.map(Number).reverse().join(""),2));
codes+=char;

//verify (should===true):   
codes[10].charCodeAt().toString(2).split("")
   .map(Number).map(Boolean).reverse().toString() === toStore.toString();

to export the results to an ASCII file, JSON.stringify(codes), or if saving to localStorage, you can just save the raw string variable since browsers use two bytes per char of localStorage...
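
For completeness, here is a rough sketch of running the same idea over all 6,000 elements at once (encodeAll and decodeAll are made-up names, and it assumes each element is a 7-element boolean array; note the left-padding to 7 bits, which matters for char codes below 64):

function encodeAll(sets){   // sets: array of 7-element boolean arrays
    var out = "";
    for (var i = 0; i < sets.length; i++) {
        out += String.fromCharCode(parseInt(sets[i].map(Number).reverse().join(""), 2));
    }
    return out;
}

function decodeAll(codes){
    var sets = [];
    for (var i = 0; i < codes.length; i++) {
        // left-pad to 7 bits so codes below 64 round-trip correctly
        var bits = ("0000000" + codes.charCodeAt(i).toString(2)).slice(-7);
        sets.push(bits.split("").map(Number).map(Boolean).reverse());
    }
    return sets;
}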

Roi answered 24/6, 2013 at 21:7 Comment(0)

As dandavis said, it is OK to encode unprintable ASCII characters into a JSON string. But for random data that gave me 13KB (because many characters must be escaped). You can encode the string into base64 and then into a JSON string; that gave me 7.9KB for random data.

var randint = function (from, to) {
    return Math.floor(Math.random() * (to - from + 1)) + from;
}

var data = '';
for (var i = 0; i < 6000; ++i) {
    data += String.fromCharCode(randint(0, 127));
}
// encoding `data` as JSON-string at this point gave me 13KB

var b64data = btoa(data);
// encoding `b64data` as JSON-string gave me 7.9KB
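
Those figures can be sanity-checked directly (a rough sketch; exact numbers vary with the random data):

JSON.stringify(data).length      // roughly 13,000 characters - control codes become \u00XX escapes
JSON.stringify(b64data).length   // roughly 8,000 characters - base64 output needs no escaping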

to decode it

var data = atob(b64data);
var adata = [];
for (var i = 0; i < data.length; ++i) {
    adata.push(data.charCodeAt(i));
}

There are definitely more efficient methods to encode your data, but I believe this one is a good compromise between complexity and efficiency. PS: in some browsers you might need to write atob and btoa by yourself.

Custom answered 24/6, 2013 at 22:11 Comment(1)
To elaborate on the "in some browsers you might need to write atob and btoa by yourself" part: atob and btoa require IE 11, Edge 16, Firefox 52, Chrome 49, Safari 10.1 (or 9.3 on iOS) or Opera 45 (or Mini). They will not be there by default in earlier browsers, nor if you're running on Node.js, nor in a WebView in an Android 4 app (those WebView components do not get fixed by Chrome updates).Okie
