Is it worth the effort to try to reduce JSON size?
S

5

31

I am submitting a relatively large amount of data from a mobile application (up to 1000 JSON objects), which I would normally encode like this:

[{
    id: 12,
    score: 34,
    interval: 5678,
    sub: 9012
}, {
    id: ...
}, ...]

I could make the payload smaller by submitting an array of arrays instead:

[[12, 34, 5678, 9012], [...], ...]

to save some space on the property names, and recreate the objects on the server (as the schema is fixed, or at least it is a contract between the server and the client).
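As a rough illustration of that client/server contract, here is a minimal Python sketch. The names `FIELDS`, `pack_rows` and `unpack_rows` are made up for this example, and the field order is an assumption that both sides would have to agree on.

```python
import json

# Assumed field order; client and server must agree on it.
FIELDS = ("id", "score", "interval", "sub")

def pack_rows(objects):
    """Turn a list of dicts into a list of value lists in FIELDS order."""
    return [[obj[name] for name in FIELDS] for obj in objects]

def unpack_rows(rows):
    """Rebuild the dicts on the receiving side, rejecting malformed rows."""
    objects = []
    for row in rows:
        if len(row) != len(FIELDS):
            raise ValueError("expected %d values, got %d" % (len(FIELDS), len(row)))
        objects.append(dict(zip(FIELDS, row)))
    return objects

records = [{"id": 12, "score": 34, "interval": 5678, "sub": 9012}]
# Compact separators drop the spaces json.dumps would otherwise insert.
payload = json.dumps(pack_rows(records), separators=(",", ":"))
restored = unpack_rows(json.loads(payload))
```

The length check is there because a positional format silently mislabels values if a row is missing one, which is exactly the kind of error the object form would make obvious.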

The payload is then submitted in a POST request, most likely over a 3G connection (or possibly Wi-Fi).

It looks like I am saving some bandwidth by using nested arrays, but I'm not sure it is noticeable when gzip is applied, and I'm not sure how to precisely and objectively measure the difference.

On the other hand, the nested arrays don't feel like a good idea: they are less readable, which makes errors harder to spot while debugging. Also, since we're flushing readability down the toilet anyway, we could just flatten the array: each child array has a fixed number of elements, so the server could slice it up and reconstruct the objects.

Any further reading material on this topic is much appreciated.

Stetson answered 22/6, 2012 at 17:8 Comment(2)
Which programming language do you use on the server and client? Why I ask: Since you're considering dropping JSON, you might as well use a library that abstracts the serialization/de-serialization and all you need to care about are native programming language objects.Carricarriage
@Carricarriage Python (might be ported to Go in the future), on App Engine.Stetson
R
25

JSONH, aka hpack, https://github.com/WebReflection/JSONH does something very similar to your example:

[{
    id: 12,
    score: 34,
    interval: 5678,
    sub: 9012
}, {
    id: 98,
    score: 76,
    interval: 5432,
    sub: 1098
}, ...]

Would turn into:

[["id","score","interval","sub"],12,34,5678,9012,98,76,5432,1098,...]
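To make the transformation concrete, here is a hand-written Python sketch of the layout shown above. It is not the JSONH library's API, and the library's actual wire format may differ in detail; `pack` and `unpack` are names chosen for this example, and the sketch assumes every object has the same non-empty set of keys.

```python
def pack(objects):
    """Emit [keys, v1, v2, ...]: one header list, then all values flattened."""
    if not objects:
        return []
    keys = list(objects[0])
    packed = [keys]
    for obj in objects:
        packed.extend(obj[key] for key in keys)
    return packed

def unpack(packed):
    """Slice the flat run of values back into dicts, len(keys) values at a time."""
    if not packed:
        return []
    keys, values = packed[0], packed[1:]
    width = len(keys)
    if width == 0 or len(values) % width:
        raise ValueError("value count is not a multiple of the key count")
    return [dict(zip(keys, values[i:i + width]))
            for i in range(0, len(values), width)]

records = [
    {"id": 12, "score": 34, "interval": 5678, "sub": 9012},
    {"id": 98, "score": 76, "interval": 5432, "sub": 1098},
]
packed = pack(records)
```

Because the key names travel once in the header, the receiver does not need a hard-coded schema, unlike the plain array-of-arrays approach.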
Radioactivity answered 3/1, 2013 at 0:38 Comment(1)
Very nice. And there are already both Python and JavaScript implementations ready to use. This seems to be much easier to plug in than ProtoBuf.Stetson
P
21

JSON is meant for readability. You could have an intermediate format if you're concerned about space. Create a serialize/deserialize function which takes a JSON file and creates a compressed binary storing your data as compactly as is reasonable, then read that format on the other end of the line.

See: http://en.wikipedia.org/wiki/Json First sentence: "JSON...is a lightweight text-based open standard designed for human-readable data interchange."

Essentially, my point is that humans would always see the JSON, and machines would primarily see the binary. You get the best of both worlds: readability and small data transfer (at the cost of a tiny amount of computation).
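A minimal sketch of such a serialize/deserialize pair in Python, using only the standard library. The record layout (four unsigned 32-bit little-endian integers, named after the fields in the question) is an assumption made for illustration; real data would need a layout that fits its actual value ranges.

```python
import struct
import zlib

# Assumed layout: four unsigned 32-bit little-endian integers per record.
RECORD = struct.Struct("<4I")
FIELDS = ("id", "score", "interval", "sub")

def serialize(objects):
    """Pack each object into 16 bytes, then compress the whole buffer."""
    raw = b"".join(RECORD.pack(*(obj[name] for name in FIELDS)) for obj in objects)
    return zlib.compress(raw)

def deserialize(blob):
    """Decompress and unpack fixed-size records back into dicts."""
    raw = zlib.decompress(blob)
    if len(raw) % RECORD.size:
        raise ValueError("buffer does not contain a whole number of records")
    return [dict(zip(FIELDS, values)) for values in RECORD.iter_unpack(raw)]

records = [
    {"id": 12, "score": 34, "interval": 5678, "sub": 9012},
    {"id": 98, "score": 76, "interval": 5432, "sub": 1098},
]
blob = serialize(records)
restored = deserialize(blob)
```

The trade-off is the one described above: the bytes on the wire are opaque, so any debugging has to go through `deserialize` first.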

Perilous answered 22/6, 2012 at 17:13 Comment(2)
Thank you for your answer, I'm going with easily readable JSON objects for now, as you suggest, considering @usr's suggestions on gzip, until I have a spare weekend to integrate ProtoBuf in our API, as @ArjunShankar mentions.Stetson
Agreed, just use JSON, but compress it if you want to reduce the size over the wire or when you persist it. I did a pretty comprehensive comparison of the alternatives, and that was the conclusion we came to. Read the full details here: lucidchart.com/techblog/2019/12/06/…Runway
D
15

Gzip will replace the recurring parts of your message with small back-references to their first occurrence. The algorithm is pretty "dumb", but for this kind of repetitive data it is great. I think you won't see a noticeable decrease in over-the-wire size from switching to arrays, because after compression the repeated object "structure" is effectively sent only once.

You can roughly test this by zipping two sample JSON files, or by capturing an HTTP request using Fiddler, which can show the compressed and uncompressed sizes.
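The same rough test can be scripted. The sketch below generates made-up sample data (1000 records with the four fields from the question, random values) and prints raw and gzipped sizes for both encodings. The numbers it prints only describe this synthetic data, so it is worth rerunning with a real payload.

```python
import gzip
import json
import random

def gzipped_size(value):
    """Return (raw, gzipped) byte counts for the compact JSON encoding of value."""
    raw = json.dumps(value, separators=(",", ":")).encode("utf-8")
    return len(raw), len(gzip.compress(raw))

# Made-up sample data; the seed is fixed only so repeated runs are comparable.
rng = random.Random(0)
objects = [
    {
        "id": i,
        "score": rng.randrange(100),
        "interval": rng.randrange(10000),
        "sub": rng.randrange(10000),
    }
    for i in range(1000)
]
arrays = [[o["id"], o["score"], o["interval"], o["sub"]] for o in objects]

for label, value in (("objects", objects), ("arrays", arrays)):
    raw, packed = gzipped_size(value)
    print("%-8s raw=%6d bytes  gzip=%6d bytes" % (label, raw, packed))
```

Comparing the two gzip columns, rather than the raw columns, is what answers the question of whether the array form is worth it over the wire.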

Dichasium answered 22/6, 2012 at 17:18 Comment(1)
Gzipped JSON and gzipped JSONH end up very similar in size.Tarshatarshish
T
10

Although this is an old question, I'd like to add a few words.

In my experience, large differences in raw JSON size amount to very little after compression. I prefer to keep it human-readable.

Some real-world numbers: a JSON file of 1.29 MB and its optimized version of 145 KB were, when compressed, 32 KB and 9 KB respectively.

Except in extreme conditions, I think this kind of difference is negligible, and the cost in readability is huge.

A:

{
  "Code": "FCEB97B6",
  "Date": "\/Date(1437706800000)\/",
  "TotalQuantity": 1,
  "Items": [
    {
      "CapsulesQuantity": 0,
      "Quantity": 1,
      "CurrentItem": {
        "ItemId": "SHIELD_AXA",
        "Order": 30,
        "GroupId": "G_MODS",
        "TypeId": "T_SHIELDS",
        "Level": 0,
        "Rarity": "R4",
        "UniqueId": null,
        "Name": "AXA Shield"
      }
    }
  ],
  "FormattedDate": "2015-Jul.-24"
}

B:

{
  "fDate": "2016-Mar.-01",
  "totCaps": 9,
  "totIts": 14,
  "rDays": 1,
  "avg": "1,56",
  "cells": {
    "00": {
      "30": 1
    },
    "03": {
      "30": 1
    },
    "08": {
      "25": 1
    },
    "09": {
      "26": 3
    },
    "12": {
      "39": 1
    },
    "14": {
      "33": 1
    },
    "17": {
      "40": 3
    },
    "19": {
      "41": 2
    },
    "20": {
      "41": 1
    }
  }
}

These are fragments of the two files.

Teagan answered 9/4, 2016 at 2:42 Comment(1)
Thank you, Agent Menta, for your valuable addition to this question ;)Stetson
C
8

Since you're using this on a mobile device (you mention 3G), you might actually want to care about size, not readability. Moreover, do you frequently expect to read what is being transmitted over the wire?

This is a suggestion for an alternate form.

ProtoBuf is one option. Google uses it internally, and there is a ProtoBuf 'compiler' which can read .proto files (containing a message description) and generate Java/C++/Python serializers/deserializers, which use a binary form for transmission over the wire. You simply use the generated classes on both ends, and forget about what the object looks like when transmitted over the wire. There is also an Obj-C port maintained externally.

Here is a comparison of ProtoBuf against XML, on the ProtoBuf website (I know XML is not what you use, still).

Finally, here is a Python tutorial.

Carricarriage answered 22/6, 2012 at 17:24 Comment(0)
