Most efficient way (time and space wise) to send binary data in response

My setup is a Flask-based server. A bird-view of the project would be: the Flask-based server fetches binary data from AWS S3 based on some algorithmic calculations (like figuring out the filenames to fetch from S3), and serves the data to an HTML+JavaScript client.

At first, I thought a JSON object would be the best response type. I created a JSON response with the following format:

{
  "payload": [
    {
      "symbol": "sym",
      "exchange": "exch",
      "headerfile": {
        "name": "#name",
        "content": "#binarycontent"
      },
      "datafiles": [
        {
          "name": "#name",
          "content": "#binarycontent"
        },
        {
          "name": "#name",
          "content": "#binarycontent"
        }
      ]
    }
  ],
  "errors": []
}

After structuring this JSON, I came to know that JSON doesn't natively support binary data, so I wouldn't be able to embed the binary content directly as values in the JSON.

I realize that I can always convert the bytes into a base64-encoded string and use that string as the value in the JSON. But the resulting string is around 33% larger: 4010 bytes of data encoded into 5348 bytes. That is insignificant for a single binary chunk, but it adds up when a lot of such chunks are embedded in one JSON response, and the extra size means the response takes longer to reach the client, which is a crucial concern for my client's application.
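For reference, the overhead is easy to reproduce (a minimal sketch; the 4010-byte chunk here is just random data standing in for one of the S3 chunks):

import base64
import os

# base64 emits 4 output characters for every 3 input bytes (plus padding),
# so the encoded form is roughly a third larger than the raw bytes.
raw = os.urandom(4010)           # stand-in for one binary chunk
encoded = base64.b64encode(raw)

print(len(raw), len(encoded))    # 4010 5348
print(len(encoded) / len(raw))   # ~1.33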

Another option I considered was to stream the binary chunks to the client with an application/octet-stream Content-Type. But I am not sure whether that is any better than the base64 approach. Furthermore, I haven't been able to figure out how to relate the binary chunks to their names in such a setup.
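For a single chunk, one common way to carry the name alongside an octet-stream body is a Content-Disposition header. A minimal sketch of that idea (fetch_chunk_from_s3 is a hypothetical placeholder for my S3 logic), though it still doesn't answer how to bundle several named chunks into one response:

from flask import Flask, Response

app = Flask(__name__)

@app.route('/chunk/<name>')
def serve_chunk(name):
    # fetch_chunk_from_s3() is hypothetical: it returns the raw bytes for one chunk.
    content = fetch_chunk_from_s3(name)
    return Response(
        content,
        mimetype='application/octet-stream',
        headers={'Content-Disposition': 'attachment; filename="{}"'.format(name)},
    )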

Is there a solution better than 'convert binary to text and embed into JSON'?

Shift answered 19/3, 2014 at 19:58 Comment(8)
Are these video files that you are trying to extract? Because AWS has a good service called the Elastic Transcoder.Famine
@kiran.koduru: No, those are non-media binary files/chunks, thus keeping me on the hook.Shift
Check out BSON. en.wikipedia.org/wiki/BSON Haemostat
It sounds like Google's protocol buffers could work for you. There's an official Python implementation and several 3rd party JavaScript implementations.Unicorn
Could you just return the URLs for these files in your JSON response and then let the client get the binary content directly? With your solution that data goes through two hops, do you really need your server to be in the middle?Trypsin
@dstromberg: BSON seems promising. The only caveat is that I work in Python 3, and the independent BSON module (separate from MongoDB) on PyPI seems compatible only with Python 2; pip3 hit installation errors.Shift
@LukasGraf: I will look into the protocol buffers. A brief look says they are indeed a good option. I might need to ask my client for a higher fee, though!Shift
@Miguel: I offered that option to the client, pointing out that the two-hop trip through my server surely takes more time. But he isn't willing to expose the S3 interface directly to the client frontend. As for the server, it does some calculations to figure out the filenames and such, and the client doesn't want to expose that logic either.Shift

I solved the problem, and will write down the solution hoping it could save someone else's time.

Thank you, @dstromberg and @LukasGraf, for your advice. I checked out BSON first and found it sufficient for my needs, so I never went into the details of Protocol Buffers.

BSON on PyPI is available in two packages. In pymongo, it comes as a supplement to MongoDB. In bson, it is a standalone package, which is what my needs call for. However, that package supports only Python 2. So, before rolling out my own port, I looked around for a Python 3 implementation and found another implementation of the BSON spec on bsonspec.org: Link to the module.

The simplest usage of that module goes like this:

>>> import bson
warning: module typecheck.py cannot be imported, type checking is skipped
>>> encoded = bson.serialize_to_bytes({'name': 'chunkfile', 'content': b'\xad\x03\xae\x03\xac\x03\xac\x03\xd4\x13'})
>>> print(encoded)
b'1\x00\x00\x00\x02name\x00\n\x00\x00\x00chunkfile\x00\x05content\x00\n\x00\x00\x00\x00\xad\x03\xae\x03\xac\x03\xac\x03\xd4\x13\x00'
>>> decoded = bson.parse_bytes(encoded)
>>> print(decoded)
OrderedDict([('name', 'chunkfile'), ('content', b'\xad\x03\xae\x03\xac\x03\xac\x03\xd4\x13')])

As you can see, it can accommodate binary data as well. I sent the data from Flask as mimetype=application/bson, and it was accurately parsed by the receiving JavaScript using the standalone BSON library provided by the MongoDB team.
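For completeness, the Flask side boiled down to something like this (a rough sketch; build_payload is a hypothetical placeholder for the filename calculations and the S3 fetching):

import bson                      # the Python 3 module linked above
from flask import Flask, Response

app = Flask(__name__)

@app.route('/payload/<symbol>')
def payload(symbol):
    # build_payload() is hypothetical: it returns a dict whose values include raw bytes,
    # e.g. {'name': 'chunkfile', 'content': b'...'} entries like the ones shown above.
    data = build_payload(symbol)
    return Response(bson.serialize_to_bytes(data), mimetype='application/bson')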

Shift answered 21/3, 2014 at 7:2 Comment(1)
Wonderful, your answer helped me a lot!Woothen
