Binary Data in JSON String. Something better than Base64

The JSON format doesn't natively support binary data. The binary data has to be escaped so that it can be placed into a string element (i.e. zero or more Unicode characters in double quotes, using backslash escapes) in JSON.

An obvious method to escape binary data is to use Base64. However, Base64 has a high processing overhead. It also expands 3 bytes into 4 characters, which increases the data size by around 33%.
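For illustration, a minimal Node.js sketch of that Base64 round trip (the 300-byte payload and the field name are arbitrary):

// Embed binary data in a JSON string via Base64, then recover it.
const crypto = require("crypto");

const payload = crypto.randomBytes(300); // stand-in for real binary data
const json = JSON.stringify({ value: payload.toString("base64") });
const decoded = Buffer.from(JSON.parse(json).value, "base64");

console.log(payload.length, "bytes ->", json.length, "chars"); // 300 -> 412: the 4/3 growth plus framing
console.log(decoded.equals(payload)); // true: the round trip is lossless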

One use case for this is the v0.8 draft of the CDMI cloud storage API specification. You create data objects via a REST web service using JSON, e.g.

PUT /MyContainer/BinaryObject HTTP/1.1
Host: cloud.example.com
Accept: application/vnd.org.snia.cdmi.dataobject+json
Content-Type: application/vnd.org.snia.cdmi.dataobject+json
X-CDMI-Specification-Version: 1.0
{
    "mimetype" : "application/octet-stream",
    "metadata" : [ ],
    "value" :   "TWFuIGlzIGRpc3Rpbmd1aXNoZWQsIG5vdCBvbmx5IGJ5IGhpcyByZWFzb24sIGJ1dCBieSB0aGlz
    IHNpbmd1bGFyIHBhc3Npb24gZnJvbSBvdGhlciBhbmltYWxzLCB3aGljaCBpcyBhIGx1c3Qgb2Yg
    dGhlIG1pbmQsIHRoYXQgYnkgYSBwZXJzZXZlcmFuY2Ugb2YgZGVsaWdodCBpbiB0aGUgY29udGlu
    dWVkIGFuZCBpbmRlZmF0aWdhYmxlIGdlbmVyYXRpb24gb2Yga25vd2xlZGdlLCBleGNlZWRzIHRo
    ZSBzaG9ydCB2ZWhlbWVuY2Ugb2YgYW55IGNhcm5hbCBwbGVhc3VyZS4="
}

Are there better ways and standard methods to encode binary data into JSON strings?

Catabolite answered 18/9, 2009 at 8:8 Comment(10)
For upload: you're only doing it once, so it's not as big a deal. For download, you might be surprised how well base64 compresses under gzip, so if you have gzip enabled on your server you're also probably OK.Gnomic
Another worthy solution is msgpack.org; and for the hardcore nerds: github.com/msgpack/msgpack/blob/master/spec.mdIsabellaisabelle
@cloudfeet, Once per user per action. Very big a deal.Discus
Note that characters are typically 2 bytes of memory each. Thus, base64 might give +33% (4/3) overhead on the wire, but putting that data on the wire, retrieving it, and utilizing it, would require a +166% (8/3) overhead. Case in point: if a Javascript string has a maximum length of 100k chars, you can only represent 37.5k bytes of data using base64, not 75k bytes of data. These numbers may be a bottleneck in many parts of the application, e.g. JSON.parse etc. ......Discus
....... Contrast these numbers to the savings you can gain if you convert the raw binary data into codepoints, and then convert those codepoints into UTF-8. Even a simple conversion using the encoding for codepoints 0x00 to 0xff would average out to an overhead of only +50%. and ................Discus
................. converting using the encoding for codepoints 0x00 to 0xffff averages to an overhead of ~48.5%. Converting using the encoding for codepoints 0x00 to 0x10ffff averages to overhead of 39.8%. This is 71.5k bytes of data you can represent with 100k chars instead of base64's 37.5k bytes.Discus
@Discus "typically 2 bytes of memory [per character]" is not accurate. v8 for example has OneByte and TwoByte strings. Two-byte strings are only used where necessary to avoid grotesque memory consumption. Base64 is encodable with one-byte strings.Sadiron
@Gnomic it shouldn't be that surprising how well base64 compresses, especially if the original data compresses well to start with.Bonita
Have you thought about ubjson instead? It's a nice format and interoperable with json. Your binary data will be encoded as an array of integers when converted to json. It has a standard mimetype as well and will give space savings on your other data to boot.Salpiglossis
Folks are overthinking this. It's all binary no matter what. The problem with binary data within JSON-encoded strings is that some bytes may equate to special characters that are reserved in JSON. The solution is to check each binary byte as if it were a char, escape it as if it were a special char, then unescape on the other end and treat it as binary again.Phenyl
Score: 563

There are 94 Unicode characters which can be represented as one byte according to the JSON spec (if your JSON is transmitted as UTF-8). With that in mind, I think the best you can do space-wise is base85 which represents four bytes as five characters. However, this is only a 7% improvement over base64, it's more expensive to compute, and implementations are less common than for base64 so it's probably not a win.

You could also simply map every input byte to the corresponding character in U+0000-U+00FF, then do the minimum encoding required by the JSON standard to pass those characters; the advantage here is that the required decoding is nil beyond builtin functions, but the space efficiency is bad -- a 105% expansion (if all input bytes are equally likely) vs. 25% for base85 or 33% for base64.
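A rough sketch of that byte-to-code-point mapping in Node.js (using latin1, whose code points are exactly U+0000-U+00FF, and letting JSON.stringify do the minimum required escaping):

// Map each byte to U+0000..U+00FF; JSON.stringify escapes only what the
// standard requires (double quote, backslash, and the control characters).
const bytes = require("crypto").randomBytes(16);

const literal = JSON.stringify(bytes.toString("latin1")); // a JSON string literal
const decoded = Buffer.from(JSON.parse(literal), "latin1");

console.log(decoded.equals(bytes)); // true
// Stored as UTF-8, every byte >= 0x80 costs two bytes, hence the ~105%
// average expansion mentioned above.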

Final verdict: base64 wins, in my opinion, on the grounds that it's common, easy, and not bad enough to warrant replacement.

See also: Base91 and Base122

Morningglory answered 18/9, 2009 at 8:33 Comment(15)
wouldn't that be more efficient? I do not understand how base-85 can be the best.Taxis
how is the encoding you talk about in the second paragraph related to the one described here?Taxis
Wait how is just using the actual byte while encoding the quote characters a 105% expansion and base64 only 33%? Isn't base64 133%?Carapace
Base91 is a bad idea for JSON, because its alphabet contains the quote character. In the worst case (output that is all quotes), after JSON encoding it is 245% of the original payload.Shoestring
@jamoh So funny; the thing is that it all depends on your needs! It is true that every time you increase the base you get a smaller string to send over the internet, but we should always be aware of the time spent encoding to that base (e.g. Base91) and decoding from it. In my case I need to send successive screenshots to make a screen-sharing app; using Base64, the canvas is not very fluid.Atul
Python 3.4 includes base64.b85encode() and b85decode() now. A simple encode+decode timing measurement shows that b85 is more than 13 times slower than b64. So we have a 7% size win, but 1300% performance loss.Rostov
According to my calculation there are only 93 characters which are represented as 1 byte in JSON strings: 128 ASCII less 32 control less DEL less 2 reserved gives 128-35 = 93.Jourdain
@Jourdain the JSON standard doesn't require escaping DEL, and DEL is one byte in UTF-8.Morningglory
@Morningglory JSON states that control-characters must be escaped. RFC20 section 5.2 defines DEL to be a control character.Jourdain
@Jourdain ECMA-404 specifically lists the characters that need to be escaped: the double quote U+0022, the backslash U+005C, and "the control characters U+0000 to U+001F".Morningglory
@Jourdain RFC 7159 (the other formalized JSON standard) says char is either a backslash escape or unescaped, and unescaped = %20-21 / %23-5B / %5D-10FFFF — i.e. Unicode minus U+0000 through U+001F, U+0022, and U+005C.Morningglory
@Morningglory That's an interesting point. Now the question is: which one is correct, json.org or RFC 7159? As the two definitions differ, somebody should look into this and repair it. For now I'll stick to json.org, RFC20, Wikipedia and the Python implementation.Jourdain
For Base91 within a JSON string you should replace " by some other free char valid in a JSON string (e.g. ', - or space).Ortego
Repeated the performance test @PieterEnnes described but in 2020, and I get 53x slowerCompartment
Could you list those 94 characters? I think it should be these, but I'm not 100% sure: "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz~_!$()+,;@.:=^*?&<>[]{}%#|`/ '-\u007f" (the last one is escaped DEL)Condescending
Score: 333

I ran into the same problem, and thought I'd share a solution: multipart/form-data.

By sending a multipart form, you first send your JSON metadata as a string, and then separately send the raw binary (image(s), wavs, etc.), indexed by the Content-Disposition name.

Here's a nice tutorial on how to do this in obj-c, and here is a blog article that explains how to partition the string data with the form boundary, and separate it from the binary data.

The only change you really need to make is on the server side; you will have to capture your metadata, which should reference the POST'ed binary data appropriately (by using a Content-Disposition boundary).

Granted it requires additional work on the server side, but if you are sending many images or large images, this is worth it. Combine this with gzip compression if you want.

IMHO sending base64-encoded data is a hack; the multipart/form-data RFC was created for issues such as this: sending binary data in combination with text or metadata.
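As a rough illustration in browser JavaScript (the /upload endpoint and the part names are made up for the example):

// JSON metadata and raw binary travel as separate parts of one request;
// the binary part needs no escaping at all.
const metadata = { mimetype: "application/octet-stream", name: "photo" };
const binary = new Blob([new Uint8Array([0xde, 0xad, 0xbe, 0xef])]);

const form = new FormData();
form.append("metadata", JSON.stringify(metadata)); // text part
form.append("file", binary, "photo.bin");          // binary part

fetch("/upload", { method: "POST", body: form }); // boundary is set automatically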

Shotgun answered 22/1, 2015 at 2:31 Comment(7)
By the way, the Google Drive API is doing it in this way: developers.google.com/drive/v2/reference/files/update#examplesCredible
Why is this answer so low down when it uses native features instead of trying to squeeze a round (binary) peg into a square (ASCII) hole?...Wilma
"Sending base64 encoded data is a hack"? So is multipart/form-data. Even the blog article you've linked says: "By using the Content-Type multipart/form-data you state that what you send is actually a form. But it is not." So I think the base64 hack is not only much easier to implement but also more reliable. I have seen some libraries (for Python, for example) which had the multipart/form-data content type hardcoded.Geter
@Geter The multipart/form-data media type was born to transport form data but today it is widely used outside the HTTP/HTML world, notably to encode email content. Today it is proposed as a generic encoding syntax. tools.ietf.org/html/rfc7578Rhetoric
@MarkKCowan Likely because while this is helpful to the purpose of the question, it doesn't answer the question as asked, which is effectively "Low overhead binary to text encoding for use in JSON", this answer completely ditches JSON.Susian
@ChinotoVokro The question asks if there are any better ways to encode or send binary with JSON, to which my reply is don't encode binary in strings but use multipart instead... @Geter The article I've linked is one of the many sources which describe the usage of multipart for separation of binary/text/json data. Feel free to encode binary into strings if you think it is easier/better/faster. Where size, speed, clarity and stability are important multipart is the better solution and it isn't just for forms: ietf.org/rfc/rfc2388.txtConsalve
The word "with" doesn't even show up in the question, however "in" does. Again, I think multipart is the better choice too in cases where it can be used*, but it doesn't answer the question. *not everything will be an HTTP request/SMTP messageSusian
Score: 46

The problem with UTF-8 is that it is not the most space-efficient encoding. Also, some random binary byte sequences are invalid UTF-8, so you can't just interpret a random binary byte sequence as UTF-8 data. The benefit of this constraint on the UTF-8 encoding is that it makes it robust, and makes it possible to locate the start and end of multi-byte chars from whatever byte we start looking at.

As a consequence, while encoding a byte value in the range [0..127] needs only one byte in UTF-8, encoding a byte value in the range [128..255] requires 2 bytes! Worse than that: in JSON, control chars, " and \ are not allowed to appear in a string. So the binary data would require some transformation to be properly encoded.

Let's see. If we assume uniformly distributed random byte values in our binary data then, on average, half of the bytes would be encoded in one byte and the other half in two bytes. The UTF-8 encoded binary data would have 150% of the initial size.
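This is easy to verify in Node.js:

// Code points up to U+007F cost one byte in UTF-8; U+0080..U+00FF cost two.
console.log(Buffer.byteLength("A", "utf8")); // 1 (U+0041)
console.log(Buffer.byteLength("ÿ", "utf8")); // 2 (U+00FF)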

Base64 encoding grows only to 133% of the initial size. So Base64 encoding is more efficient.

What about using another base encoding? In UTF-8, encoding the 128 ASCII values is the most space-efficient. In 8 bits you can store 7 bits. So if we cut the binary data into 7-bit chunks and store each chunk in one byte of a UTF-8 encoded string, the encoded data grows only to 114% of the initial size. Better than Base64. Unfortunately we can't use this easy trick because JSON doesn't allow some ASCII chars. The 33 control characters of ASCII ([0..31] and 127) plus " and \ must be excluded. This leaves us only 128-35 = 93 chars.

So in theory we could define a Base93 encoding which would grow the encoded size to 8/log2(93) = 8*log10(2)/log10(93) = 122%. But a Base93 encoding would not be as convenient as Base64: Base64 requires cutting the input byte sequence into 6-bit chunks, for which simple bitwise operations work well. Besides, 133% is not much more than 122%.
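The figures above are easy to reproduce: any base-N encoding needs 8/log2(N) output characters per input byte.

// Characters needed per input byte for a base-N text encoding.
const expansion = n => 8 / Math.log2(n);

console.log(expansion(64).toFixed(3)); // 1.333 -> 133% (Base64)
console.log(expansion(85).toFixed(3)); // 1.248 -> 125% (Base85)
console.log(expansion(93).toFixed(3)); // 1.223 -> 122% (the hypothetical Base93)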

This is why I came independently to the common conclusion that Base64 is indeed the best choice to encode binary data in JSON. My answer presents a justification for it. I agree it isn't very attractive from the performance point of view, but consider also the benefit of using JSON, with its human-readable string representation, easy to manipulate in all programming languages.

If performance is critical, then a pure binary encoding should be considered as a replacement for JSON. But with JSON, my conclusion is that Base64 is the best.

Jago answered 22/9, 2013 at 19:21 Comment(13)
What about Base128 but then letting the JSON serializer escape the " and \ ? I think it is reasonable to expect the user to use a json parser implementation.Bandaranaike
@Bandaranaike unfortunately this is not possible because chars with ASCII code below 32 are not allowed in JSON strings. Encodings with a base between 64 and 128 have already been defined, but the required computation is higher than base64. The gain in encoded text size is not worth it.Jago
If loading a large amount of images in base64 (let's say 1000), or loading over a really slow connection, would base85 or base93 ever pay for the reduced network traffic (w/ or w/o gzip)? I'm curious if there comes a point where the more compact data would make a case for one of the alternative methods.Baobaobab
I suspect computation speed is more important than transmission time. Images should obviously be precomputed on the server side. Anyway, the conclusion is that JSON is bad for binary data.Jago
Re "Base64 encoding grows only to 133% of the initial size So Base64 encoding is more efficient", this is completely wrong because characters are typically 2 byte each. See elaboration at #1443658Discus
@Discus My statement is correct when using UTF8 encoding. So it's not "completely wrong". When 2 bytes are used to store each char, then yes the storage size becomes 260% of the binary size. As you know JSON is used for data storage or transmission, in which case UTF8 encoding is used. In this case, which is the one concerned by the question, my comment is correct and pertinent.Jago
To encode binary into UTF-8, you only need to fix invalid byte sequences, no? You don't need to map >127 charcodes into their exact presentation, just treat the binary data as a utf-8 string and escape FE and FF, and pad truncated sequences somehow?Response
@Response Read my answer. Bytes in the range 0 to 31 are not allowed in JSON strings. The chars " and \ need special care because they are not treated as normal chars.Jago
@Jago what I mean is that, suppose I want to send the bytes "e298 83e2 9883 e298 830a", this is exactly the UTF-8 for "☃☃☃". So this byte sequence would take 0 overhead to send, even though it has >127 charcodes. JSON has nothing to do with it. The problems then become, how can you escape generic binary data so that it yields valid UTF-8, and how do you decode this in the browser?Response
However, I just did base64 imgfile | gzip | wc -l and that grew the original imagefile by only a few %, so since base64 is so easy and gzip transfers are almost a given, using base64 is indeed a good idea for compressed JSON transfer of compressed data. However, converting uncompressed data to base64 then gzip yields much higher bytecount than gzip+base64+gzip.Response
@Response You are right. If the byte sequence matches the encoding constraints of JSON strings, there is no need for base64 encoding. But this is not a general solution anymore.Jago
@Response wc -l counts lines, not bytes. A jpg or png image is already compressed; we don't gain much by recompressing it. It could be different for other types of binary data. My answer applies when we don't know anything about the binary data, or when the binary data results from compression or encryption and can be considered random (no frequent repetitive char sequences, presence of bytes in the range 0 to 31, etc.). Outside of this context, your suggestions for more compact encodings are possible and valid.Jago
This answer can be evolved as described in this blog post, with a solution for UTF-8 encoding of 7 bits of data plus escaping of the JSON and/or attribute special characters. Thus, this leads to 114% of the original size instead of base64's 133%.Ekaterina
Score: 45

BSON (Binary JSON) may work for you. http://en.wikipedia.org/wiki/BSON

Edit: FYI the .NET library json.net supports reading and writing bson if you are looking for some C# server side love.
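For a quick feel of the API, a sketch using the npm bson package (function names assumed from recent versions of that library):

// BSON stores the Buffer as a native binary field: no Base64 inflation.
const { serialize, deserialize } = require("bson");

const doc = { mimetype: "application/octet-stream", value: Buffer.from([1, 2, 3]) };
const bytes = serialize(doc);    // a Buffer of BSON bytes
const back = deserialize(bytes); // value comes back as a BSON Binary

console.log(bytes.length, back.value);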

Windshield answered 20/9, 2011 at 21:45 Comment(2)
"In some cases, BSON will use more space than JSON due to the length prefixes and explicit array indices." en.wikipedia.org/wiki/BSONHilda
Good news: BSON natively supports types like Binary, Datetime, and a few others (particularly useful if you are using MongoDB). Bad news: its encoding is binary bytes... so it is a non-answer to the OP. However it would be useful over a channel that supports binary natively, such as a RabbitMQ message, a ZeroMQ message, or a custom TCP or UDP socket.Mccahill
Score: 23

If you are dealing with bandwidth problems, try compressing the data at the client side first, then base64 it.

A nice example of such magic is at http://jszip.stuartk.co.uk/ and more discussion of this topic is at JavaScript implementation of Gzip
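In Node.js the same idea can be sketched with the built-in zlib module (assuming the data is compressible at all; already-compressed data won't shrink):

const zlib = require("zlib");

// Compress first, then Base64: the 33% overhead applies to the smaller,
// compressed payload instead of the raw one.
const raw = Buffer.from("some highly repetitive payload ".repeat(100));
const value = zlib.gzipSync(raw).toString("base64");

const roundTrip = zlib.gunzipSync(Buffer.from(value, "base64"));
console.log(raw.length, "->", value.length, roundTrip.equals(raw)); // true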

Plains answered 15/3, 2011 at 9:54 Comment(9)
here's a JavaScript zip implementation that claims better performance: zip.jsTaxis
Note that you can (and should) still compress after as well (typically via Content-Encoding), as base64 compresses pretty well.Amberlyamberoid
@MahmoudAl-Qudsi You mean base64(zip(base64(zip(data))))? I am not sure that adding another zip and then base64-ing it (to be able to send it as data) is a good idea.Plains
@Plains He means enable compression in the web server, which obviously supports binary, so your code does base64(zip(data)) but the client or server does compression on the ASCII before sending it on the (binary) wire, and the other end decompresses before handing it to the receiver code which receives ASCII and just does unzip(decode64(received))Subatomic
@Subatomic AFAIK server-side compression compresses the server output only.Plains
@Plains The server and client both need to support the same compression algorithms - at least, one needs decompression. No point in it otherwise.Subatomic
@Subatomic I am sorry but I am not able to find any evidence of declarative data compression that can be enabled for client's data being sent to server. The developer.mozilla.org/en-US/docs/Web/HTTP/Compression talks about server-to-client compression too.Plains
@Plains #44308271Subatomic
@Subatomic The linked solution says you have to send gzip data back to the server and mod_deflate will decompress it at the server side. So far so good, but this can't be called transparently sending gzipped data back to the server by any means. You have to compress the data at the client on your own (using the browser's JavaScript) and specially craft the header. A fragile and impractical solution.Plains
Score: 14

yEnc might work for you:

http://en.wikipedia.org/wiki/Yenc

"yEnc is a binary-to-text encoding scheme for transferring binary files in [text]. It reduces the overhead over previous US-ASCII-based encoding methods by using an 8-bit Extended ASCII encoding method. yEnc's overhead is often (if each byte value appears approximately with the same frequency on average) as little as 1–2%, compared to 33%–40% overhead for 6-bit encoding methods like uuencode and Base64. ... By 2003 yEnc became the de facto standard encoding system for binary files on Usenet."

However, yEnc is an 8-bit encoding, so storing it in a JSON string has the same problems as storing the original binary data — doing it the naïve way means about a 100% expansion, which is worse than base64.

Brachypterous answered 18/9, 2009 at 8:12 Comment(5)
Since a lot of people seem to still be viewing this question, I'd like to mention that I don't think yEnc really helps here. yEnc is an 8-bit encoding, so storing it in a JSON string has the same problems as storing the original binary data — doing it the naïve way means about a 100% expansion, which is worse than base64.Morningglory
In cases when using encodings like yEnc with large alphabets with JSON data is considered acceptable, escapeless may work as a good alternative providing fixed known-in-advance overhead.Tgroup
@Morningglory How does storing 8 bit bytes into an 8 bit encoding result in 100% overhead?Leitmotif
@Leitmotif because you can't legally embed that "8 bit encoding" directly into JSON; you have to encode it as UTF-8 characters somehow first.Morningglory
@Morningglory The point of yEnc is to explicitly not do that. It's in the name. Also, JSON doesn't mandate UTF-8: there is an RFC that does, but json.org doesn't, and the spec it links to, ECMA-404, doesn't either. All it says is that it has to be some kind of Unicode. I think you could have valid JSON with strings that are "utf-21" direct binary blobs, except for escaping the end quote. You would just need to find or make parsers that work with it, but that's what you get when you use something like yEnc. It would be on the same level as having multiple keys with the same name: technically right but mostly unsupported.Leitmotif
Score: 12

While it is true that base64 has a ~33% expansion rate, it is not necessarily true that the processing overhead is significantly more than this: it really depends on the JSON library/toolkit you are using. Encoding and decoding are simple, straightforward operations, and they can even be optimized with respect to character encoding (as JSON only supports UTF-8/16/32) -- base64 characters are always single-byte in JSON string entries. For example, on the Java platform there are libraries that can do the job rather efficiently, so the overhead is mostly due to the expanded size.

I agree with two earlier answers:

  • base64 is a simple, commonly used standard, so it is unlikely we'll find something better specifically for use with JSON (Base85 is used by PostScript, etc., but the benefits are at best marginal when you think about it)
  • compression before encoding (and after decoding) may make lots of sense, depending on the data you use
Erick answered 15/3, 2010 at 6:32 Comment(0)
Score: 9

Smile format

It's very fast to encode and decode, and it's compact

Speed comparison (java based but meaningful nevertheless): https://github.com/eishay/jvm-serializers/wiki/

Also, it's an extension to JSON that allows you to skip base64 encoding of byte arrays

Smile-encoded strings can be gzipped when space is critical

Ligulate answered 6/1, 2012 at 23:42 Comment(2)
... and the link is dead. This one seems up-to-date: github.com/FasterXML/smile-format-specificationLeapfrog
This is why adding links to answers is a bad move... At the very least, add a useful snippet to the answer :-)Ostensorium
Score: 8

Just to add another option that we low level dinosaur programmers use...

An old school method that's been around since three years after the dawn of time would be the Intel HEX format. It was established in 1973 and the UNIX epoch started on January 1, 1970.

  • Is it more efficient? No.
  • Is it a well established standard? Yes.
  • Is it human readable like JSON? Yes-ish and a lot more readable than most any binary solution.

The JSON would look like:

{
    "data": [
        ":10010000214601360121470136007EFE09D2190140",
        ":100110002146017E17C20001FF5F16002148011928",
        ":10012000194E79234623965778239EDA3F01B2CAA7",
        ":100130003F0156702B5E712B722B732146013421C7",
        ":00000001FF"
    ]
}
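For reference, a small sketch that unpacks one record of the example above (length, 16-bit address, record type, data bytes, checksum):

// Decode a single Intel HEX record such as ":10010000...40".
function decodeRecord(rec) {
  const bytes = [];
  for (let i = 1; i < rec.length; i += 2) bytes.push(parseInt(rec.slice(i, i + 2), 16));
  const [count, addrHi, addrLo, type, ...rest] = bytes;
  return { count, address: (addrHi << 8) | addrLo, type, data: rest.slice(0, -1) };
}

console.log(decodeRecord(":10010000214601360121470136007EFE09D2190140"));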
Warwick answered 18/1, 2021 at 21:10 Comment(3)
Is it less efficient? Yes.Disagreeable
We know it is less space-efficient. Is it less time-efficient? It is definitely more human-readable.Moschatel
In Intel HEX, each row is placed at an explicit 16-bit address in the target file. If we interpret these as addresses of 128-byte blocks, the format can represent files up to 8 MB. If those files contain large holes (long stretches of zero bytes), these can be left out of the encoding and the encoding can in fact be very efficient. A rather special case, though; not likely useful in practice.Doggo
Score: 6

Since you're looking for the ability to shoehorn binary data into a strictly text-based and very limited format, I think Base64's overhead is minimal compared to the convenience you're expecting to maintain with JSON. If processing power and throughput are a concern, then you'd probably need to reconsider your file formats.

Jetsam answered 18/9, 2009 at 8:29 Comment(0)
Score: 3

(Edit 7 years later: Google Gears is gone. Ignore this answer.)


The Google Gears team ran into the lack-of-binary-data-types problem and has attempted to address it:

Blob API

JavaScript has a built-in data type for text strings, but nothing for binary data. The Blob object attempts to address this limitation.

Maybe you can weave that in somehow.

Tameika answered 18/9, 2009 at 8:30 Comment(2)
So what is the status of blobs in Javascript and json? Has it been dropped?Jago
w3.org/TR/FileAPI/#blob-section Not as performant as base64 for space: if you scroll down, you find that it encodes using a UTF-8 map (like one of the options shown in hobbs' answer). And no JSON support, as far as I know.Stalinist
Score: 3

In depth

I dug a little bit deeper (during my implementation of base128) and found that when we send characters whose codes are above 127, the browser (Chrome) in fact sends TWO bytes instead of one :(. The reason is that JSON by default uses UTF-8, in which characters with codes above 127 are encoded with two bytes, as mentioned in chmike's answer. I tested it this way: type chrome://net-export/ into the Chrome URL bar, select "Include raw bytes", start capturing, send the POST requests (using the snippet at the bottom), stop capturing, and save the JSON file with the raw request data. Then we look inside that JSON file:

  • We can find our base64 request by searching for the string 4142434445464748494a4b4c4d4e (the hex encoding of ABCDEFGHIJKLMN), and we will see "byte_count": 639 for it.
  • We can find our above-127 request by searching for the string C2BCC2BDC380C381C382C383C384C385C386C387C388C389C38AC38B; these are the UTF-8 hex codes of the characters ¼½ÀÁÂÃÄÅÆÇÈÉÊË (whose raw one-byte codes are bcbdc0c1c2c3c4c5c6c7c8c9cacb). Its "byte_count": 703, so it is 64 bytes longer than the base64 request, because characters with codes above 127 are encoded with 2 bytes in the request :(

So in fact we gain nothing by sending characters with codes above 127 :(. For base64 strings we don't observe such negative behaviour (probably not for base85 either, though I didn't check). A possible solution to this problem is to send the data in the binary part of a POST multipart/form-data request, described in Ælex's answer (though usually in that case we don't need to use any base coding at all...).

An alternative approach might rely on mapping each two-byte portion of data to one valid UTF-8 character, encoding with something like base65280 / base65k, but it would probably be less effective than base64 due to the UTF-8 specification...

function postBase64() {
  let formData = new FormData();
  let req = new XMLHttpRequest();

  formData.append("base64ch", "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/");
  req.open("POST", '/testBase64ch');
  req.send(formData);
}


function postAbove127() {
  let formData = new FormData();
  let req = new XMLHttpRequest();

  formData.append("above127", "¼½ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõö÷øùúûüý");
  req.open("POST", '/testAbove127');
  req.send(formData);
}
<button onclick=postBase64()>POST base64 chars</button>
<button onclick=postAbove127()>POST chars with codes>127</button>
Vano answered 3/11, 2018 at 20:30 Comment(0)
Score: 2

Just to add the resource and complexity standpoint to the discussion: since we do PUT/POST and PATCH to store new resources and alter them, one should remember that the content transferred is an exact representation of the content that is stored, and that is what will be received by issuing a GET operation.

A multipart message is often used as a savior, but for simplicity reasons and for more complex tasks I prefer the idea of delivering the content as a whole. It is self-explanatory and it is simple.

And yes, JSON is somewhat crippling, but in the end JSON itself is verbose, so the overhead of mapping to Base64 is comparatively small.

To use multipart messages correctly, one has to either dismantle the object to send, using a property path as the parameter name for automatic combination, or create another protocol/format just to express the payload.

I also like the BSON approach, but it is not as widely and easily supported as one would like it to be.

Basically we are just missing something here, but embedding binary data as base64 is well established and the way to go, unless you have really identified the need to do a true binary transfer (which is rarely the case).

Stooge answered 12/4, 2016 at 11:13 Comment(1)
Sending and receiving multipart messages in .NET is not fun; it's overly complex and abstracted. It's easier to just send raw strings, so you can actually debug and see what is sent and received, and convert the string to a JSON object or class object at the server. Base64 right in the JSON or XML string is easy and nice to debug.Rostand
Score: 1

In Node.js, you can convert a Buffer to a string and back without any change:

// "binary" is an alias for Node's latin1 encoding: every byte maps to one
// code point in U+0000..U+00FF, so the round trip itself is lossless.
const serialized = buffer.toString("binary")
const deserialized = Buffer.from(serialized, "binary")

If you want more reliability by sacrificing size, replace "binary" with "base64"

Beforehand answered 4/3, 2021 at 7:43 Comment(3)
tested and approved?Buttery
If you want 100% reliability, replace "binary" with "base64"Beforehand
This does not work.Hackberry
Score: 1

I suggest using non-standard JSON to store binary data. Since we don't really look at the JSON, I don't care how it is represented, so raw binary inside a string is acceptable to me. The only characters that need to be escaped are the double quote (") and the backslash (\) itself; all other characters are acceptable. Even \n does not necessarily need to be escaped: just keep the raw bytes 0d and 0a there; it won't break the JSON string, since it is not a " and so won't terminate the string.

Your code can deal with all this binary data without problems; just take care of " and \ and you're fine.
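A sketch of that (deliberately non-standard) codec; note that standard JSON parsers will reject unescaped control bytes, so both ends must use this custom reader and writer:

// Escape only backslash and double quote; all other bytes pass through verbatim.
const esc = buf => buf.toString("latin1").replace(/[\\"]/g, c => "\\" + c);
const unesc = str => Buffer.from(str.replace(/\\([\\"])/g, "$1"), "latin1");

const data = Buffer.from([0x0d, 0x0a, 0x22, 0x5c, 0xff]);
console.log(unesc(esc(data)).equals(data)); // true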

Luckett answered 12/5, 2023 at 6:23 Comment(0)
Score: 0

The data type really matters. I have tested different scenarios for sending the payload from a RESTful resource. For encoding I used Base64 (Apache) and for compression GZIP (java.util.zip.*). The payload contains information about a film, an image, and an audio file. I compressed and encoded the image and audio files, which drastically degraded the performance. Encoding before compression turned out well. Image and audio content were sent as encoded and compressed byte[].

Kirkham answered 2/4, 2012 at 16:51 Comment(0)
Score: 0

Refer: http://snia.org/sites/default/files/Multi-part%20MIME%20Extension%20v1.0g.pdf

It describes a way to transfer binary data between a CDMI client and server using 'CDMI content type' operations without requiring base64 conversion of the binary data.

If you can use a 'Non-CDMI content type' operation, it is ideal for transferring 'data' to/from an object. Metadata can then later be added to or retrieved from the object via a subsequent 'CDMI content type' operation.

Decerebrate answered 22/6, 2013 at 5:40 Comment(0)
Score: 0

One other, more novel idea is to encode the data via uuencode. It's mostly deprecated, but it could still be an alternative. (Although perhaps not a serious one.)

Whitewall answered 4/3, 2021 at 21:59 Comment(1)
uuencode is a less efficient form of base64Saladin
Score: -2

My current solution: XHR2 using ArrayBuffer. The ArrayBuffer, as a binary sequence, contains multipart content (video, audio, graphics, text, and so on) with multiple content types. All in one response.

Modern browsers have DataView, StringView and Blob for handling the different components. See also http://rolfrost.de/video.html for more details.
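A minimal XHR2 sketch (the endpoint is hypothetical):

// Fetch binary data without any text encoding at all.
const req = new XMLHttpRequest();
req.open("GET", "/media/all-in-one");
req.responseType = "arraybuffer";
req.onload = () => {
  const view = new DataView(req.response);
  console.log(view.byteLength, "bytes received");
};
req.send();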

Acinus answered 11/2, 2014 at 8:16 Comment(3)
You will make your data grow +100% by serializing an array of bytesBren
@Bren wot??Lysenko
The serialization of a byte array in JSON is something like: [16, 2, 38, 89] which is very inefficient.Bren
