Base64 vs HEX for sending binary content over the internet in XML doc

What is the best way to send binary content between systems inside an XML document?

I know of Base64 and hex; what is the real difference? I am currently using Base64, but I need to include an external Commons library for it, whereas with hex I think I could just write my own function.

Geographer answered 6/7, 2010 at 6:1 Comment(2)
So what you are saying is that if you have this binary number 1110, hex will take two characters for each byte, so technically you would have, say, AABBCCDD as the hex value, but in Base64 it takes three characters: AAABBBCCCDDD. Would this be the technical way to look at it? I am sure that AABBCCDD is not the correct hex value for the number. Also, why do some hashing functions return values of slightly different lengths? For example, I had an MD5 example this morning that I tested with A, then B, then C, and the B hash value was one character shorter than the other two. DougBrahui
You might want to have a look at Efficient XML Interchange (EXI) Format 1.0. I have never used it and don't know whether implementations are available, but it looks like you can embed binary content directly inside the XML document. Bilander

You could just write your own method for Base64 as well... but I'd generally recommend using external, well-tested libraries for both. (It's not like there's any shortage of them.)

The difference between Base64 and hex is really just how bytes are represented. Hex is another way of saying "Base16". Hex will take two characters for each byte - Base64 takes 4 characters for every 3 bytes, so it's more efficient than hex. Assuming you're using UTF-8 to encode the XML document, a 100K file will take 200K to encode in hex, or 133K in Base64. Of course it may well be that you don't care about the space efficiency - in many cases it won't matter. If it does matter, then clearly Base64 is better on that front. (There are alternatives which are even more efficient, but they're not as common.)
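Those size figures can be checked with a minimal Python sketch that uses only the standard library. The 100,000-byte random payload is an arbitrary stand-in for "a 100K file". Hex and Base64 output is plain ASCII, which UTF-8 stores at one byte per character, so the character counts below equal the bytes added to the XML document.

```python
import base64
import binascii
import os

def encoded_sizes(data: bytes) -> dict:
    """Length in characters of each text encoding of `data`."""
    return {
        "raw": len(data),
        "hex": len(binascii.hexlify(data)),     # 2 characters per byte
        "base64": len(base64.b64encode(data)),  # 4 characters per 3 bytes, padded
    }

sizes = encoded_sizes(os.urandom(100_000))  # arbitrary 100 KB payload
print(sizes)  # {'raw': 100000, 'hex': 200000, 'base64': 133336}
```

The Base64 figure is 133,336 rather than exactly 133,333 because 100,000 is not a multiple of 3. The last group of input bytes is padded out to a full 4 characters.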

Rrhagia answered 6/7, 2010 at 6:7 Comment(7)
Well, this is for a mobile phone, so including Commons Codec does seem like a bit of overkill. I might just go the hex route, as I will not be encoding/decoding that much anyway. Geographer
@jax: I'd say that being on a mobile would make it much more important to use base64, when the space on the device (storage and memory) is constrained, and so is the network bandwidth. Unless you're only storing very small files (and not many of them), you're likely to be much better off including a base64 library. (It doesn't have to be Commons Codec; there are source files around that do just base64 conversion.) Rrhagia
I would mention that hex-encoded values keep the sort order of the original data when compared byte-wise, while Base64-encoded values don't. This is particularly important in some circumstances, for example when implementing certain data structures. Hygienist
@Jon Skeet, what are some of the more efficient alternatives? Deca
@Mario, your comment makes no sense. Hex and Base64 encode and decode the bytes exactly the same. How you read a 4-byte hex value into, say, a 32-bit integer on your particular platform is your issue, depending on its endianness (big endian or little endian). Deca
@JonSkeet I also want to know what the more efficient alternatives are! Pentachlorophenol
@algo: I don't honestly remember what I was specifically thinking of for this 11 years ago. Rrhagia
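The sort-order point raised in the comments above is easy to check with a minimal Python sketch. The two byte values are arbitrary picks, and `b64` is a hypothetical helper. The byte 0xD0 starts with the 6-bit value 52, which standard Base64 writes as `0`, an ASCII character that sorts before `A`.

```python
import base64

def b64(data: bytes) -> str:
    """Standard (RFC 4648) Base64 as an ASCII string."""
    return base64.b64encode(data).decode("ascii")

low, high = b"\x00", b"\xd0"   # low < high when compared as raw bytes

# Lowercase hex keeps the byte order: '0'-'9' sort before 'a'-'f' in ASCII.
print(low.hex(), high.hex(), low.hex() < high.hex())   # 00 d0 True

# Base64 maps the 6-bit value 52 to '0', which sorts before 'A' in ASCII,
# so the encoded strings come out in the opposite order.
print(b64(low), b64(high), b64(low) < b64(high))       # AA== 0A== False
```

Hex only keeps the order if you stick to one letter case. Mixing `A` to `F` with `a` to `f` breaks it, because uppercase letters sort before lowercase ones in ASCII.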

I was curious how on EARTH base64 can convert 3 input bytes into 4 output bytes for just 33% space growth (whereas hex converts 1 input byte into 2 output bytes for 100% space growth). Why specifically 3 input bytes?

The answer is:

3 bytes = 3 x 8 bits = 24 bits.

Why that magic "24 bits" number? Well, base 64 represents the numbers 0 to 63. How are those represented in binary? With 000000 (0) to 111111 (63).

Bingo! Each base64 character represents 6 bits of input data using a single output byte (a single character such as "Z", etc).

So 24 bits (3 full 8-bit bytes of input) / 6 bits (base64 alphabet) = 4 bytes of base64. That's it!
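The 3-bytes-in, 4-characters-out step can be written out by hand. This is a minimal Python sketch for a single complete 3-byte group only, with no padding. `ALPHABET` is the standard RFC 4648 Base64 alphabet, and `encode_group` is a hypothetical helper name. "Man" → "TWFu" is the classic worked example, and the standard library agrees:

```python
import base64

# Standard Base64 alphabet: index 0..63 -> character.
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

def encode_group(three: bytes) -> str:
    """Encode exactly 3 bytes (24 bits) as 4 Base64 characters."""
    if len(three) != 3:
        raise ValueError("expected exactly 3 bytes")
    n = int.from_bytes(three, "big")  # pack the 3 bytes into one 24-bit number
    # Peel off four 6-bit indices, most significant first.
    indices = [(n >> shift) & 0b111111 for shift in (18, 12, 6, 0)]
    return "".join(ALPHABET[i] for i in indices)

print(encode_group(b"Man"))                      # TWFu
print(base64.b64encode(b"Man").decode("ascii"))  # TWFu
```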

Or, described another way, every Base64 character (which is 1 byte, i.e. 8 bits) encodes 6 bits of real data, and 8 bits / 6 bits ≈ 1.33, which is where the 33% growth mentioned at the top of this post comes from. So yes, Base64 always increases data size by about 33% (plus some potential padding with the = characters that are sometimes added at the end of the Base64 output).
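The padding caveat shows up when you encode inputs whose length isn't a multiple of 3. Here is a short Python sketch using the standard library's `base64.b64encode` on runs of zero bytes. The output length is always 4 × ⌈n / 3⌉, so the growth is exactly 33% only when n is a multiple of 3.

```python
import base64
import math

for n in range(1, 7):
    encoded = base64.b64encode(bytes(n))  # n zero bytes
    # '=' fills out the last 4-character group when n % 3 != 0.
    assert len(encoded) == 4 * math.ceil(n / 3)
    print(n, encoded.decode("ascii"))

# 1 AA==
# 2 AAA=
# 3 AAAA
# 4 AAAAAA==
# 5 AAAAAAA=
# 6 AAAAAAAA
```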

You may think: "Why not base128 (7 bits of input = 8 bits of output), at just 14% size growth when encoding?" The answer is that base64 is the best we can do with a power-of-two alphabet: only 95 of the 128 ASCII characters are printable, and the rest are control characters such as NUL. So 64 is the largest power of two that fits.

There are obviously ways to create other systems such as perhaps "base81" etc, since you can do anything you want if you create a custom encoding algorithm. But the beauty of base64 is how it encodes data so cleanly in chunks of 6 bits, and how you simply have to "read 3 bytes and output 4" to encode, and "read 4 bytes and output 3" to decode. So that encoding scheme became popular.

Now you are hopefully wiser after having read this.

Fun Update: Speaking of other encoding styles with more characters... It's come to my attention that Ascii85 aka Base85 exists and is slightly more efficient (25% data size growth when encoding as Base85 instead of 33% for Base64): https://en.wikipedia.org/wiki/Ascii85
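Here is a quick size comparison using Python's standard library, which ships both encoders (`base64.b64encode` and `base64.a85encode`). The 240-byte payload is an arbitrary choice. Its length is a multiple of both 3 and 4, so neither encoding needs padding. It also contains no zero bytes; otherwise Python's Ascii85 encoder would abbreviate any all-zero 4-byte group as `z`.

```python
import base64

data = bytes(range(1, 241))  # 240 non-zero bytes

b64 = base64.b64encode(data)
a85 = base64.a85encode(data)

print(len(b64))  # 320 -> 4 characters per 3 bytes (+33%)
print(len(a85))  # 300 -> 5 characters per 4 bytes (+25%)
assert base64.a85decode(a85) == data  # round-trips
```

One caveat for the XML use case in the question: the Ascii85 alphabet runs from `!` to `u` and so includes `<` and `&`. Those must be escaped inside XML text, which eats into the savings.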

Stateless answered 2/11, 2017 at 12:0 Comment(0)

There are only two 'real differences':

  1. The radix. Base64 is base-64, surprise, and hex is base-16.

  2. The encoding: base-64 encodes 3 source bytes into 4 base-64 characters (http://en.wikipedia.org/wiki/Base64#Examples); hex encodes 1 byte into 2 hex characters.

So base64 is more compact than hex.
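Since the question mentions writing a hex function by hand, here is a minimal Python sketch of the 1-byte-to-2-characters mapping. `to_hex` and `from_hex` are hypothetical names. Python's built-in `bytes.hex()` and `bytes.fromhex()` already do the same job.

```python
HEX_DIGITS = "0123456789abcdef"
_VALUES = {c: i for i, c in enumerate(HEX_DIGITS)}  # digit -> 0..15

def to_hex(data: bytes) -> str:
    """Encode each byte as two hex digits: high nibble, then low nibble."""
    return "".join(HEX_DIGITS[b >> 4] + HEX_DIGITS[b & 0x0F] for b in data)

def from_hex(text: str) -> bytes:
    """Decode pairs of hex digits (either case) back into bytes."""
    text = text.lower()
    if len(text) % 2 != 0:
        raise ValueError("hex string must have an even number of characters")
    try:
        return bytes((_VALUES[hi] << 4) | _VALUES[lo]
                     for hi, lo in zip(text[0::2], text[1::2]))
    except KeyError as err:
        raise ValueError(f"invalid hex digit {err.args[0]!r}") from None

payload = b"\x00\x7f\xff"
print(to_hex(payload))                   # 007fff
assert from_hex(to_hex(payload)) == payload
assert to_hex(payload) == payload.hex()  # matches the built-in
```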

Conspicuous answered 7/7, 2010 at 1:56 Comment(1)
There is an error in the answer. I have submitted an edit, but it is still pending. Point #2 should be: "2. The encoding: base-64 encodes 3 source bytes into 4 base-64 characters (en.wikipedia.org/wiki/Base64#Examples); hex encodes 1 byte into 2 hex characters." Hamby

Other answers made clear the efficiency difference between base16 and base64.

There is more to base selection than efficiency.

Base64 uses more than just letters and numbers. Different implementations use different punctuation characters for padding and for the last two characters of the 64-character set. These can include plus "+" and equals "=", both of which are problematic in HTTP query strings.

So one reason to favour base16 over base64 is that base16 values can be composed directly into HTTP query strings without requiring additional encoding. Is that important to you?
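Here is a small Python sketch of the query-string concern. The three input bytes are picked so that standard Base64 produces both `+` and `/`. `urllib.parse.quote` shows the percent-encoding that would be needed. `base64.urlsafe_b64encode` shows one of the variant alphabets mentioned above: it swaps `+` and `/` for `-` and `_`, though `=` padding can still appear for other input lengths.

```python
import base64
from urllib.parse import quote

data = bytes([0xFB, 0xFF, 0xBF])  # 6-bit groups 62, 63, 62, 63

std = base64.b64encode(data).decode("ascii")
print(std)                     # +/+/
print(quote(std, safe=""))     # %2B%2F%2B%2F  (must be percent-encoded)
print(data.hex())              # fbffbf        (usable in a query string as-is)
print(base64.urlsafe_b64encode(data).decode("ascii"))  # -_-_
```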

Notice that this is an additional concern, over and above efficiency. Neither base is inherently better or worse; they're just two different points on a scale, at which you'll find different properties that will be more or less attractive in different situations.

For example, consider base32. Its output is 20% longer than base64's, but it is still suitable for use in HTTP query strings. Most of its inefficiency comes from being case-insensitive and leaving out zero "0" and one "1", choices made to prevent mistakes when humans reproduce the value.

So base32 introduces a new concern: ease of reproduction by humans. Is that a concern for you? If it's not, you could go for something like base62, which is still convenient in HTTP query strings, but is case-sensitive and includes zero "0" and one "1".
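Here is a quick check of the base32 figures, sketched in Python with the standard library's `base64.b32encode`, which uses the RFC 4648 alphabet of A to Z plus 2 to 7. The 30-byte input is arbitrary. Its length is a multiple of both 3 and 5, so neither encoding pads.

```python
import base64

data = bytes(range(30))  # 30 bytes: no padding for base32 (5-byte groups) or base64 (3-byte groups)

b32 = base64.b32encode(data).decode("ascii")
b64 = base64.b64encode(data).decode("ascii")

print(len(b32), len(b64))  # 48 40 -> base32 output is 20% longer
# Only uppercase letters and the digits 2-7: no lowercase, no '0' or '1'.
print(set(b32) <= set("ABCDEFGHIJKLMNOPQRSTUVWXYZ234567"))  # True
```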

Hopefully, I've clarified that the selection of your encoding base is a matter of sliding along a scale until you get the best efficiency you can have before sacrificing what's important to you.

Wikipedia has a fun list of numeral systems.

Bomarc answered 25/11, 2014 at 7:50 Comment(1)
Case sensitivity is what led me to choose either base32 or hex over base64. Thanks for the tip! Crasis

Is size important to you?

Base64 is more space-efficient: it uses 4 characters to represent 3 bytes, whereas hex uses 2 characters for each byte. In other words, hex increases the size of the string by 100%. For small strings that fit as parameters in URL requests, I wouldn't mind the extra cost.

Is ease of use important to you?

Hex is easier to use than Base64 because you don't need to escape it when using the string as a GET parameter in URL requests (Base64 may contain +, = and /).

Is widespread use important to you?

I don't have the numbers, but Base64 might be better known to the average developer than hex, depending on several factors. I knew about Base64 long before hex (base16).

Intramolecular answered 15/11, 2016 at 15:25 Comment(0)

Base64 has less overhead: it produces 4 characters for every 3 bytes of original data, while hex produces 2 characters for every byte. Hex is more readable, though. You just look at two characters and immediately know which byte is behind them, whereas with Base64 you have to decode a whole 4-character group, so debugging is easier with hex.
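Here is a small Python sketch of the readability point, using arbitrary bytes (0xCAFEBABE followed by 0x0001). In hex each byte keeps its own two characters, while Base64 characters straddle byte boundaries.

```python
import base64

data = bytes([0xCA, 0xFE, 0xBA, 0xBE, 0x00, 0x01])

# Hex: byte i always sits at characters 2*i and 2*i + 1.
print(data.hex(" "))    # ca fe ba be 00 01  (the separator argument needs Python 3.8+)
print(data.hex()[6:8])  # be  -> byte 3, read straight off the string

# Base64: each character mixes bits from neighbouring bytes.
print(base64.b64encode(data).decode("ascii"))  # yv66vgAB
```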

Keystroke answered 6/7, 2010 at 6:7 Comment(1)
When I find myself reading binary data, I definitely want hex! Holmen
