Why does Jackson convert a byte array to a Base64 string when serializing to JSON?

When a DTO contains a byte array and I serialize it to JSON with Jackson's ObjectMapper, the byte array is automatically converted to a Base64 string. Example below.

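import com.fasterxml.jackson.databind.ObjectMapper;
import lombok.AllArgsConstructor;
import lombok.Data;

import java.io.File;
import java.nio.file.Files;

// Lombok generates the getter that Jackson uses to discover the property.
@Data
@AllArgsConstructor
class TestDTO {
    private byte[] binaryFile;
}

class TestByteSerialization {
    public static void main(String[] args) throws Exception {
        ObjectMapper objectMapper = new ObjectMapper();
        // Read the whole file into memory as raw bytes.
        byte[] bytes = Files.readAllBytes(new File("path/to/file/test.pdf").toPath());

        TestDTO dto = new TestDTO(bytes);

        // By default, Jackson writes byte[] properties as Base64 strings.
        String json = objectMapper.writeValueAsString(dto);
        System.out.println(json);
    }
}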

I expected Jackson to serialize it as an array of integers, like the following:

{
    "binaryFile" : [21, 45, 12, 65, 12, 37, etc]
}

Instead, it was converted to a Base64 string:

{
    "binaryFile" : "ZXhhbXBsZSB0ZXh0IG9ubHkuIEJpbmFyeSBmaWxlIHdhcyBkaWZmZXJlbnQgTE9MLg=="    
}

After some research, it seems that JSON does not support byte arrays, as mentioned here. This makes sense, because JSON is a textual representation of data.

But I still could not find an answer to why JSON does not support byte arrays. A byte array is still just an array of numbers, right? Why does it need to be converted to a Base64-encoded string? What is wrong in passing byte array as is to the json String as an array of numbers?

For those marking this as an opinion-based question:

Developers definitely wouldn't have thought "Passing bytes as an array of numbers is boring. Let's try some crazy looking encoded string". There has to be some rationale behind this.

Petrochemical answered 19/4, 2021 at 6:22 Comment(4)
There may be many reasons but one I can think of is size. As a number each byte would require 1-3 characters + the comma and when deserializing it Jackson would at least need 4 bytes for each int until those could be converted to a byte array. Thus you'd need 2-4x more memory whereas with Base64 you'd just need about 1.33x as much memory. - Interrogate
I don't think seeking a technical explanation for why things are made to work the way they are is an opinion-based question. - Petrochemical
JSON supports byte arrays just fine (ok, technically an array of integers), so the premise is wrong; it just generally doesn't make sense to send byte arrays this way. - Womanhater
Come on guys. This is not an opinion-based question, as explained in my edit. Please vote to reopen it. - Petrochemical

What is wrong in passing byte array as is to the json String as an array of numbers?

Nothing, if you're happy with each byte of input taking (on average, assuming even distribution of bytes) 3.57 characters. That's assuming you don't have a space after each comma - otherwise it's 4.57 characters.
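
As a rough check of the 3.57 figure, here is a minimal Java sketch that averages the decimal length of every unsigned byte value (0 to 255) and adds the separator. The class and method names are only illustrative, and treating bytes as unsigned is an assumption: Java's own byte type is signed, so printing raw byte values would add minus signs and raise the average somewhat.

```java
import java.util.Locale;

public class AverageCharsPerByte {
    /**
     * Average number of characters one array element occupies when each
     * unsigned byte value (0..255) is equally likely.
     * separatorLength is 1 for "," and 2 for ", ".
     */
    public static double averageCharsPerByte(int separatorLength) {
        long totalDigits = 0;
        for (int value = 0; value <= 255; value++) {
            totalDigits += Integer.toString(value).length();
        }
        // 10 one-digit, 90 two-digit and 156 three-digit values: 658 digits in total.
        return (double) totalDigits / 256 + separatorLength;
    }

    public static void main(String[] args) {
        // prints 3.57
        System.out.println(String.format(Locale.ROOT, "%.2f", averageCharsPerByte(1)));
        // prints 4.57
        System.out.println(String.format(Locale.ROOT, "%.2f", averageCharsPerByte(2)));
    }
}
```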

So compare these data sizes with 10K of data:

  • Raw: 10240 bytes (can't be represented directly in JSON)
  • Base64: 13656 characters
  • Array of numbers: about 36560 characters (evenly distributed byte values, no spaces)

The size increase of 33% for base64 is painful enough... the size increase of using an array is much, much worse. So the convention is to use base64 instead. (It's only a convention - it's not like it's baked into the JSON spec. But it's followed by most JSON encoders and decoders.)
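
The comparison above can be reproduced with a short Java sketch using only the standard library. It assumes 10240 bytes in which every unsigned value occurs equally often, and it counts a compact array (no spaces) including its two brackets, so the array total lands within a few characters of the estimate rather than matching it digit for digit. The class and method names are hypothetical; this measures text length only and says nothing about how a particular JSON library behaves.

```java
import java.util.Base64;

public class EncodingSizeComparison {
    /** Builds n bytes that cycle through 0..255, so each value is equally frequent. */
    public static byte[] evenlyDistributed(int n) {
        byte[] data = new byte[n];
        for (int i = 0; i < n; i++) {
            data[i] = (byte) i; // the narrowing cast wraps around every 256 values
        }
        return data;
    }

    /** Number of characters in the standard (padded) Base64 encoding of the data. */
    public static int base64Length(byte[] data) {
        return Base64.getEncoder().encodeToString(data).length();
    }

    /** Number of characters in a compact JSON array such as [0,1,255], bytes read as unsigned. */
    public static int jsonArrayLength(byte[] data) {
        StringBuilder sb = new StringBuilder("[");
        for (int i = 0; i < data.length; i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append(data[i] & 0xFF); // mask to get 0..255 instead of -128..127
        }
        return sb.append(']').length();
    }

    public static void main(String[] args) {
        byte[] data = evenlyDistributed(10240);
        System.out.println("Raw bytes:        " + data.length);           // 10240
        System.out.println("Base64 chars:     " + base64Length(data));    // 13656
        System.out.println("JSON array chars: " + jsonArrayLength(data)); // 36561
    }
}
```

Real files rarely have perfectly uniform byte values, so the array length will vary with the content, while the Base64 length depends only on the input size.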

Newfashioned answered 19/4, 2021 at 6:27 Comment(4)
One follow-up question: for transferring a single binary file, is there any reason why JSON should be preferred over sending the bytes as application/octet-stream? - Petrochemical
@ArunGowda: That's a completely different question - very much not what Stack Overflow comments are for. - Newfashioned
I understand that it is beneficial to pass everything as a String over the network to save memory cost etc. What I want to know is why Jackson cannot deserialize it into a byte array automatically on the other side, like it does with other data types. - Marozas
@SaurabhTiwari: Well no, it's not advantageous to pass large pieces of opaque binary data over the network as strings, assuming you need something like base64. It's the best that's available for JSON, which requires that you're using textual data to start with, but it's not like this is the most efficient encoding for binary data. As for Jackson - sounds like you should be filing a feature request on the Jackson repo rather than commenting on a three-year-old answer. - Newfashioned
