Java buffered base64 encoder for streams

I have lots of PDF files whose content I need to encode as Base64. I have an Akka app that fetches the files as streams and distributes them to many workers, which encode the files and return the Base64 string for each one. I have a basic solution for the encoding:

    import org.apache.commons.codec.binary.Base64InputStream;
    ...
    Base64InputStream b64IStream = null;
    InputStreamReader reader = null;
    BufferedReader br = null;
    StringBuilder sb = new StringBuilder();
    try {
        b64IStream = new Base64InputStream(input, true);
        reader = new InputStreamReader(b64IStream);
        br = new BufferedReader(reader);
        String line;
        while ((line = br.readLine()) != null) {
            sb.append(line);
        }
    } finally {
        if (b64IStream != null) {
            b64IStream.close();
        }
        if (reader != null) {
            reader.close();
        }
        if (br != null) {
            br.close();
        }
    }

It works, but I would like to know the best way to encode the files using a buffer, and whether there is a faster alternative.

I tested some other approaches such as:

  • Base64.getEncoder
  • sun.misc.BASE64Encoder
  • Base64.encodeBase64
  • javax.xml.bind.DatatypeConverter.printBase64Binary
  • com.google.common.io.BaseEncoding.base64

They are faster, but they need the entire file in memory, correct? Also, I do not want to block other threads while encoding one PDF file.

Any input is really helpful. Thank you!

Benge answered 22/8, 2016 at 14:59 Comment(2)
What do you mean by "with a buffer"? What is going to be the input, and what do you expect the output to be? A stream? A channel? A string? - Luthanen
The input is an InputStream; the output is a string with the Base64 content. The buffer would be the BufferedReader. - Benge

Fun fact about Base64: it takes three bytes and converts them into four characters. This means that if you read binary data in chunks whose size is divisible by three, you can feed the chunks to any Base64 encoder one at a time, and the concatenated output will be the same as if you had fed it the entire file.
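
To illustrate the property, here is a minimal sketch that encodes the same bytes once as a whole and once chunk by chunk. The class name ChunkedBase64Demo and the helper encodeInChunks are made up for this example and are not part of any library.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;

class ChunkedBase64Demo {
    // Hypothetical helper: encodes data in chunks of chunkSize bytes
    // and concatenates the encoded chunks.
    static String encodeInChunks(byte[] data, int chunkSize) {
        Base64.Encoder encoder = Base64.getEncoder();
        StringBuilder sb = new StringBuilder();
        for (int off = 0; off < data.length; off += chunkSize) {
            int end = Math.min(off + chunkSize, data.length);
            sb.append(encoder.encodeToString(Arrays.copyOfRange(data, off, end)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] data = "Hello, Base64 world!".getBytes(StandardCharsets.US_ASCII);
        String whole = Base64.getEncoder().encodeToString(data);
        // Chunk size divisible by 3: same output as encoding everything at once.
        System.out.println(whole.equals(encodeInChunks(data, 6))); // prints "true"
        // Chunk size not divisible by 3: padding appears in the middle of the output.
        System.out.println(whole.equals(encodeInChunks(data, 4))); // prints "false"
    }
}
```

With a chunk size of 4, every chunk is padded with "==", so the concatenation is no longer the encoding of the whole input.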

Now, if you want your output stream to just be one long line of Base64 data - which is perfectly legal - then all you need to do is something along the lines of:

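import java.io.BufferedInputStream;
import java.io.InputStream;
import java.util.Arrays;
import java.util.Base64;

private static final int BUFFER_SIZE = 3 * 1024;

try (InputStream in = new BufferedInputStream(input, BUFFER_SIZE)) {
    Base64.Encoder encoder = Base64.getEncoder();
    StringBuilder result = new StringBuilder();
    byte[] chunk = new byte[BUFFER_SIZE];
    int filled = 0;
    int n;
    // read() may return fewer bytes than requested before the end of the stream,
    // so keep filling the chunk until it is full or the stream ends.
    while ((n = in.read(chunk, filled, BUFFER_SIZE - filled)) != -1) {
        filled += n;
        if (filled == BUFFER_SIZE) {
            result.append(encoder.encodeToString(chunk));
            filled = 0;
        }
    }
    if (filled > 0) {
        // Only this final chunk may have a length that is not divisible by three.
        result.append(encoder.encodeToString(Arrays.copyOf(chunk, filled)));
    }
}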

This means that only the last chunk may have a length that is not divisible by three and will therefore contain the padding characters.

The above example is with Java 8 Base64, but you can really use any encoder that takes a byte array of an arbitrary length and returns the base64 string of that byte array.

This also means that you can play around with the buffer size as you wish, as long as it stays a multiple of three.

If you want your output to be MIME compatible, however, you need to have the output separated into lines. In this case, I would set the chunk size in the above example to something that, when multiplied by 4/3, gives you a round number of lines. For example, if you want to have 64 characters per line, each line encodes 64 / 4 * 3, which is 48 bytes. If you encode 48 bytes, you'll get one line. If you encode 480 bytes, you'll get 10 full lines.

So change BUFFER_SIZE above to something like 4800, and use Base64.getMimeEncoder(64, new byte[] { 13, 10 }) instead of Base64.getEncoder(). Then each chunk except the last encodes to 100 full-sized lines. The MIME encoder does not add a line separator at the end of its output, so you also need a result.append("\r\n") between the encoded chunks in the while loop.
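
Put together, the MIME variant could look roughly like the sketch below. The class name MimeChunkedEncoder and the method encode are assumptions made for this example. It writes the line separator before every chunk except the first, so the result has no trailing "\r\n" and should match what the MIME encoder produces for the whole input in one call.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;
import java.util.Base64;

class MimeChunkedEncoder {
    private static final int LINE_LENGTH = 64;   // characters per output line
    private static final int BUFFER_SIZE = 4800; // 100 lines of 48 input bytes each
    private static final byte[] CRLF = {13, 10};

    // Hypothetical helper: reads the stream in 4800-byte chunks and
    // returns MIME-style Base64 with 64-character lines.
    static String encode(InputStream in) throws IOException {
        Base64.Encoder encoder = Base64.getMimeEncoder(LINE_LENGTH, CRLF);
        StringBuilder result = new StringBuilder();
        byte[] chunk = new byte[BUFFER_SIZE];
        int filled = 0;
        int n;
        // Fill the chunk completely before encoding; read() may return short counts.
        while ((n = in.read(chunk, filled, BUFFER_SIZE - filled)) != -1) {
            filled += n;
            if (filled == BUFFER_SIZE) {
                if (result.length() > 0) result.append("\r\n");
                result.append(encoder.encodeToString(chunk));
                filled = 0;
            }
        }
        if (filled > 0) {
            if (result.length() > 0) result.append("\r\n");
            result.append(encoder.encodeToString(Arrays.copyOf(chunk, filled)));
        }
        return result.toString();
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[100]; // 100 zero bytes as sample input
        System.out.println(encode(new ByteArrayInputStream(data)));
    }
}
```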

Luthanen answered 23/8, 2016 at 10:47 Comment(4)
Thanks a lot! I just needed to switch the encoder because I am using Java 6. Before this change it took ~900 ms to encode; now it takes 103 ms on average for the same file. - Benge
This solution doesn't seem to work correctly with InputStreams from URLs, unless I buffer everything into a byte[] and then wrap that in a new ByteArrayInputStream. - Aftercare
Fixed it for variable-length streams: gist.github.com/laxika/b9475e042973cf69ae4f2b3524575abb. However, I'm not sure that it works for every InputStream implementation out there. - Aftercare
Hi. I would only add that an input stream is not required to return a multiple of 3 bytes from a partial read; it can return any number of bytes that fits in the buffer (and is more than 0). - Acrosstheboard

If your goal is to read many files and convert them all to base64, there is a much shorter way of doing this.

Leave the burden of opening the files for reading and copying the data from one to the other to Files.copy.

And concentrate on encoding the bytes to Base64 by wrapping the output stream with the java.util.Base64 encoder: Base64.getEncoder().wrap(yourFileOutputStream). The wrapped stream has to be closed afterwards so that the final padding is written.

So the whole process of converting the files inside /yourSubdirectory to Base64 could be performed like this:

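import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Base64;
import java.util.stream.Stream;

try (Stream<Path> paths = Files.walk(Paths.get("/yourSubdirectory"))) {
    paths.filter(Files::isRegularFile)
        // Skip existing ".b64" output files so they are not encoded again
        .filter(path -> !path.toString().endsWith(".b64"))
        .forEach(path -> {
            // Write next to the input file, adding ".b64" to its name.
            // Closing the wrapped stream writes the final Base64 padding.
            try (OutputStream out = Base64.getEncoder().wrap(new FileOutputStream(path + ".b64"))) {
                // Read the input file, convert it to Base64 and write the output file
                Files.copy(path, out);
            } catch (IOException e) {
                throw new RuntimeException(e);
            }
        });
}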
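
The same wrapping idea can also be applied to the situation in the question, where the input is an InputStream and the result should be a String. This is only a sketch under that assumption; the class name StreamToBase64 and the method encode are invented for the example, and the whole encoded result is still held in memory.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

class StreamToBase64 {
    // Hypothetical helper: copies the input through a Base64-encoding
    // output stream and returns the encoded text.
    static String encode(InputStream in) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        // Closing the wrapped stream flushes the last bytes and writes the padding.
        try (OutputStream b64 = Base64.getEncoder().wrap(sink)) {
            byte[] buffer = new byte[8192];
            int n;
            while ((n = in.read(buffer)) != -1) {
                b64.write(buffer, 0, n);
            }
        }
        // Base64 output is plain ASCII.
        return new String(sink.toByteArray(), StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "hello".getBytes(StandardCharsets.US_ASCII);
        System.out.println(encode(new ByteArrayInputStream(data))); // prints "aGVsbG8="
    }
}
```

Because the encoding stream keeps track of leftover bytes between writes, the read buffer here does not need to be a multiple of three.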

Nose answered 15/5, 2023 at 18:29 Comment(0)
