Base64 length calculation?

After reading the base64 wiki ...

I'm trying to figure out how the formula works:

Given a string of length n, the Base64 length will be 4 * ceil(n / 3)

Which is: 4 * Math.Ceiling((double)s.Length / 3)

I already know that the Base64 length must satisfy %4==0 so that the decoder can tell what the original text length was.

The padding at the end of a sequence can be = or ==.

Wiki: The number of output bytes per input byte is approximately 4 / 3 (33% overhead)

Question:

How does the information above fit with the output length 4 * ceil(n / 3)?

Whitsuntide answered 14/11, 2012 at 12:27 Comment(0)
306

Each character is used to represent 6 bits (log2(64) = 6).

Therefore 4 chars are used to represent 4 * 6 = 24 bits = 3 bytes.

So you need 4*(n/3) chars to represent n bytes, and this needs to be rounded up to a multiple of 4.

The number of padding chars resulting from the rounding up to a multiple of 4 will be 0, 1 or 2 (never 3, because even a single leftover byte already needs 2 chars).
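
As a quick sanity check of the reasoning above, here is a minimal Python sketch that compares 4 * ceil(n / 3) with the length the standard library encoder actually produces. Python is used only for illustration, and the helper name padded_base64_length is made up for this example.

```python
import base64
import math


def padded_base64_length(n: int) -> int:
    """Length of standard, padded Base64 for n input bytes: 4 * ceil(n / 3)."""
    return 4 * math.ceil(n / 3)


# Print the predicted length next to the real encoder output length.
for n in range(10):
    encoded = base64.b64encode(b"x" * n)
    print(n, padded_base64_length(n), len(encoded))
```

Every group of 1, 2 or 3 input bytes shows up as one 4-character step in the output length.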

Photocurrent answered 14/11, 2012 at 12:29 Comment(11)
Where does the padding come in here? - Whitsuntide
Consider if you have one byte of input. That will produce four characters of output. But only two output characters are needed to encode the input. So two characters will be padding. - Pappas
I still don't understand how you got to 4*(n/3). Let's say you have 123456; its length is 6. 6*6=36 bits, which is 4.5 bytes. From this point I don't understand. - Whitsuntide
For 3 bytes (3 x 8 = 24 bits) you need 4 chars (4 x 6 = 24 bits), so for 3n bytes you need 4n chars, i.e. no of chars = 4n / 3. - Photocurrent
The output length is always rounded up to a multiple of 4, so 1, 2 or 3 input bytes => 4 chars; 4, 5 or 6 input bytes => 8 chars; 7, 8 or 9 input bytes => 12 chars. - Photocurrent
You keep giving me samples but without explaining :-) Why do I have /3? - Whitsuntide
I explained all this in the answer above: (i) each output char represents 6 bits of input, (ii) 4 output chars therefore represent 4 * 6 = 24 bits, (iii) 24 bits is 3 bytes, (iv) 3 bytes of input therefore result in 4 chars of output, (v) the ratio of output chars to input bytes is therefore 4 / 3. - Photocurrent
No precise C++ formula. - Regardless
Sorry, just to confirm: that means a 27834-character string represents approximately 20 KB, am I correct? - Unconcerned
@techie_28: I make it 27308 characters for 20 * 1024 bytes, but I haven't had coffee yet this morning. - Photocurrent
@PaulR Yes, around that much. I'm actually trying to debug the problem of IE not rendering Base64 images. The manual says they should be < 32 KB, which it is here. - Unconcerned
91

4 * n / 3 gives unpadded length.

Then round up to the nearest multiple of 4 for padding; as 4 is a power of 2, this can be done with bitwise logical operations.

((4 * n / 3) + 3) & ~3
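
A small Python sketch of the expression above, assuming the division is integer (floor) division as it would be in C. The function name padded_length_bitwise is hypothetical. It prints the intermediate 4 * n / 3 term next to the final result, so you can see that the intermediate value is rounded down while the rounded-up total still matches the encoder:

```python
import base64


def padded_length_bitwise(n: int) -> int:
    """((4 * n / 3) + 3) & ~3, using integer (floor) division."""
    return ((4 * n // 3) + 3) & ~3


for n in range(7):
    floor_term = 4 * n // 3                    # intermediate value, rounded down
    actual = len(base64.b64encode(bytes(n)))   # what the encoder really produces
    print(n, floor_term, padded_length_bitwise(n), actual)
```

The mask `& ~3` clears the two lowest bits, which is what rounds the sum down to a multiple of 4 after 3 has been added.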
Ninon answered 21/8, 2015 at 12:19 Comment(7)
You are right! -> 4 * n / 3 gives unpadded length! Answers above are not correct. -> ((4 * n / 3) + 3) & ~3 returns the right result. - Plumbo
Does not work as an input for the Windows API CryptBinaryToStringA. - Regardless
To spell it out for people using shell: $(( ((4 * n / 3) + 3) & ~3 )) - Ordonez
4 * n / 3 already fails at n = 1: one byte is encoded using two characters, and the result is clearly one character. - Lizettelizotte
I think this may need to account for the '\n' every 76th character, which I've seen some Base64 implementations say is required per spec. Good catch on the need for padding - I was wondering why my actual and expected values were off. - Pinole
@MaartenBodewes Correct me if I am mistaken, but one Base64 character can only encode 6 bits of binary data. Therefore you need a 2nd character to encode the remaining 2 bits - so this does appear correct? - Aniakudo
@Aniakudo As it is written down, if n = 1 then you will get 4 / 3 = 1 using integers. As you've indicated, the expected result is 2, not 1. - Lizettelizotte
39

For reference, the Base64 encoder's length formula is as follows:

[Image: Base64 encoder's length formula]

As you said, a Base64 encoder given n bytes of data will produce a string of 4n/3 Base64 characters. Put another way, every 3 bytes of data will result in 4 Base64 characters. EDIT: A comment correctly points out that my previous graphic did not account for padding; the correct formula with padding is 4(Ceiling(n/3)).

The Wikipedia article shows exactly how the ASCII string Man is encoded into the Base64 string TWFu in its example. The input string is 3 bytes, or 24 bits, in size, so the formula correctly predicts the output will be 4 bytes (or 32 bits) long: TWFu. The process encodes every 6 bits of data into one of the 64 Base64 characters, so the 24-bit input divided by 6 results in 4 Base64 characters.

You ask in a comment what the size of encoding 123456 would be. Keeping in mind that every character of that string is 1 byte, or 8 bits, in size (assuming ASCII/UTF8 encoding), we are encoding 6 bytes, or 48 bits, of data. According to the equation, we expect the output length to be (6 bytes / 3 bytes) * 4 characters = 8 characters.

Putting 123456 into a Base64 encoder creates MTIzNDU2, which is 8 characters long, just as we expected.
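
The two worked examples can be reproduced with a few lines of Python (an illustrative sketch only; both inputs have a length that is a multiple of 3, so no padding is involved):

```python
import base64

for text in ("Man", "123456"):
    data = text.encode("ascii")               # one byte per character
    bits = 8 * len(data)                      # 24 and 48 bits respectively
    expected_chars = bits // 6                # one Base64 character per 6 bits
    encoded = base64.b64encode(data).decode("ascii")
    print(text, bits, expected_chars, encoded, len(encoded))
```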

Pappas answered 25/7, 2013 at 17:26 Comment(2)
Using this formula, be aware that it doesn't give the padded length. So you can have a longer length. - Frisby
To compute the expected decoded bytes from the base64 text, I use the formula floor((3 * (length - padding)) / 4). Check out the following gist. - Orran
24

Integers

Generally we don't want to use doubles, because we don't want floating-point operations, rounding errors and so on. They are simply not necessary.

For this it is a good idea to remember how to perform ceiling division: ceil(x / y) in doubles can be written in integers as (x + y - 1) / y (as long as the numbers are not negative, and beware of overflow).

Readable

If you go for readability you can of course also program it like this (example in Java; for C you could use macros, of course):

public static int ceilDiv(int x, int y) {
    return (x + y - 1) / y;
}

public static int paddedBase64(int n) {
    int blocks = ceilDiv(n, 3);
    return blocks * 4;
}

public static int unpaddedBase64(int n) {
    int bits = 8 * n;
    return ceilDiv(bits, 6);
}

// test only
public static void main(String[] args) {
    for (int n = 0; n < 21; n++) {
        System.out.println("Base 64 padded: " + paddedBase64(n));
        System.out.println("Base 64 unpadded: " + unpaddedBase64(n));
    }
}

Inlined

Padded

We know that we need one block of 4 characters for each 3 bytes (or fewer). So the formula becomes (for x = n and y = 3):

blocks = (bytes + 3 - 1) / 3
chars = blocks * 4

or combined:

chars = ((bytes + 3 - 1) / 3) * 4

Your compiler will optimize out the 3 - 1, so just leave it like this to maintain readability.

Unpadded

Less common is the unpadded variant. For this we remember that we need one character for each 6 bits, rounded up:

bits = bytes * 8
chars = (bits + 6 - 1) / 6

or combined:

chars = (bytes * 8 + 6 - 1) / 6

We can, however, still divide numerator and denominator by two (if we want to):

chars = (bytes * 4 + 3 - 1) / 3

Unreadable

In case you don't trust your compiler to do the final optimizations for you (or if you want to confuse your colleagues):

Padded

((n + 2) / 3) << 2

Unpadded

((n << 2) | 2) / 3

So there we are, two logical ways of calculation, and we don't need any branches, bit-ops or modulo ops - unless we really want to.
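
If you want to convince yourself that the shift/or variants really are the same thing as the readable ceiling divisions, a brute-force comparison is enough. This is only a sketch in Python, and the function names are invented for the example:

```python
def ceil_div(x: int, y: int) -> int:
    """Ceiling division for non-negative integers."""
    return (x + y - 1) // y


def padded_shift(n: int) -> int:
    """The 'unreadable' padded variant: ((n + 2) / 3) << 2."""
    return ((n + 2) // 3) << 2


def unpadded_or(n: int) -> int:
    """The 'unreadable' unpadded variant: ((n << 2) | 2) / 3."""
    return ((n << 2) | 2) // 3


for n in range(10_000):
    assert padded_shift(n) == ceil_div(n, 3) * 4
    assert unpadded_or(n) == ceil_div(8 * n, 6)
print("shift/or variants match the ceiling divisions")
```

The `| 2` works as `+ 2` here because `n << 2` always has its two lowest bits clear.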

Notes:

  • Obviously you may need to add 1 to the calculations to include a null termination byte.
  • For Mime you may need to take care of possible line termination characters and such (look for other answers for that).
Lizettelizotte answered 30/7, 2017 at 15:19 Comment(0)
18

(In an attempt to give a succinct yet complete derivation.)

Every input byte has 8 bits, so for n input bytes we get:

n × 8      input bits

Every 6 bits is an output byte, so:

ceil(n × 8 / 6)  =  ceil(n × 4 / 3)      output bytes

This is without padding.

With padding, we round that up to multiple-of-four output bytes:

ceil(ceil(n × 4 / 3) / 4) × 4  =  ceil(n × 4 / 3 / 4) × 4  =  ceil(n / 3) × 4      output bytes

See Nested Divisions (Wikipedia) for the first equivalence.
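
The first equivalence can also be checked numerically. The sketch below (Python, with exact fractions so that no floating-point rounding interferes; the function name is made up) compares the nested rounding with the simplified form:

```python
from fractions import Fraction
from math import ceil


def check_nested_rounding(limit: int) -> bool:
    """Check ceil(ceil(4n/3) / 4) * 4 == ceil(n/3) * 4 for 0 <= n < limit."""
    for n in range(limit):
        unpadded = ceil(Fraction(4 * n, 3))        # ceil(n * 4 / 3)
        nested = ceil(Fraction(unpadded, 4)) * 4   # rounded up to a multiple of 4
        direct = ceil(Fraction(n, 3)) * 4          # simplified form
        if nested != direct:
            return False
    return True


print(check_nested_rounding(1000))
```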

Using integer arithmetic, ceil(n / m) can be calculated as (n + m - 1) div m, hence we get:

(n * 4 + 2) div 3      without padding

(n + 2) div 3 * 4      with padding

For illustration:

 n   with padding    (n + 2) div 3 * 4    without padding   (n * 4 + 2) div 3 
------------------------------------------------------------------------------
 0                           0                                      0
 1   AA==                    4            AA                        2
 2   AAA=                    4            AAA                       3
 3   AAAA                    4            AAAA                      4
 4   AAAAAA==                8            AAAAAA                    6
 5   AAAAAAA=                8            AAAAAAA                   7
 6   AAAAAAAA                8            AAAAAAAA                  8
 7   AAAAAAAAAA==           12            AAAAAAAAAA               10
 8   AAAAAAAAAAA=           12            AAAAAAAAAAA              11
 9   AAAAAAAAAAAA           12            AAAAAAAAAAAA             12
10   AAAAAAAAAAAAAA==       16            AAAAAAAAAAAAAA           14
11   AAAAAAAAAAAAAAA=       16            AAAAAAAAAAAAAAA          15
12   AAAAAAAAAAAAAAAA       16            AAAAAAAAAAAAAAAA         16

Finally, in the case of MIME Base64 encoding, two additional bytes (CR LF) are needed per every 76 output bytes, rounded up or down depending on whether a terminating newline is required.
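
A rough sketch of that MIME calculation in Python. The function and parameter names are hypothetical; eol_len is 2 for CR LF, and trailing_eol controls whether the last line also gets a line break:

```python
import base64


def mime_base64_length(n: int, line_len: int = 76, eol_len: int = 2,
                       trailing_eol: bool = True) -> int:
    """Padded Base64 length for n bytes plus line breaks every line_len chars."""
    chars = (n + 2) // 3 * 4                       # padded Base64 characters
    lines = (chars + line_len - 1) // line_len     # number of output lines
    breaks = lines if trailing_eol else max(lines - 1, 0)
    return chars + breaks * eol_len


# Python's base64.encodebytes wraps at 76 characters with a single "\n"
# after every line (including the last), so it corresponds to eol_len=1.
for n in (0, 1, 57, 58, 525):
    print(n, mime_base64_length(n, eol_len=1), len(base64.encodebytes(bytes(n))))
```

Other encoders may wrap at a different column or use a different line terminator, so treat the defaults as assumptions to adjust.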

Volcanology answered 5/2, 2020 at 0:2 Comment(1)
Very good point about extra bytes needed for CR LF. I was missing them when allocating buffer for base64-encoded string produced by openssl. - Electrophoresis
6

Here is a function that calculates the original (decoded) size, in KB, of a file encoded as a Base64 String:

private Double calcBase64SizeInKBytes(String base64String) {
    Double result = -1.0;
    if(StringUtils.isNotEmpty(base64String)) {
        Integer padding = 0;
        if(base64String.endsWith("==")) {
            padding = 2;
        }
        else {
            if (base64String.endsWith("=")) padding = 1;
        }
        // Divide by 4.0 so that the division is not truncated before Math.ceil is applied.
        result = (Math.ceil(base64String.length() / 4.0) * 3 ) - padding;
    }
    return result / 1000;
}
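
For comparison, the same decoded-size calculation as a short Python sketch, checked against the real decoder. It assumes padded Base64 text without line breaks, and the name decoded_size is made up for the example:

```python
import base64


def decoded_size(b64: str) -> int:
    """Decoded byte count of padded Base64 text (no line breaks assumed)."""
    padding = len(b64) - len(b64.rstrip("="))   # 0, 1 or 2 trailing '='
    return len(b64) // 4 * 3 - padding


print(decoded_size("QWJoaXNoZWs="))  # 8
```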
Colorific answered 17/8, 2017 at 9:58 Comment(0)
5

I think the given answers miss the point of the original question, which is how much space needs to be allocated to fit the base64 encoding for a given binary string of length n bytes.

The answer is (floor(n / 3) + 1) * 4 + 1

This includes padding and a terminating null character. You may not need the floor call if you are doing integer arithmetic.

Including padding, a base64 string requires four bytes for every three-byte chunk of the original string, including any partial chunks. One or two bytes extra at the end of the string will still get converted to four bytes in the base64 string when padding is added. Unless you have a very specific use, it is best to add the padding, usually an equals character. I added an extra byte for a null character in C, because ASCII strings without this are a little dangerous and you'd need to carry the string length separately.
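
To see how this estimate relates to the exact padded size, here is a small Python comparison (a sketch only; both function names are invented). It shows that the estimate is always large enough, and that it reserves one extra 4-byte block whenever n is a multiple of 3:

```python
def alloc_estimate(n: int) -> int:
    """(floor(n / 3) + 1) * 4 + 1, the allocation size given above."""
    return (n // 3 + 1) * 4 + 1


def alloc_exact(n: int) -> int:
    """Exact padded Base64 length plus one byte for the terminating null."""
    return (n + 2) // 3 * 4 + 1


for n in range(7):
    print(n, alloc_estimate(n), alloc_exact(n), alloc_estimate(n) - alloc_exact(n))
```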

Attitudinarian answered 23/3, 2014 at 15:38 Comment(3)
Your formula is wrong. Consider n=3, the expected result (without null padding) is 4, but your formula returns 8. - Bulger
I also think including the null terminator is silly, especially since we're talking about .NET here. - Bulger
Works correctly in Windows, using CryptBinaryToStringA. My vote for this. - Regardless
5

For all people who speak C, take a look at these two macros:

// calculate the size of 'output' buffer required for a 'input' buffer of length x during Base64 encoding operation
#define B64ENCODE_OUT_SAFESIZE(x) ((((x) + 3 - 1)/3) * 4 + 1) 

// calculate the size of 'output' buffer required for a 'input' buffer of length x during Base64 decoding operation
#define B64DECODE_OUT_SAFESIZE(x) (((x)*3)/4) 

Taken from here.

Mahalia answered 30/9, 2019 at 10:58 Comment(0)
4

I don't see the simplified formula in other responses. The logic is covered but I wanted a most basic form for my embedded use:

  Unpadded = ((4 * n) + 2) / 3

  Padded = 4 * ((n + 2) / 3)

NOTE: When calculating the unpadded count we round up the integer division i.e. add Divisor-1 which is +2 in this case

Aniakudo answered 22/6, 2020 at 6:46 Comment(0)
3

While everyone else is debating algebraic formulas, I'd rather just use BASE64 itself to tell me:

$ echo "Including padding, a base64 string requires four bytes for every three-byte chunk of the original string, including any partial chunks. One or two bytes extra at the end of the string will still get converted to four bytes in the base64 string when padding is added. Unless you have a very specific use, it is best to add the padding, usually an equals character. I added an extra byte for a null character in C, because ASCII strings without this are a little dangerous and you'd need to carry the string length separately."| wc -c

525

$ echo "Including padding, a base64 string requires four bytes for every three-byte chunk of the original string, including any partial chunks. One or two bytes extra at the end of the string will still get converted to four bytes in the base64 string when padding is added. Unless you have a very specific use, it is best to add the padding, usually an equals character. I added an extra byte for a null character in C, because ASCII strings without this are a little dangerous and you'd need to carry the string length separately." | base64 | wc -c

710

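So the formula of 3 bytes being represented by 4 Base64 characters seems correct: 525 / 3 * 4 = 700, and the remaining 10 bytes are presumably the newlines that base64 adds when it wraps its output at 76 characters per line.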

Nostrum answered 29/5, 2016 at 1:12 Comment(3)
I've got something against calculations that require a lot of memory and CPU time while the calculations can be performed in 1 ns and one or two registers. - Lizettelizotte
So when you're trying to deal with unknown amounts of binary data - how does this help? - Clinch
The question is all about formulas, which help in calculating the output size without doing the base64 itself. While this answer is useful in some situations, it doesn't help with this question. - Deibel
1

Seems to me that the right formula should be:

n64 = 4 * (n / 3) + (n % 3 != 0 ? 4 : 0)
Hinduism answered 7/3, 2015 at 0:12 Comment(1)
The ASCII zero terminator is not taken into account, so this does not work in Windows (CryptBinaryToStringA). - Regardless
1

I believe this one is an exact answer if n % 3 is not zero, no?

    (n + 3-n%3)
4 * ---------
       3

Mathematica version :

SizeB64[n_] := If[Mod[n, 3] == 0, 4 n/3, 4 (n + 3 - Mod[n, 3])/3]

Have fun

GI

Hormonal answered 3/6, 2016 at 13:30 Comment(0)
1

Simple implementation in JavaScript

function sizeOfBase64String(base64String) {
    if (!base64String) return 0;
    const padding = (base64String.match(/(=*)$/) || [])[1].length;
    // Decoded size: 3 bytes for every 4 Base64 characters, minus the padding.
    return 3 * Math.ceil(base64String.length / 4) - padding;
}
Allembracing answered 17/8, 2018 at 11:13 Comment(0)
1

If anyone is interested in using @Pedro Silva's solution in JS, I just ported it:

const getBase64Size = (base64) => {
  let padding = base64.length
    ? getBase64Padding(base64)
    : 0
  return ((Math.ceil(base64.length / 4) * 3 ) - padding) / 1000
}

const getBase64Padding = (base64) => {
  // Return 0 when there is no trailing '=' at all.
  if (endsWith(base64, '==')) return 2
  return endsWith(base64, '=') ? 1 : 0
}

const endsWith = (str, end) => {
  let charsFromEnd = end.length
  let extractedEnd = str.slice(-charsFromEnd)
  return extractedEnd === end
}
Figurehead answered 12/4, 2019 at 8:21 Comment(0)
0

On Windows, I wanted to estimate the size of a mime64 buffer, but none of the precise calculation formulas worked for me. I finally ended up with an approximate formula like this:

Mime64 string allocation size (approximate) = (((4 * ((binary buffer size) + 1)) / 3) + 1)

The last +1 is for the ASCII zero: one extra character needs to be allocated to store the terminating zero. But why is "binary buffer size" + 1? I suspect that there is some mime64 termination character, or maybe this is some alignment issue.

Regardless answered 27/2, 2016 at 7:48 Comment(0)
0

You can refer to this example and see how base64 decoding works:


I have also attached a reference to the Base64 index table. [Image: Base64 index table]

Decoding->base64 string : QWJoaXNoZWs=


  1. First, you need to split the string character by character. Thus, you get 12 groups: Q W J o a X N o Z W s =

  2. Each group (character) is a Base64 character that has its own index, and now your task is to convert groups to indices. To do this, replace each character by its index from the Base64 characters table (if you cannot find an index for a specific group, such as the padding character =, just discard it). All in all, you should get the following 11 indices: 16 22 9 40 26 23 13 40 25 22 44

  3. At this step you should convert each index from decimal to binary. Write each one as an 8-bit value and make sure you get the following binary values: 00010000 00010110 00001001 00101000 00011010 00010111 00001101 00101000 00011001 00010110 00101100

  4. Now remove the prefix “00” (two zeros) in front of each group: 010000 010110 001001 101000 011010 010111 001101 101000 011001 010110 101100

  5. There you have a simple concatenation of previous groups (that is, glue all the binary values together and get a 66-character string): 010000010110001001101000011010010111001101101000011001010110101100

  6. Then, divide the resulting string into groups so that each one has 8 characters (if the last group has less than 8 characters, you must discard it). Now you have 8 groups of eight-bit bytes: 01000001 01100010 01101000 01101001 01110011 01101000 01100101 01101011

  7. Once again using the ASCII table, convert all binary values into their ASCII characters: A b h i s h e k

  8. The final chord, concatenate all ASCII characters to get the result string: Abhishek
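
The eight steps above can be sketched in a few lines of Python. This is only an illustration of the manual procedure, not how a real decoder is written; the names ALPHABET and decode_by_hand are made up, and the result is interpreted as ASCII text as in the example:

```python
import string

# Standard Base64 alphabet: index 0..63 -> character.
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"


def decode_by_hand(b64: str) -> str:
    # Steps 1-2: split into characters and map each to its index,
    # discarding anything not in the table (such as the '=' padding).
    indices = [ALPHABET.index(c) for c in b64 if c in ALPHABET]
    # Steps 3-5: write every index as 6 bits and concatenate them.
    bits = "".join(format(i, "06b") for i in indices)
    # Step 6: cut into groups of 8 bits, dropping an incomplete last group.
    usable = len(bits) - len(bits) % 8
    octets = [bits[i:i + 8] for i in range(0, usable, 8)]
    # Steps 7-8: convert each octet to its ASCII character and join them.
    return "".join(chr(int(octet, 2)) for octet in octets)


print(decode_by_hand("QWJoaXNoZWs="))  # Abhishek
```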

Thus,

Size of original string (in bytes) = floor(6n/8) - padding, where n is the length of the Base64 string


Size of base64 string (in bytes) = ceil(8n/6) + padding, where n is the length of the original string


When decoding from base64

int paddingCount = (base64Data.endsWith("==")) ? 2 :(base64Data.endsWith("=")) ? 1 : 0;

double dataSize = floor(base64Data.length() * 3 / 4) - paddingCount;

When encoding to base64

int paddingCount = (3 - stringToEncode.length() % 3) % 3;

double dataSize = ceil(stringToEncode.length() * 4 / 3.0) + paddingCount;
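
The padding count on the encoding side is easy to get wrong when the input length is a multiple of 3 (it must be 0 in that case, not 3). A short Python check against the real encoder, with a made-up helper name:

```python
import base64


def padding_count(n: int) -> int:
    """Number of '=' characters for n input bytes: 0, 1 or 2."""
    return (3 - n % 3) % 3


for n in range(7):
    encoded = base64.b64encode(bytes(n))
    print(n, padding_count(n), encoded.count(b"="))
```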
Chaunce answered 9/6, 2023 at 10:26 Comment(0)
