Remove trailing "=" when base64 encoding
H

11

83

I am noticing that whenever I base64 encode a string, a "=" is appended at the end. Can I remove this character and then reliably decode it later by adding it back, or is this dangerous? In other words, is the "=" always appended, or only in certain cases?

I want my encoded string to be as short as possible, which is why I want to know whether I can always remove the "=" character and just add it back before decoding.

Handicraft answered 20/12, 2010 at 18:1 Comment(3)
Let's define base64-sensible as base64-without-padding, pretty please? These equals characters are completely redundant and therefore utterly pointless. If you write a base64 decoder, please consider not rejecting inputs that don't have the padding.Joyajoyan
Sure, but just be careful if you do strip the padding that you won't be concatenating any stripped base64-encoded strings together. And of course, also make sure your decoder isn't expecting the padding.Ration
Best way to know if '=' or '==' or nothing is to be added back is to keep that info untouched. It can not get shorter than that. Removing one '=' means removing two bits.Tester
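The concatenation caveat in the comments above can be illustrated with a minimal Python sketch. The helper names b64_nopad and b64_decode_nopad are made up for this example and are not part of any library; only the standard base64 module is assumed.

```python
import base64


def b64_nopad(data: bytes) -> str:
    """Standard Base64 with the trailing '=' characters removed."""
    return base64.b64encode(data).decode("ascii").rstrip("=")


def b64_decode_nopad(text: str) -> bytes:
    """Restore the padding from the string length, then decode."""
    return base64.b64decode(text + "=" * (-len(text) % 4))


a, b = b64_nopad(b"a"), b64_nopad(b"b")  # "YQ" and "Yg"
joined = a + b                            # "YQYg": the boundary is lost

# Decoding each piece separately round-trips...
assert b64_decode_nopad(a) + b64_decode_nopad(b) == b"ab"
# ...but the concatenated string decodes to different bytes.
assert b64_decode_nopad(joined) != b"ab"
```

With the padding left on, "YQ==Yg==" at least shows where one value ends; once it is stripped, the two values need a separator or a length prefix of their own.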
I
86

The = is padding.

Wikipedia says

An additional pad character is allocated which may be used to force the encoded output into an integer multiple of 4 characters (or equivalently when the unencoded binary text is not a multiple of 3 bytes) ; these padding characters must then be discarded when decoding but still allow the calculation of the effective length of the unencoded text, when its input binary length would not be a multiple of 3 bytes (the last non-pad character is normally encoded so that the last 6-bit block it represents will be zero-padded on its least significant bits, at most two pad characters may occur at the end of the encoded stream).

If you control the other end, you could remove it when in transport, then re-insert it (by checking the string length) before decoding.
Note that the data will not be valid Base64 in transport.

Also, another user pointed out (relevant to PHP users):

Note that in PHP base64_decode will accept strings without padding, hence if you remove it to process it later in PHP it's not necessary to add it back. – Mahn Oct 16 '14 at 16:33

So if your destination is PHP, you can safely strip the padding and decode without fancy calculations.
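For decoders that do insist on padding, the length-based re-insertion described above might look roughly like this in Python. This is only a sketch: strip_padding and restore_padding are hypothetical helper names, and the check for a remainder of 1 is an added assumption about how you would want to treat input that cannot be valid unpadded Base64.

```python
import base64


def strip_padding(encoded: str) -> str:
    """Drop the trailing '=' characters before transport."""
    return encoded.rstrip("=")


def restore_padding(stripped: str) -> str:
    """Add '=' back until the length is a multiple of 4."""
    if len(stripped) % 4 == 1:
        # No byte sequence encodes to 4n + 1 characters, so this is corrupt input.
        raise ValueError("not a valid unpadded Base64 length")
    return stripped + "=" * (-len(stripped) % 4)


# Round trip for inputs of 0 to 4 bytes, covering every padding case.
for raw in (b"", b"f", b"fo", b"foo", b"foob"):
    padded = base64.b64encode(raw).decode("ascii")
    restored = restore_padding(strip_padding(padded))
    assert restored == padded
    assert base64.b64decode(restored) == raw
```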

Instil answered 20/12, 2010 at 18:4 Comment(5)
Looks like this may not actually work, since at the decoding end we would need to know whether or not the "=" was removed on the encoding end. I am not able to include that information.Handicraft
@Steve: If the length isn't a multiple of 4 characters, add = characters until it is. In .Net, if (str.Length % 4 != 0) str += new string('=', 4 - str.Length % 4)Instil
Note that in PHP base64_decode will accept strings without padding, hence if you remove it to process it later in PHP it's not necessary to add it back.Swaim
Like @Swaim mentioned, even Javascript's atob() function does not need the padding to successfully decode a base64 encoded stringChemurgy
@Swaim The same with Ruby's Base64.decode64 method; it works fine without padding. I think .NET's Convert.FromBase64String method is one of the stricter ones in that it requires padding, actually.Salena
W
38

In JavaScript you could do something like this:

// if this is your Base64 encoded string
var str = 'VGhpcyBpcyBhbiBhd2Vzb21lIHNjcmlwdA=='; 

// make URL friendly:
str = str.replace(/\+/g, '-').replace(/\//g, '_').replace(/\=+$/, '');

// reverse to original encoding
if (str.length % 4 != 0){
  str += ('===').slice(0, 4 - (str.length % 4));
}
str = str.replace(/-/g, '+').replace(/_/g, '/');

See also this Fiddle: http://jsfiddle.net/7bjaT/66/

Wileywilfong answered 21/8, 2011 at 15:33 Comment(1)
This is an awesome scriptOjeda
S
34

I wrote part of Apache's commons-codec-1.4.jar Base64 decoder, and in that logic we are fine without padding characters. End-of-file and End-of-stream are just as good indicators that the Base64 message is finished as any number of '=' characters!

The URL-Safe variant we introduced in commons-codec-1.4 omits the padding characters on purpose to keep things smaller!

http://commons.apache.org/codec/apidocs/src-html/org/apache/commons/codec/binary/Base64.html#line.478

I guess a safer answer is, "depends on your decoder implementation," but logically it is not hard to write a decoder that doesn't need padding.

Specs answered 25/1, 2011 at 19:21 Comment(4)
Interesting- thanks for this perspective. I wonder if the padding was intended to optimize hardware implementations.Handicraft
Users should note that if you encode as URL safe and then another program is decoding with something other than Apache, it will not decode correctly.Wen
URL-safe also does additional transforms: encodeUrlSafe(decode("d8vb15jT4MYKb7RpvtJq+/EH8K1h5XH14Oi+3NtrLcM")) == "d8vb15jT4MYKb7RpvtJq-_EH8K1h5XH14Oi-3NtrLcM". Here you can see that it replaces + with - and / with _.Twinned
Yes, of course. + and / are special chars inside URLS and thus not url-safe!Specs
P
18

= is added for padding. The length of a base64 string should be a multiple of 4, so 1 or 2 = are added as necessary.

Read: No, you shouldn't remove it.

Phiona answered 20/12, 2010 at 18:4 Comment(5)
So tell me: why does this not happen in Java when encoding Base64 URL-safe?Bedspring
@Code.IT "URL safe"? That sounds like a Java function that already trims = characters, because those aren't URL safe.Sorghum
= is a padding character. It has nothing to do with URL safe. https://mcmap.net/q/241061/-remove-trailing-quot-quot-when-base64-encodingBedspring
@Sorghum There is a "base64 url" encoding, which replaces base64's '+' with '-' and '/' with '_'. It also omits padding as the padding has no information value (and in fact many base64 decoders already don't use it).Sheared
If you're encoding a JWT body using base64_encode you might need to.Denson
B
7

On Android I am using this:

Global

String CHARSET_NAME ="UTF-8";

Encode

String base64 = new String(
            Base64.encode(byteArray, Base64.URL_SAFE | Base64.NO_PADDING | Base64.NO_CLOSE | Base64.NO_WRAP),
            CHARSET_NAME);
return base64.trim();

Decode

byte[] bytes = Base64.decode(base64String,
            Base64.URL_SAFE | Base64.NO_PADDING | Base64.NO_CLOSE | Base64.NO_WRAP);

This is equivalent to the following in Java (using Apache Commons Codec):

Encode

private static String base64UrlEncode(byte[] input)
{
    Base64 encoder = new Base64(true);
    byte[] encodedBytes = encoder.encode(input);
    return StringUtils.newStringUtf8(encodedBytes).trim();
}

Decode

private static byte[] base64UrlDecode(String input) {
    byte[] originalValue = StringUtils.getBytesUtf8(input);
    Base64 decoder = new Base64(true);
    return decoder.decode(originalValue);
}

I have never had problems with the trailing "=", and I am using Bouncycastle as well.

Bedspring answered 6/10, 2016 at 10:46 Comment(0)
S
5

If you're encoding bytes (at fixed bit length), then the padding is redundant. This is the case for most people.

Base64 consumes 6 bits at a time and produces a byte of 8 bits that only uses six bits worth of combinations.

If your string is 1 byte (8 bits), you'll have an output of 12 bits as the smallest multiple of 6 that 8 will fit into, with 4 bits extra. If your string is 2 bytes, you have to output 18 bits, with two bits extra. For multiples of six against multiple of 8 you can have a remainder of either 0, 2 or 4 bits.

The padding says to ignore those extra four (==) or two (=) bits. The padding is there to tell the decoder about your padding.
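The arithmetic above can be checked with a few lines of Python. This is just a sketch of the counting argument; leftover_bits and pad_chars are names invented for the example.

```python
import base64


def leftover_bits(n_bytes: int) -> int:
    """Zero-fill bits in the last Base64 character for n_bytes of input."""
    return -(8 * n_bytes) % 6


def pad_chars(n_bytes: int) -> int:
    """Number of '=' characters a padding encoder appends."""
    return -n_bytes % 3


for n in range(1, 7):
    encoded = base64.b64encode(bytes(n)).decode("ascii")
    # The encoder's actual padding matches the prediction from the length...
    assert encoded.count("=") == pad_chars(n)
    # ...and each '=' corresponds to two leftover bits (0, 2 or 4 in total).
    assert leftover_bits(n) == 2 * pad_chars(n)
```

Because both numbers follow from the byte count alone, a decoder that knows it is producing whole bytes can work them out from the encoded length without seeing any "=".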

The padding isn't really needed when you're encoding bytes. A base64 decoder can simply ignore leftover bits that total less than 8 bits. In this case, you're best off removing it.

The padding might be of some use for streaming and arbitrary length bit sequences as long as they're a multiple of two. It might also be used for cases where people want to only send the last 4 bits when more bits are remaining if the remaining bits are all zero. Some people might want to use it to detect incomplete sequences though it's hardly reliable for that. I've never seen this optimisation in practice. People rarely have these situations, most people use base64 for discrete byte sequences.

If you see answers suggesting to leave it on, that's not a good encouragement if you're simply encoding bytes, it's enabling a feature for a set of circumstances you don't have. The only reason to have it on in that case might be to add tolerance to decoders that don't work without the padding. If you control both ends, that's a non-concern.

Skell answered 21/5, 2019 at 14:13 Comment(0)
F
3

If you're using PHP the following function will revert the stripped string to its original format with proper padding:

<?php

$str = 'base64 encoded string with equal signs stripped';
$str = str_pad($str, strlen($str) + (4 - ((strlen($str) % 4) ?: 4)), '=');

echo $str, "\n";
Franciskus answered 30/10, 2018 at 17:51 Comment(1)
Or $str = str_pad($str, ceil(strlen($str)/4)*4, '='); (same results)Backrest
A
2

Using Python you can remove base64 padding and add it back like this:

from math import ceil

stripped = original.rstrip('=')

original = stripped.ljust(ceil(len(stripped) / 4) * 4, '=')
Anacreon answered 1/5, 2019 at 8:25 Comment(1)
base64.b64decode(s + "=" * ((4 - len(s)) % 4)) lets you skip an import. Maybe a bit strange with the negative modulo, but just wrap it in a function and forget it: def b64dec_lazypad(s): return base64.b64decode(s + "=" * ((4 - len(s)) % 4))Villeneuve
V
1

Yes, there are valid use cases where padding is omitted from a Base 64 encoding.

The JSON Web Signature (JWS) standard (RFC 7515) requires Base 64 encoded data to omit padding. It expects:

Base64 encoding [...] with all trailing '=' characters omitted (as permitted by Section 3.2) and without the inclusion of any line breaks, whitespace, or other additional characters. Note that the base64url encoding of the empty octet sequence is the empty string. (See Appendix C for notes on implementing base64url encoding without padding.)

The same applies to the JSON Web Token (JWT) standard (RFC 7519).

In addition, Julius Musseau's answer has indicated that Apache's Base 64 decoder doesn't require padding to be present in Base 64 encoded data.
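As a rough illustration of the unpadded base64url form that the quoted text describes, here is a Python sketch checked against the example octets given in RFC 7515 Appendix C. The function names base64url_encode and base64url_decode are chosen for this example; the appendix itself shows C# code, not this.

```python
import base64


def base64url_encode(data: bytes) -> str:
    """URL-safe Base64 with all trailing '=' characters omitted."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def base64url_decode(text: str) -> bytes:
    """Re-add the padding implied by the length, then decode."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


# Example octet sequence from RFC 7515, Appendix C.
octets = bytes([3, 236, 255, 224, 193])
assert base64url_encode(octets) == "A-z_4ME"
assert base64url_decode("A-z_4ME") == octets

# The empty octet sequence encodes to the empty string.
assert base64url_encode(b"") == ""
```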

Voiceless answered 19/1, 2021 at 12:23 Comment(0)
S
1

I do something like this with java8+

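// Needs java.util.Base64 and java.nio.charset.StandardCharsets.
private static String getBase64StringWithoutPadding(String data) {
    if(data == null) {
        return "";
    }
    Base64.Encoder encoder = Base64.getEncoder().withoutPadding();
    // Use an explicit charset so the result does not depend on the platform default.
    return encoder.encodeToString(data.getBytes(StandardCharsets.UTF_8));
}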

This method gets an encoder which leaves out padding.

As other answers mention, the padding can be restored from the string length if you need to decode it later.

Semipro answered 19/3, 2021 at 7:2 Comment(0)
P
0

On Android you may run into trouble with the android.util.Base64 class: it cannot be used in plain unit tests, only in integration tests, because those run in the Android environment.

On the other hand, if you use java.util.Base64, the compiler warns that your SDK level may be too low (below 26) to use it.

So I suggest that Android developers use

implementation "commons-codec:commons-codec:1.13"

Encoding object

fun encodeObjectToBase64(objectToEncode: Any): String{
    val objectJson = Gson().toJson(objectToEncode).toString()
    return encodeStringToBase64(objectJson.toByteArray(Charsets.UTF_8))
}

fun encodeStringToBase64(byteArray: ByteArray): String{
    return Base64.encodeBase64URLSafeString(byteArray).toString() // encode with no padding
}

Decoding to Object

fun <T> decodeBase64Object(encodedMessage: String, encodeToClass: Class<T>): T{
    val decodedBytes = Base64.decodeBase64(encodedMessage)
    val messageString = String(decodedBytes, StandardCharsets.UTF_8)
    return Gson().fromJson(messageString, encodeToClass)
}

Of course, you may omit the Gson parsing and pass your String, converted to a ByteArray, straight into the method.

Pameliapamelina answered 30/8, 2019 at 10:5 Comment(0)
