Base64 decode snippet in C++
H

14

96

Is there a freely available Base64 decoding code snippet in C++?

Hermaphroditus answered 8/10, 2008 at 0:25 Comment(3)
Another thread shows how to encode/decode base64 using boost: #10522081Diaphanous
Base64 decoding using Boost C++ Library: #34681498Bantling
On Windows you could use atlenc.h or wincrypt.h for this.Datary
J
107

See Encoding and decoding base 64 with C++.

Here is the implementation from that page:

/*
   base64.cpp and base64.h

   Copyright (C) 2004-2008 René Nyffenegger

   This source code is provided 'as-is', without any express or implied
   warranty. In no event will the author be held liable for any damages
   arising from the use of this software.

   Permission is granted to anyone to use this software for any purpose,
   including commercial applications, and to alter it and redistribute it
   freely, subject to the following restrictions:

   1. The origin of this source code must not be misrepresented; you must not
      claim that you wrote the original source code. If you use this source code
      in a product, an acknowledgment in the product documentation would be
      appreciated but is not required.

   2. Altered source versions must be plainly marked as such, and must not be
      misrepresented as being the original source code.

   3. This notice may not be removed or altered from any source distribution.

   René Nyffenegger [email protected]

*/

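#include <cctype>   // isalnum (include added to this listing)
#include <string>   // std::string (include added to this listing)

static const std::string base64_chars =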
             "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
             "abcdefghijklmnopqrstuvwxyz"
             "0123456789+/";


static inline bool is_base64(unsigned char c) {
  return (isalnum(c) || (c == '+') || (c == '/'));
}

std::string base64_encode(unsigned char const* bytes_to_encode, unsigned int in_len) {
  std::string ret;
  int i = 0;
  int j = 0;
  unsigned char char_array_3[3];
  unsigned char char_array_4[4];

  while (in_len--) {
    char_array_3[i++] = *(bytes_to_encode++);
    if (i == 3) {
      char_array_4[0] = (char_array_3[0] & 0xfc) >> 2;
      char_array_4[1] = ((char_array_3[0] & 0x03) << 4) + ((char_array_3[1] & 0xf0) >> 4);
      char_array_4[2] = ((char_array_3[1] & 0x0f) << 2) + ((char_array_3[2] & 0xc0) >> 6);
      char_array_4[3] = char_array_3[2] & 0x3f;

      for(i = 0; (i <4) ; i++)
        ret += base64_chars[char_array_4[i]];
      i = 0;
    }
  }

  if (i)
  {
    for(j = i; j < 3; j++)
      char_array_3[j] = '\0';

    char_array_4[0] = (char_array_3[0] & 0xfc) >> 2;
    char_array_4[1] = ((char_array_3[0] & 0x03) << 4) + ((char_array_3[1] & 0xf0) >> 4);
    char_array_4[2] = ((char_array_3[1] & 0x0f) << 2) + ((char_array_3[2] & 0xc0) >> 6);
    char_array_4[3] = char_array_3[2] & 0x3f;

    for (j = 0; (j < i + 1); j++)
      ret += base64_chars[char_array_4[j]];

    while((i++ < 3))
      ret += '=';

  }

  return ret;
}

std::string base64_decode(std::string const& encoded_string) {
  int in_len = encoded_string.size();
  int i = 0;
  int j = 0;
  int in_ = 0;
  unsigned char char_array_4[4], char_array_3[3];
  std::string ret;

  while (in_len-- && ( encoded_string[in_] != '=') && is_base64(encoded_string[in_])) {
    char_array_4[i++] = encoded_string[in_]; in_++;
    if (i ==4) {
      for (i = 0; i <4; i++)
        char_array_4[i] = base64_chars.find(char_array_4[i]);

      char_array_3[0] = (char_array_4[0] << 2) + ((char_array_4[1] & 0x30) >> 4);
      char_array_3[1] = ((char_array_4[1] & 0xf) << 4) + ((char_array_4[2] & 0x3c) >> 2);
      char_array_3[2] = ((char_array_4[2] & 0x3) << 6) + char_array_4[3];

      for (i = 0; (i < 3); i++)
        ret += char_array_3[i];
      i = 0;
    }
  }

  if (i) {
    for (j = i; j <4; j++)
      char_array_4[j] = 0;

    for (j = 0; j <4; j++)
      char_array_4[j] = base64_chars.find(char_array_4[j]);

    char_array_3[0] = (char_array_4[0] << 2) + ((char_array_4[1] & 0x30) >> 4);
    char_array_3[1] = ((char_array_4[1] & 0xf) << 4) + ((char_array_4[2] & 0x3c) >> 2);
    char_array_3[2] = ((char_array_4[2] & 0x3) << 6) + char_array_4[3];

    for (j = 0; (j < i - 1); j++) ret += char_array_3[j];
  }

  return ret;
}
Jerad answered 8/10, 2008 at 0:28 Comment(5)
We should avoid all these unnecessary string concatenations. Since we know in_len, we know the length of ret, so why not give it a fixed length at initialization?Placidia
In the decode function, after the if (i), the first for loop is not necessary. If the bytes in char_array_4 are filled with 0, find() returns -1, but that is not used at all. Consequently, the second for loop can be: for (j = 0; j < i; j++). Also, the third line with char_array_3[2] is completely useless and can be dropped. Why? Because this second block handles only the rest (when i is not zero), and this can only be one or two bytes of the original text (if there were three, then there would be no rest, because three bytes can smoothly be encoded into 4 characters.)Incrust
Google is not always your friend. This implementation is almost the worst one you could possibly pick. See this: #342909Baku
THIS CODE IS BUGGY. It's wrong to store the decoded value in a string! Imagine you have a byte array being encoded, some value would be '\0', the encoding part is absolutely fine. But when you reverse it back, the value will be truncated at '\0'. Please refer to @LihO's answer.Debark
@HainanZhao C++ strings (as opposed to C strings) handle NUL bytes fine. So everything is fine, unless the c_str() method of the result string is used.Universe
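Regarding the comment above about giving ret its length up front: the output size follows directly from the input size, so the result can be reserved once instead of growing on every +=. A minimal sketch, assuming two hypothetical helper names (base64_encoded_size and base64_max_decoded_size) that are not part of the answer's code:

```cpp
#include <cstddef>

// Hypothetical helper (not in the answer above): number of Base64 characters,
// including '=' padding, produced for n input bytes.
inline std::size_t base64_encoded_size(std::size_t n) {
    return (n + 2) / 3 * 4;  // every started group of 3 bytes becomes 4 characters
}

// Hypothetical helper: upper bound on the decoded size for n Base64 characters.
// The exact size is smaller by the number of trailing '=' characters.
inline std::size_t base64_max_decoded_size(std::size_t n) {
    return (n + 3) / 4 * 3;
}
```

With these, the encoder could call ret.reserve(base64_encoded_size(in_len)); right after declaring ret, and the decoder could reserve base64_max_decoded_size(encoded_string.size()). The rest of the code would stay as it is.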
I
137

Here's my modification of the implementation that was originally written by René Nyffenegger. And why have I modified it? Well, because it didn't seem appropriate to me that I should work with binary data stored within std::string object ;)

base64.h:

#ifndef _BASE64_H_
#define _BASE64_H_

#include <vector>
#include <string>
typedef unsigned char BYTE;

std::string base64_encode(BYTE const* buf, unsigned int bufLen);
std::vector<BYTE> base64_decode(std::string const&);

#endif

base64.cpp:

#include "base64.h"
#include <cctype>
#include <iostream>

static const std::string base64_chars =
             "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
             "abcdefghijklmnopqrstuvwxyz"
             "0123456789+/";


static inline bool is_base64(BYTE c) {
  return (isalnum(c) || (c == '+') || (c == '/'));
}

std::string base64_encode(BYTE const* buf, unsigned int bufLen) {
  std::string ret;
  int i = 0;
  int j = 0;
  BYTE char_array_3[3];
  BYTE char_array_4[4];

  while (bufLen--) {
    char_array_3[i++] = *(buf++);
    if (i == 3) {
      char_array_4[0] = (char_array_3[0] & 0xfc) >> 2;
      char_array_4[1] = ((char_array_3[0] & 0x03) << 4) + ((char_array_3[1] & 0xf0) >> 4);
      char_array_4[2] = ((char_array_3[1] & 0x0f) << 2) + ((char_array_3[2] & 0xc0) >> 6);
      char_array_4[3] = char_array_3[2] & 0x3f;

      for(i = 0; (i <4) ; i++)
        ret += base64_chars[char_array_4[i]];
      i = 0;
    }
  }

  if (i)
  {
    for(j = i; j < 3; j++)
      char_array_3[j] = '\0';

    char_array_4[0] = (char_array_3[0] & 0xfc) >> 2;
    char_array_4[1] = ((char_array_3[0] & 0x03) << 4) + ((char_array_3[1] & 0xf0) >> 4);
    char_array_4[2] = ((char_array_3[1] & 0x0f) << 2) + ((char_array_3[2] & 0xc0) >> 6);
    char_array_4[3] = char_array_3[2] & 0x3f;

    for (j = 0; (j < i + 1); j++)
      ret += base64_chars[char_array_4[j]];

    while((i++ < 3))
      ret += '=';
  }

  return ret;
}

std::vector<BYTE> base64_decode(std::string const& encoded_string) {
  int in_len = encoded_string.size();
  int i = 0;
  int j = 0;
  int in_ = 0;
  BYTE char_array_4[4], char_array_3[3];
  std::vector<BYTE> ret;

  while (in_len-- && ( encoded_string[in_] != '=') && is_base64(encoded_string[in_])) {
    char_array_4[i++] = encoded_string[in_]; in_++;
    if (i ==4) {
      for (i = 0; i <4; i++)
        char_array_4[i] = base64_chars.find(char_array_4[i]);

      char_array_3[0] = (char_array_4[0] << 2) + ((char_array_4[1] & 0x30) >> 4);
      char_array_3[1] = ((char_array_4[1] & 0xf) << 4) + ((char_array_4[2] & 0x3c) >> 2);
      char_array_3[2] = ((char_array_4[2] & 0x3) << 6) + char_array_4[3];

      for (i = 0; (i < 3); i++)
          ret.push_back(char_array_3[i]);
      i = 0;
    }
  }

  if (i) {
    for (j = i; j <4; j++)
      char_array_4[j] = 0;

    for (j = 0; j <4; j++)
      char_array_4[j] = base64_chars.find(char_array_4[j]);

    char_array_3[0] = (char_array_4[0] << 2) + ((char_array_4[1] & 0x30) >> 4);
    char_array_3[1] = ((char_array_4[1] & 0xf) << 4) + ((char_array_4[2] & 0x3c) >> 2);
    char_array_3[2] = ((char_array_4[2] & 0x3) << 6) + char_array_4[3];

    for (j = 0; (j < i - 1); j++) ret.push_back(char_array_3[j]);
  }

  return ret;
}

Here's the usage:

std::vector<BYTE> myData;
...
std::string encodedData = base64_encode(myData.data(), static_cast<unsigned int>(myData.size()));
std::vector<BYTE> decodedData = base64_decode(encodedData);
Intellect answered 18/12, 2012 at 15:2 Comment(12)
Thanks. Any restrictions on the use of your code? (My personal case will be an academic CFD code).Apotheosize
I tried to decode a jpg file using your method by writing the whole vector to a CFile with its size, but it wasn't a big surprise that the image header was corrupted. The size is equal, though. Any better ideas how to restore image files?Saudra
@masche: This is about encoding and decoding data of any kind at the byte level. Image -> raw data (bytes) -> encode into base64 string, then the way back is base64 string -> decode into raw data (bytes) -> build some input stream or object or whatever on top of it to work with it as an image again...Intellect
Sorry, your code works perfectly fine and even an image can be restored. It was stupid of me to simply squeeze the whole vector into a CFile; that won't work! If I iterate over the vector and write every single byte to the file, it works. Maybe file streams are a better solution here.Saudra
Change to template <typename Vec> Vec base64_decode(str const&) { Vec ret; and you can use a string as a (vector) type if needed.Swatch
@Intellect In Visual Studio 2015 the following warning is generated: warning C4267: '=': conversion from 'size_t' to 'BYTE', possible loss of data. Is char_array_4[j] = static_cast<BYTE>(base64_chars.find(char_array_4[j])); OK, or is something else required?Swellhead
What's wrong with storing binary data in std::string?Bergman
@Bergman Generally, for bytes you want an array; if it has to be dynamic, use std::vector<char>, which is contiguous memory. What's more, std::string has a c_str() method that is misleading in this case. You can write using bytes_t = std::vector<char>, which is much clearer. But if you store binary data inside std::string, it will work.Bethesda
@GaspardP, @Bethesda I would also argue against using chars (which may be signed) to represent bytes. E.g., the result of std::string("\xff")[0] == 0xFF may surprise you or the user of your code.Finback
Thanks, I will try to use it with MQTT, it looks like enough encryption for orders in a restaurant, is it? Or can hackers recognize this easily and alter data into...Adaptation
Don't use a leading underscore in an include guard preprocessor definition. :(Labor
The original code is copyright given This notice may not be removed or altered from any source distribution.Hydrosol
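The two points raised in the comments above (std::string keeps embedded NUL bytes, and plain char may be signed) can be checked with a few lines. This is only an illustrative sketch; the function names are made up for the example:

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// std::string tracks its own length, so embedded NUL bytes are kept.
inline std::size_t stored_length() {
    const std::string bin("ab\0cd", 5);  // 5 bytes, one of them NUL
    return bin.size();
}

// Reading the same data through c_str() as a C string stops at the first NUL.
inline std::size_t c_string_length() {
    const std::string bin("ab\0cd", 5);
    return std::strlen(bin.c_str());
}

// Comparing through unsigned char avoids the sign surprise: where plain char
// is signed, '\xff' is negative and does not compare equal to 0xFF.
inline bool is_byte_ff(char c) {
    return static_cast<unsigned char>(c) == 0xFF;
}
```

So binary data survives in a std::string as long as size() is used rather than C-string functions, and byte comparisons are safest after a cast to unsigned char.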
M
65

There are several snippets here. However, this one is compact, efficient, and C++11 friendly:

#include <string>
#include <vector>

typedef unsigned char uchar;

static std::string base64_encode(const std::string &in) {

    std::string out;

    unsigned val = 0;  // unsigned: bits shifted out of range simply wrap
    int valb = -6;
    for (uchar c : in) {
        val = (val << 8) + c;
        valb += 8;
        while (valb >= 0) {
            out.push_back("ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"[(val>>valb)&0x3F]);
            valb -= 6;
        }
    }
    if (valb>-6) out.push_back("ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"[((val<<8)>>(valb+8))&0x3F]);
    while (out.size()%4) out.push_back('=');
    return out;
}

static std::string base64_decode(const std::string &in) {

    std::string out;

    std::vector<int> T(256,-1);
    for (int i=0; i<64; i++) T["ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"[i]] = i;

    unsigned val = 0;  // unsigned: bits shifted out of range simply wrap
    int valb = -8;
    for (uchar c : in) {
        if (T[c] == -1) break;
        val = (val << 6) + T[c];
        valb += 6;
        if (valb >= 0) {
            out.push_back(char((val>>valb)&0xFF));
            valb -= 8;
        }
    }
    return out;
}
Merino answered 2/1, 2016 at 21:51 Comment(5)
I should take that back... I should simply not do performance tests in debug mode, sorry. It's at least faster than the accepted solution.Cahilly
Bitshifting "int val" off its range is UB. "unsigned val=0; int valb=..." is correct.Varlet
typedef unsigned char uchar;Pentalpha
out.reserve(8 * (1 + in.size() / 6));Unprepared
@KevinYin In the standard this is now defined: "Right-shift on signed integral types is an arithmetic right shift, which performs sign-extension.". Even previously the code was valid, since we never actually touch bits affected by the sign due to the & 0xFF part of the code. However, val>>valb remains incorrect as valb is negative, and, according to the standard >> ... is undefined if the right operand is negative. val << -valb might be better.Hydrosol
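Putting the suggestions from these comments together (an unsigned accumulator and a reserve() call), the encoder could look roughly like this. It is a sketch under those assumptions, not the answer author's code, and the name base64_encode_u is hypothetical:

```cpp
#include <string>

// Sketch only: same bit-accumulator idea as the answer, with the accumulator
// made unsigned (so bits shifted out of range wrap instead of being undefined)
// and the output reserved up front.
inline std::string base64_encode_u(const std::string &in) {
    static const char table[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    std::string out;
    out.reserve((in.size() + 2) / 3 * 4);

    unsigned val = 0;  // bit accumulator; only its low bits are ever read
    int valb = -6;     // number of unread bits held in val, minus 6
    for (unsigned char c : in) {
        val = (val << 8) | c;
        valb += 8;
        while (valb >= 0) {  // at least 6 unread bits: emit one character
            out.push_back(table[(val >> valb) & 0x3F]);
            valb -= 6;
        }
    }
    // 2 or 4 leftover bits are shifted up and zero-filled to a 6-bit group.
    if (valb > -6) out.push_back(table[((val << 8) >> (valb + 8)) & 0x3F]);
    while (out.size() % 4) out.push_back('=');
    return out;
}
```

For example, base64_encode_u("Man") gives "TWFu" and base64_encode_u("A") gives "QQ==". Inside the while loop valb is never negative, so the right shift by valb is always well defined.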
A
42

I think this one works better:

#include <string>

static const char* B64chars = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

static const int B64index[256] =
{
    0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
    0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
    0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  62, 63, 62, 62, 63,
    52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 0,  0,  0,  0,  0,  0,
    0,  0,  1,  2,  3,  4,  5,  6,  7,  8,  9,  10, 11, 12, 13, 14,
    15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 0,  0,  0,  0,  63,
    0,  26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40,
    41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51
};

const std::string b64encode(const void* data, const size_t &len)
{
    std::string result((len + 2) / 3 * 4, '=');
    unsigned char *p = (unsigned  char*) data;
    char *str = &result[0];
    size_t j = 0, pad = len % 3;
    const size_t last = len - pad;

    for (size_t i = 0; i < last; i += 3)
    {
        int n = int(p[i]) << 16 | int(p[i + 1]) << 8 | p[i + 2];
        str[j++] = B64chars[n >> 18];
        str[j++] = B64chars[n >> 12 & 0x3F];
        str[j++] = B64chars[n >> 6 & 0x3F];
        str[j++] = B64chars[n & 0x3F];
    }
    if (pad)  /// Set padding
    {
        int n = --pad ? int(p[last]) << 8 | p[last + 1] : p[last];
        str[j++] = B64chars[pad ? n >> 10 & 0x3F : n >> 2];
        str[j++] = B64chars[pad ? n >> 4 & 0x03F : n << 4 & 0x3F];
        str[j++] = pad ? B64chars[n << 2 & 0x3F] : '=';
    }
    return result;
}

const std::string b64decode(const void* data, const size_t &len)
{
    if (len == 0) return "";

    unsigned char *p = (unsigned char*) data;
    size_t j = 0,
        pad1 = len % 4 || p[len - 1] == '=',
        pad2 = pad1 && (len % 4 > 2 || p[len - 2] != '=');
    const size_t last = (len - pad1) / 4 << 2;
    std::string result(last / 4 * 3 + pad1 + pad2, '\0');
    unsigned char *str = (unsigned char*) &result[0];

    for (size_t i = 0; i < last; i += 4)
    {
        int n = B64index[p[i]] << 18 | B64index[p[i + 1]] << 12 | B64index[p[i + 2]] << 6 | B64index[p[i + 3]];
        str[j++] = n >> 16;
        str[j++] = n >> 8 & 0xFF;
        str[j++] = n & 0xFF;
    }
    if (pad1)
    {
        int n = B64index[p[last]] << 18 | B64index[p[last + 1]] << 12;
        str[j++] = n >> 16;
        if (pad2)
        {
            n |= B64index[p[last + 2]] << 6;
            str[j++] = n >> 8 & 0xFF;
        }
    }
    return result;
}

std::string b64encode(const std::string& str)
{
    return b64encode(str.c_str(), str.size());
}

std::string b64decode(const std::string& str64)
{
    return b64decode(str64.c_str(), str64.size());
}

Thanks to Jens Alfke for pointing out a performance issue, I have made some modifications to this old post. This version is much faster than before. Another advantage is that it handles corrupt data smoothly.

Last edit: Optimizing for speed may seem like overkill for this kind of problem, but just for the fun of it I have made some further modifications to make this, as far as I know, the fastest algorithm out there. Special thanks go to GaspardP for his valuable suggestions and nice benchmark.

Alienage answered 9/5, 2016 at 6:41 Comment(9)
All those calls to strchr are going to slow down the decoder -- you're looping an average of 32 times for every byte decoded. Most of the solutions use a 256-item lookup table to avoid this, which is a lot faster.Pudens
I commented on #342909 - although this code is not the fastest to encode, it is the fastest to decode (compared against 16 other implementations).Bergman
In the decoder, you can boost performance a bit further (10-15% in my tests) by working on a char* instead of a std::string (simply do char* out = &str[0] and then use out[j++] instead of str[j++]). By doing so, you skip the unnecessary checks done by std::string::operator[]. Also, avoid the last push_back, which can turn out to be very expensive, by allocating one more byte (std::string str; str.resize(3*((len+3)/4));), then use out[j++] everywhere and str.resize(j); at the end.Bergman
You added a memory leak in the last edit, not to mention a buffer copy. Don't use new without delete. Actually, don't use new at all. @Gaspard, I'm not aware of any "unnecessary checks done by std::string::operator[]" (in fact I'm reasonably sure that there are none, at least in release), but you could use a vector<char> if really desperate - anyway your replacement code is not what Gaspard suggested, which still used a string for everything but the element access and was safe/fast regardless of the fact that I think it was unnecessary ;)Throughcomposed
@LightnessRacesinOrbit : oops! Being used to C# these days totally made me forget about memory leaks. I edited it again. Thanks for the info btw.Earth
@LightnessRacesinOrbit: in release, operator[] is documented as unchecked - you are right. However std::string has an optimization where small strings are allocated within the structure instead of on heap. Each time you do [] on a string, it checks if it is a small to know where to get the buffer from. My tests have shown it is not always optimized away. On Windows it looks like this: value_type *_Myptr() { // determine current pointer to buffer for mutable string return (this->_BUF_SIZE <= _Myres ? _Unfancy(_Bx._Ptr) : _Bx._Buf); } Bergman
@polfosolఠ_ఠ , Thanks for this. Please consider to use unsigned char *p , instead of char *p in encoder. I've got corrupted base64 string, if my input contains bytes >= 0x80. After adding unsigned, seems, that all is ok.Fuss
How does this compile? You are attempting to cast away constness with (char*) data and with (unsigned char*) data. This should cause an error. You should prefer to use reinterpret_cast<const unsigned char*>(data); as a clear indication to the reader what's happening, or, better yet, simply accept data as const unsigned char* instead of const void*, since that's an idiomatic way to reference any sequence of bytes.Angrist
I assume, that in a large application, that uses this encoder/decoder once in a while, performance would benefit, if B64index is declared as char or unsigned char instead of int due to the reduced memory footprint.Universe
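On the last comment about the table's memory footprint: a 256-entry table of unsigned char takes 256 bytes instead of 1024 (with a 4-byte int), and it can be generated from the alphabet rather than typed by hand. A minimal sketch, assuming a hypothetical helper name make_b64_index. Note that, unlike the answer's table, it marks invalid characters with 0xFF instead of 0 and does not map ',' or '.':

```cpp
#include <array>

// Hypothetical helper: builds the 256-entry decode table from the alphabet.
// 0xFF marks "not a Base64 character".
inline std::array<unsigned char, 256> make_b64_index() {
    static const char alphabet[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    std::array<unsigned char, 256> t;
    t.fill(0xFF);
    for (unsigned char i = 0; i < 64; ++i)
        t[static_cast<unsigned char>(alphabet[i])] = i;
    // Also accept the URL-safe alphabet ('-' and '_').
    t[static_cast<unsigned char>('-')] = 62;
    t[static_cast<unsigned char>('_')] = 63;
    return t;
}
```

A decoder could build this once (for instance as a function-local static) and index it with unsigned char values. Whether the smaller table is measurably faster would have to be benchmarked.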
D
18

Using base-n mini lib, you can do the following:

some_data_t in[] { ... };
constexpr int len = sizeof(in)/sizeof(in[0]);

std::string encoded;
bn::encode_b64(in, in + len, std::back_inserter(encoded));

some_data_t out[len];
bn::decode_b64(encoded.begin(), encoded.end(), out);

The API is generic, iterator-based.

Disclosure: I'm the author.

Domesticate answered 4/8, 2014 at 16:13 Comment(3)
std::size would be nicer than sizeof haxThroughcomposed
Indeed, starting with C++17 that's clearly the right way to do this.Domesticate
It has been the right way to do it since the start. It's just that before C++17 you had to implement it yourself (but it's practically a one-liner). The sizeof hack is never okay in C++.Throughcomposed
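For reference, the "practically a one-liner" mentioned above could look like this before C++17. The name array_size and the sample array are made up for the example:

```cpp
#include <cstddef>

// Pre-C++17 stand-in for std::size on built-in arrays. Unlike
// sizeof(a)/sizeof(a[0]), it fails to compile when given a pointer.
template <typename T, std::size_t N>
constexpr std::size_t array_size(const T (&)[N]) noexcept {
    return N;
}

// Example data, made up for illustration.
static const int sample[] = {1, 2, 3, 4, 5};
static_assert(array_size(sample) == 5, "size is deduced at compile time");
```

In the snippet from the answer, len could then be written as constexpr auto len = array_size(in); (or std::size(in) with C++17).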
B
16

According to this excellent comparison made by GaspardP I would not choose this solution. It's not the worst, but it's not the best either. The only thing it has going for it is that it's possibly easier to understand.

I found the other two answers to be pretty hard to understand. They also produce some warnings in my compiler, and the use of a find function in the decode part should result in pretty poor efficiency. So I decided to roll my own.

Header:

#ifndef BASE64_H_
#define BASE64_H_

#include <vector>
#include <string>
typedef unsigned char BYTE;

class Base64
{
public:
    static std::string encode(const std::vector<BYTE>& buf);
    static std::string encode(const BYTE* buf, unsigned int bufLen);
    static std::vector<BYTE> decode(std::string encoded_string);
};

#endif

Body:

static const BYTE from_base64[] = {    255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255,
                                    255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255,
                                    255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255,  62, 255,  62, 255,  63,
                                     52,  53,  54,  55,  56,  57,  58,  59,  60,  61, 255, 255, 255, 255, 255, 255,
                                    255,   0,   1,   2,   3,   4,   5,   6,   7,   8,   9,  10,  11,  12,  13,  14,
                                     15,  16,  17,  18,  19,  20,  21,  22,  23,  24,  25, 255, 255, 255, 255,  63,
                                    255,  26,  27,  28,  29,  30,  31,  32,  33,  34,  35,  36,  37,  38,  39,  40,
                                     41,  42,  43,  44,  45,  46,  47,  48,  49,  50,  51, 255, 255, 255, 255, 255};

static const char to_base64[] =
             "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
             "abcdefghijklmnopqrstuvwxyz"
             "0123456789+/";


std::string Base64::encode(const std::vector<BYTE>& buf)
{
    if (buf.empty())
        return ""; // Avoid dereferencing buf if it's empty
    return encode(&buf[0], (unsigned int)buf.size());
}

std::string Base64::encode(const BYTE* buf, unsigned int bufLen)
{
    // Calculate how many bytes need to be added to get a multiple of 3
    size_t missing = 0;
    size_t ret_size = bufLen;
    while ((ret_size % 3) != 0)
    {
        ++ret_size;
        ++missing;
    }

    // Expand the return string size to a multiple of 4
    ret_size = 4*ret_size/3;

    std::string ret;
    ret.reserve(ret_size);

    for (unsigned int i=0; i<ret_size/4; ++i)
    {
        // Read a group of three bytes (avoid buffer overrun by replacing with 0)
        size_t index = i*3;
        BYTE b3[3];
        b3[0] = (index+0 < bufLen) ? buf[index+0] : 0;
        b3[1] = (index+1 < bufLen) ? buf[index+1] : 0;
        b3[2] = (index+2 < bufLen) ? buf[index+2] : 0;

        // Transform into four base 64 characters
        BYTE b4[4];
        b4[0] =                            ((b3[0] & 0xfc) >> 2);
        b4[1] = ((b3[0] & 0x03) << 4) +    ((b3[1] & 0xf0) >> 4);
        b4[2] = ((b3[1] & 0x0f) << 2) +    ((b3[2] & 0xc0) >> 6);
        b4[3] = ((b3[2] & 0x3f) << 0);

        // Add the base 64 characters to the return value
        ret.push_back(to_base64[b4[0]]);
        ret.push_back(to_base64[b4[1]]);
        ret.push_back(to_base64[b4[2]]);
        ret.push_back(to_base64[b4[3]]);
    }

    // Replace data that is invalid (always as many as there are missing bytes)
    for (size_t i=0; i<missing; ++i)
        ret[ret_size - i - 1] = '=';

    return ret;
}

std::vector<BYTE> Base64::decode(std::string encoded_string)
{
    // Make sure string length is a multiple of 4
    while ((encoded_string.size() % 4) != 0)
        encoded_string.push_back('=');

    size_t encoded_size = encoded_string.size();
    std::vector<BYTE> ret;
    ret.reserve(3*encoded_size/4);

    for (size_t i=0; i<encoded_size; i += 4)
    {
        // Get values for each group of four base 64 characters
        BYTE b4[4];
        for (size_t k = 0; k < 4; ++k)
        {
            // Cast to BYTE first: plain char may be signed, and a negative value
            // would pass the <= 'z' test and index before the start of the table
            BYTE c = (BYTE)encoded_string[i+k];
            b4[k] = (c <= 'z') ? from_base64[c] : 0xff;
        }

        // Transform into a group of three bytes
        BYTE b3[3];
        b3[0] = ((b4[0] & 0x3f) << 2) + ((b4[1] & 0x30) >> 4);
        b3[1] = ((b4[1] & 0x0f) << 4) + ((b4[2] & 0x3c) >> 2);
        b3[2] = ((b4[2] & 0x03) << 6) + ((b4[3] & 0x3f) >> 0);

        // Add the byte to the return value if it isn't part of an '=' character (indicated by 0xff)
        if (b4[1] != 0xff) ret.push_back(b3[0]);
        if (b4[2] != 0xff) ret.push_back(b3[1]);
        if (b4[3] != 0xff) ret.push_back(b3[2]);
    }

    return ret;
}

Usage:

BYTE buf[] = "ABCD";
std::string encoded = Base64::encode(buf, 4);
// encoded = "QUJDRA=="
std::vector<BYTE> decoded = Base64::decode(encoded);

A bonus here is that the decode function can also decode the URL variant of Base64 encoding.

Baku answered 9/7, 2015 at 15:49 Comment(7)
Bonus points for no find(), and a reserve() for the outputs. 1 little point off because you take the input as a copy (so you can add a = at the end if required). Would've been nice if it were a no-copy thing.Aaren
longer: I like this answer best... great for the # of lines too. Bonus points for no find(), and a reserve() for the outputs. Things I'd improve (which would bulk out the code): you take the input as a copy (so you can add a = at the end if required). Would've been nice if it were a no-copy thing. And could have also been written as plain functions -- no need for the class. And should check for empty buf vector before dereferencing buf[0]. And add interface to write the data out to a reference (so caller can reuse memory).Aaren
Thanks for the feedback, the reason I took the string by value is that it simplifies the code if I can always guarantee it has a valid length. I think that in most cases RVO should prevent a string copy anyway, so it shouldn't be a problem. As for dereferencing buf[0] - good catch, I'll fix that :)Baku
I finished my own modified version of yours, I'll post as an answer here for your pleasure... i removed the need for the string as a copy by adding a test (you already do tests, so it doesn't really add any performance problem).Aaren
I'm probably wrong, but shouldn't the values at indexes 60 and 61 in from_base64[] be swapped? I'm guessing the idea is to ignore "<" (index=60) by returning 255, and to recognize the padding character "=" (index=61) by returning 0.Twister
You are not wrong, but the algorithm still works with this bug ;)Baku
The bug is that the '<' char should have 255 as well (for the '=' char 255 is correct and having 0 here would break the algorithm). This bug makes it so the result when a '<' character is in the string is incorrect, but using unsupported characters like '<' in a base64 string is undefined anyway, so it doesn't matter. I have updated my original answer to fix the bug.Baku
A
10

A little variation with a more compact lookup table and using C++17 features:

#include <cstdint>
#include <string>
#include <string_view>

std::string base64_decode(const std::string_view in) {
  // table from '+' to 'z'; 255 marks characters outside the alphabet
  const uint8_t lookup[] = {
      62,  255, 62,  255, 63,  52,  53, 54, 55, 56, 57, 58, 59, 60, 61, 255,
      255, 255, 255, 255, 255, 255, 0,  1,  2,  3,  4,  5,  6,  7,  8,  9,
      10,  11,  12,  13,  14,  15,  16, 17, 18, 19, 20, 21, 22, 23, 24, 25,
      255, 255, 255, 255, 63,  255, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35,
      36,  37,  38,  39,  40,  41,  42, 43, 44, 45, 46, 47, 48, 49, 50, 51};
  static_assert(sizeof(lookup) == 'z' - '+' + 1);

  std::string out;
  unsigned int val = 0;  // unsigned, so the accumulator may wrap safely on long inputs
  int valb = -8;
  for (uint8_t c : in) {
    if (c < '+' || c > 'z')
      break;
    c -= '+';
    if (lookup[c] >= 64)
      break;
    val = (val << 6) + lookup[c];
    valb += 6;
    if (valb >= 0) {
      out.push_back(char((val >> valb) & 0xFF));
      valb -= 8;
    }
  }
  return out;
}

If you don't have std::string_view, try instead std::experimental::string_view.
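
One way to pick between the two at compile time is sketched below. It assumes GCC or Clang, which report the language version in __cplusplus; MSVC reports an old value there unless /Zc:__cplusplus is passed and does not ship the experimental header. The alias name sv_t is made up for this sketch:

```cpp
#include <string>

// Use std::string_view when compiling as C++17 or later, otherwise fall back
// to the Library Fundamentals TS version. `sv_t` is a hypothetical alias.
#if __cplusplus >= 201703L
#  include <string_view>
using sv_t = std::string_view;
#else
#  include <experimental/string_view>
using sv_t = std::experimental::string_view;
#endif
```

The function signature would then read std::string base64_decode(const sv_t in).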

Agreeable answered 15/6, 2017 at 8:30 Comment(0)
A
5

My variation on DaedalusAlpha's answer:

It avoids copying the parameters at the expense of a couple of tests.

Uses uint8_t instead of BYTE.

Adds some handy functions for dealing with strings, although usually the input data is binary and may have zero bytes inside, so typically should not be manipulated as a string (which often implies null-terminated data).

Also adds some casts to fix compiler warnings (at least on GCC, I haven't run it through MSVC yet).
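
The caveat about zero bytes is easy to trip over. Here is a minimal sketch, with hypothetical helper names, of how a std::string keeps binary data intact only when it is built with an explicit length:

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: copies raw bytes into a std::string using an explicit
// length, so embedded zero bytes are preserved.
inline std::string bytes_to_string(const std::uint8_t* data, std::size_t len) {
    return std::string(reinterpret_cast<const char*>(data), len);
}

// For contrast: treating the same buffer as a C string stops at the first
// zero byte, silently truncating the data. The buffer must be zero-terminated.
inline std::string bytes_as_c_string(const std::uint8_t* data) {
    return std::string(reinterpret_cast<const char*>(data));
}
```

For a buffer holding 'A', 0, 'B' followed by a terminating zero, the first helper called with length 3 yields a 3-character string, while the second yields only "A".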

Part of file base64.hpp:

void base64_encode(string & out, const vector<uint8_t>& buf);
void base64_encode(string & out, const uint8_t* buf, size_t bufLen);
void base64_encode(string & out, string const& buf);

void base64_decode(vector<uint8_t> & out, string const& encoded_string);

// Use this if you know the output should be a valid string
void base64_decode(string & out, string const& encoded_string);

File base64.cpp:

static const uint8_t from_base64[128] = {
    // 8 rows of 16 = 128
    // Note: only 123 entries are strictly required, since lookups are limited to characters <= 'z' (122)

    255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255,
    255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255,
    255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255,  62, 255,  62, 255,  63,
     52,  53,  54,  55,  56,  57,  58,  59,  60,  61, 255, 255, 255, 255, 255, 255,
    255,   0,   1,   2,   3,   4,   5,   6,   7,   8,   9,  10,  11,  12,  13,  14,
     15,  16,  17,  18,  19,  20,  21,  22,  23,  24,  25, 255, 255, 255, 255,  63,
    255,  26,  27,  28,  29,  30,  31,  32,  33,  34,  35,  36,  37,  38,  39,  40,
     41,  42,  43,  44,  45,  46,  47,  48,  49,  50,  51, 255, 255, 255, 255, 255
};

static const char to_base64[65] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789+/";


void base64_encode(string & out, string const& buf)
{
   if (buf.empty())
      base64_encode(out, NULL, 0);
   else
      base64_encode(out, reinterpret_cast<uint8_t const*>(&buf[0]), buf.size());
}


void base64_encode(string & out, std::vector<uint8_t> const& buf)
{
   if (buf.empty())
      base64_encode(out, NULL, 0);
   else
      base64_encode(out, &buf[0], buf.size());
}

void base64_encode(string & ret, uint8_t const* buf, size_t bufLen)
{
   // Calculate how many bytes need to be added to reach a multiple of 3
   size_t missing = 0;
   size_t ret_size = bufLen;
   while ((ret_size % 3) != 0)
   {
      ++ret_size;
      ++missing;
   }

   // Expand the return string size to a multiple of 4
   ret_size = 4*ret_size/3;

   ret.clear();
   ret.reserve(ret_size);

   for (size_t i = 0; i < ret_size/4; ++i)
   {
      // Read a group of three bytes (avoid buffer overrun by replacing with 0)
      const size_t index = i*3;
      const uint8_t b3_0 = (index+0 < bufLen) ? buf[index+0] : 0;
      const uint8_t b3_1 = (index+1 < bufLen) ? buf[index+1] : 0;
      const uint8_t b3_2 = (index+2 < bufLen) ? buf[index+2] : 0;

      // Transform into four base 64 characters
      const uint8_t b4_0 =                        ((b3_0 & 0xfc) >> 2);
      const uint8_t b4_1 = ((b3_0 & 0x03) << 4) + ((b3_1 & 0xf0) >> 4);
      const uint8_t b4_2 = ((b3_1 & 0x0f) << 2) + ((b3_2 & 0xc0) >> 6);
      const uint8_t b4_3 = ((b3_2 & 0x3f) << 0);

      // Add the base 64 characters to the return value
      ret.push_back(to_base64[b4_0]);
      ret.push_back(to_base64[b4_1]);
      ret.push_back(to_base64[b4_2]);
      ret.push_back(to_base64[b4_3]);
   }

   // Replace data that is invalid (always as many as there are missing bytes)
   for (size_t i = 0; i != missing; ++i)
      ret[ret_size - i - 1] = '=';
}


template <class Out>
void base64_decode_any( Out & ret, std::string const& in)
{
   typedef typename Out::value_type T;

   // Make sure the *intended* string length is a multiple of 4
   size_t encoded_size = in.size();

   while ((encoded_size % 4) != 0)
      ++encoded_size;

   const size_t N = in.size();
   ret.clear();
   ret.reserve(3*encoded_size/4);

   for (size_t i = 0; i < encoded_size; i += 4)
   {
      // Note: 'z' == 122

      // Get values for each group of four base 64 characters
      // Compare as unsigned so that bytes >= 0x80 cannot index past the end of from_base64
      const uint8_t b4_0 = (           static_cast<uint8_t>(in[i+0]) <= 'z') ? from_base64[static_cast<uint8_t>(in[i+0])] : 0xff;
      const uint8_t b4_1 = (i+1 < N && static_cast<uint8_t>(in[i+1]) <= 'z') ? from_base64[static_cast<uint8_t>(in[i+1])] : 0xff;
      const uint8_t b4_2 = (i+2 < N && static_cast<uint8_t>(in[i+2]) <= 'z') ? from_base64[static_cast<uint8_t>(in[i+2])] : 0xff;
      const uint8_t b4_3 = (i+3 < N && static_cast<uint8_t>(in[i+3]) <= 'z') ? from_base64[static_cast<uint8_t>(in[i+3])] : 0xff;

      // Transform into a group of three bytes
      const uint8_t b3_0 = ((b4_0 & 0x3f) << 2) + ((b4_1 & 0x30) >> 4);
      const uint8_t b3_1 = ((b4_1 & 0x0f) << 4) + ((b4_2 & 0x3c) >> 2);
      const uint8_t b3_2 = ((b4_2 & 0x03) << 6) + ((b4_3 & 0x3f) >> 0);

      // Add the byte to the return value if it isn't part of an '=' character (indicated by 0xff)
      if (b4_1 != 0xff) ret.push_back( static_cast<T>(b3_0) );
      if (b4_2 != 0xff) ret.push_back( static_cast<T>(b3_1) );
      if (b4_3 != 0xff) ret.push_back( static_cast<T>(b3_2) );
   }
}

void base64_decode(vector<uint8_t> & out, string const& encoded_string)
{
   base64_decode_any(out, encoded_string);
}

void base64_decode(string & out, string const& encoded_string)
{
   base64_decode_any(out, encoded_string);
}
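
The bounds checks in base64_decode_any depend on short-circuit evaluation: the right-hand operand of a logical AND is evaluated only if the left-hand one is true, so in[i+1] is never read when i+1 is out of range. A minimal sketch of the same pattern, with a hypothetical helper name:

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: returns true only if index i is inside s AND the
// character there is at most 'z'. Because && short-circuits, s[i] is not
// evaluated when i is out of range.
inline bool in_range_and_at_most_z(const std::string& s, std::size_t i) {
    return i < s.size() && static_cast<unsigned char>(s[i]) <= 'z';
}
```

Swapping the two operands would compile just as well but would read s[i] before checking the index, which is exactly the overrun the check is meant to prevent.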
Aaren answered 11/2, 2016 at 0:3 Comment(2)
I didn't know that && and || had a defined evaluation order in C++, so I learned something new today. In cases like this where you want to check the condition of an index but at the same time make sure the index isn't out of range, then it's extremely useful.Baku
Yeah I use that technique all the time.Aaren
L
3

My version is a simple, fast Base64 encoder and decoder for C++Builder.

// ---------------------------------------------------------------------------
UnicodeString __fastcall TExample::Base64Encode(void *data, int length)
{
    if (length <= 0)
        return L"";
    static const char set[] = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    unsigned char *in = (unsigned char*)data;
    char *pos, *out = pos = new char[((length - 1) / 3 + 1) << 2];
    while ((length -= 3) >= 0)
    {
        pos[0] = set[in[0] >> 2];
        pos[1] = set[((in[0] & 0x03) << 4) | (in[1] >> 4)];
        pos[2] = set[((in[1] & 0x0F) << 2) | (in[2] >> 6)];
        pos[3] = set[in[2] & 0x3F];
        pos += 4;
        in += 3;
    };
    if ((length & 2) != 0)
    {
        pos[0] = set[in[0] >> 2];
        if ((length & 1) != 0)
        {
            pos[1] = set[((in[0] & 0x03) << 4) | (in[1] >> 4)];
            pos[2] = set[(in[1] & 0x0F) << 2];
        }
        else
        {
            pos[1] = set[(in[0] & 0x03) << 4];
            pos[2] = '=';
        };
        pos[3] = '=';
        pos += 4;
    };
    UnicodeString code = UnicodeString(out, pos - out);
    delete[] out;
    return code;
};

// ---------------------------------------------------------------------------
int __fastcall TExample::Base64Decode(const UnicodeString &code, unsigned char **data)
{
    int length;
    if (((length = code.Length()) == 0) || ((length & 3) != 0))
        return 0;
    wchar_t *str = code.c_str();
    unsigned char *pos, *out = pos = new unsigned char[(length >> 2) * 3];
    while (*str != 0)
    {
        length = -1;
        int shift = 18, bits = 0;
        do
        {
            wchar_t s = str[++length];
            if ((s >= L'A') && (s <= L'Z'))
                bits |= (s - L'A') << shift;
            else if ((s >= L'a') && (s <= L'z'))
                   bits |= (s - (L'a' - 26)) << shift;
            else if (((s >= L'0') && (s <= L'9')))
                   bits |= (s - (L'0' - 52)) << shift;
            else if (s == L'+')
                   bits |= 62 << shift;
            else if (s == L'/')
                   bits |= 63 << shift;
            else if (s == L'=')
            {
                length--;
                break;
            }
            else
            {
                delete[] out;
                return 0;
            };
        }
        while ((shift -= 6) >= 0);
        pos[0] = bits >> 16;
        pos[1] = bits >> 8;
        pos[2] = bits;
        pos += length;
        str += 4;
    };
    *data = out;
    return pos - out;
};
//---------------------------------------------------------------------------
Loganloganberry answered 24/12, 2019 at 6:40 Comment(0)
S
2

I use this:

class BinaryVector {
public:
    std::vector<char> bytes;

    uint64_t bit_count = 0;

public:
    /* Add a bit to the end */
    void push_back(bool bit);

    /* Return false if character is unrecognized */
    bool pushBase64Char(char b64_c);
};

void BinaryVector::push_back(bool bit)
{
    if (!bit_count || bit_count % 8 == 0) {
        bytes.push_back(bit << 7);
    }
    else {
        uint8_t next_bit = 8 - (bit_count % 8) - 1;
        bytes[bit_count / 8] |= bit << next_bit;
    }
    bit_count++;
}

/* Converts one Base64 character to 6 bits */
bool BinaryVector::pushBase64Char(char c)
{
    uint8_t d;

    // A to Z
    if (c > 0x40 && c < 0x5b) {
        d = c - 65;  // Base64 A is 0
    }
    // a to z
    else if (c > 0x60 && c < 0x7b) {
        d = c - 97 + 26;  // Base64 a is 26
    }
    // 0 to 9
    else if (c > 0x2F && c < 0x3a) {
        d = c - 48 + 52;  // Base64 0 is 52
    }
    else if (c == '+') {
        d = 0b111110;
    }
    else if (c == '/') {
        d = 0b111111;
    }
    else if (c == '=') {
        d = 0;
    }
    else {
        return false;
    }

    push_back(d & 0b100000);
    push_back(d & 0b010000);
    push_back(d & 0b001000);
    push_back(d & 0b000100);
    push_back(d & 0b000010);
    push_back(d & 0b000001);

    return true;
}

bool loadBase64(std::vector<char>& b64_bin, BinaryVector& vec)
{
    for (char& c : b64_bin) {
        if (!vec.pushBase64Char(c)) {
            return false;
        }
    }
    return true;
}

Use vec.bytes to access converted data.
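
Note that pushBase64Char pushes six zero bits for every '=' it sees, so for padded input vec.bytes ends with one zero byte per padding character. Below is a minimal sketch, assuming well-formed padded input and using a hypothetical helper name, of computing how many leading bytes are real payload:

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: number of payload bytes encoded by a padded Base64
// string. Assumes '=' appears only at the end; returns 0 if the length is
// not a multiple of 4.
inline std::size_t decoded_payload_size(const std::string& b64) {
    if (b64.empty() || b64.size() % 4 != 0) return 0;
    std::size_t padding = 0;
    if (b64[b64.size() - 1] == '=') ++padding;
    if (b64[b64.size() - 2] == '=') ++padding;
    return b64.size() / 4 * 3 - padding;
}
```

Resizing vec.bytes to this value after loadBase64 succeeds would drop the trailing zero bytes.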

Starbuck answered 7/12, 2019 at 16:5 Comment(0)
N
1

I first wrote my own version and only then found this thread.

Why does my version look simpler than others presented here? Am I doing something wrong? I didn't test it for speed.

inline char const* b64units = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

inline char* b64encode(void const* a, int64_t b) {
    ASSERT(a != nullptr);
    if (b > 0) {
        uint8_t const* aa = static_cast<uint8_t const*>(a);
        uint8_t v = 0;
        int64_t bp = 0;
        int64_t sb = 0;
        int8_t off = 0;
        int64_t nt = ((b + 2) / 3) * 4;
        int64_t nd = (b * 8) / 6;
        int64_t tl = ((b * 8) % 6) ? 1 : 0;
        int64_t nf = nt - nd - tl;
        int64_t ri = 0;
        char* r = new char[nt + 1]();
        for (int64_t i = 0; i < nd; i++) {
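            // Read the next byte only if it exists, to avoid reading past the end of the input
            const uint8_t next = (sb + 1 < b) ? aa[sb + 1] : 0;
            v = (aa[sb] << off) | (next >> (8 - off));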
            v >>= 2;
            r[ri] = b64units[v];
            ri += 1;
            bp += 6;
            sb = (bp / 8);
            off = (bp % 8);
        }
        if (tl > 0) {
            v = (aa[sb] << off);
            v >>= 2;
            r[ri] = b64units[v];
            ri += 1;
        }
        for (int64_t i = 0; i < nf; i++) {
            r[ri] = '=';
            ri += 1;
        }
        return r;
    } else return nullptr;
}

P.S.: My method works well. I checked its output against Node.js:

let data = 'stackabuse.com';
let buff = Buffer.from(data);  // new Buffer(data) is deprecated
let base64data = buff.toString('base64');  // 'c3RhY2thYnVzZS5jb20='
Natka answered 7/3, 2020 at 1:20 Comment(0)
R
0

I liked this solution on GitHub.

It is a single .hpp file and, unlike the accepted answer, it uses the vector<byte> type for raw data.

#pragma once

#include <string>
#include <vector>
#include <stdexcept>
#include <cstdint>

namespace base64
{
    inline static const char kEncodeLookup[] = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    inline static const char kPadCharacter = '=';

    using byte = std::uint8_t;

    inline std::string encode(const std::vector<byte>& input)
    {
        std::string encoded;
        encoded.reserve(((input.size() / 3) + (input.size() % 3 > 0)) * 4);

        std::uint32_t temp{};
        auto it = input.begin();

        for(std::size_t i = 0; i < input.size() / 3; ++i)
        {
            temp  = (*it++) << 16;
            temp += (*it++) << 8;
            temp += (*it++);
            encoded.append(1, kEncodeLookup[(temp & 0x00FC0000) >> 18]);
            encoded.append(1, kEncodeLookup[(temp & 0x0003F000) >> 12]);
            encoded.append(1, kEncodeLookup[(temp & 0x00000FC0) >> 6 ]);
            encoded.append(1, kEncodeLookup[(temp & 0x0000003F)      ]);
        }

        switch(input.size() % 3)
        {
        case 1:
            temp = (*it++) << 16;
            encoded.append(1, kEncodeLookup[(temp & 0x00FC0000) >> 18]);
            encoded.append(1, kEncodeLookup[(temp & 0x0003F000) >> 12]);
            encoded.append(2, kPadCharacter);
            break;
        case 2:
            temp  = (*it++) << 16;
            temp += (*it++) << 8;
            encoded.append(1, kEncodeLookup[(temp & 0x00FC0000) >> 18]);
            encoded.append(1, kEncodeLookup[(temp & 0x0003F000) >> 12]);
            encoded.append(1, kEncodeLookup[(temp & 0x00000FC0) >> 6 ]);
            encoded.append(1, kPadCharacter);
            break;
        }

        return encoded;
    }

    inline std::vector<byte> decode(const std::string& input)  // inline: avoids multiple-definition errors when the header is included in several files
    {
        if(input.length() % 4)
            throw std::runtime_error("Invalid base64 length!");

        std::size_t padding{};

        if(input.length())
        {
            if(input[input.length() - 1] == kPadCharacter) padding++;
            if(input[input.length() - 2] == kPadCharacter) padding++;
        }

        std::vector<byte> decoded;
        decoded.reserve(((input.length() / 4) * 3) - padding);

        std::uint32_t temp{};
        auto it = input.begin();

        while(it < input.end())
        {
            for(std::size_t i = 0; i < 4; ++i)
            {
                temp <<= 6;
                if     (*it >= 0x41 && *it <= 0x5A) temp |= *it - 0x41;
                else if(*it >= 0x61 && *it <= 0x7A) temp |= *it - 0x47;
                else if(*it >= 0x30 && *it <= 0x39) temp |= *it + 0x04;
                else if(*it == 0x2B)                temp |= 0x3E;
                else if(*it == 0x2F)                temp |= 0x3F;
                else if(*it == kPadCharacter)
                {
                    switch(input.end() - it)
                    {
                    case 1:
                        decoded.push_back((temp >> 16) & 0x000000FF);
                        decoded.push_back((temp >> 8 ) & 0x000000FF);
                        return decoded;
                    case 2:
                        decoded.push_back((temp >> 10) & 0x000000FF);
                        return decoded;
                    default:
                        throw std::runtime_error("Invalid padding in base64!");
                    }
                }
                else throw std::runtime_error("Invalid character in base64!");

                ++it;
            }

            decoded.push_back((temp >> 16) & 0x000000FF);
            decoded.push_back((temp >> 8 ) & 0x000000FF);
            decoded.push_back((temp      ) & 0x000000FF);
        }

        return decoded;
    }
}
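
decode above reports malformed input by throwing std::runtime_error. Where exceptions are unwelcome, similar conditions (length, alphabet, padding position) can be checked up front. The following is a minimal sketch with a hypothetical function name; it is not part of the header above and is slightly stricter about what may follow a '=':

```cpp
#include <cstddef>
#include <string>

// Hypothetical pre-check: true if `s` is padded standard Base64, i.e. its
// length is a multiple of 4, every character is in the alphabet, and '='
// appears only as the last one or two characters.
inline bool looks_like_base64(const std::string& s) {
    if (s.size() % 4 != 0) return false;
    std::size_t n = s.size();
    // Strip up to two trailing padding characters.
    if (n > 0 && s[n - 1] == '=') --n;
    if (n > 0 && s[n - 1] == '=') --n;
    for (std::size_t i = 0; i < n; ++i) {
        const char c = s[i];
        const bool ok = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
                        (c >= '0' && c <= '9') || c == '+' || c == '/';
        if (!ok) return false;
    }
    return true;
}
```

A caller could run this check first and call base64::decode only when it returns true.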
Repent answered 7/1, 2021 at 12:20 Comment(0)
S
0

Here is one I wrote, which uses unions and bit fields for maximum efficiency and readability.

const char PADDING_CHAR = '=';
const char* ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
const uint8_t DECODED_ALPHBET[128]={0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,62,0,0,0,63,52,53,54,55,56,57,58,59,60,61,0,0,0,0,0,0,0,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,0,0,0,0,0,0,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,0,0,0,0,0};

/**
 * Given a string, this function will encode it in Base64 (with padding)
 */
std::string encodeBase64(const std::string& binaryText)
{
    std::string encoded((binaryText.size()/3 + (binaryText.size()%3 > 0)) << 2, PADDING_CHAR);

    const uint8_t* bytes = reinterpret_cast<const uint8_t*>(binaryText.data());  // unsigned, so bytes >= 0x80 are not sign-extended
    union
    {
        uint32_t temp = 0;
        struct
        {
            uint32_t first : 6, second : 6, third : 6, fourth : 6;
        } tempBytes;
    };
    std::string::iterator currEncoding = encoded.begin();

    for(uint32_t i = 0, lim = binaryText.size() / 3; i < lim; ++i, bytes+=3)
    {
        temp = bytes[0] << 16 | bytes[1] << 8 | bytes[2];
        (*currEncoding++) = ALPHABET[tempBytes.fourth];
        (*currEncoding++) = ALPHABET[tempBytes.third];
        (*currEncoding++) = ALPHABET[tempBytes.second];
        (*currEncoding++) = ALPHABET[tempBytes.first];
    }

    switch(binaryText.size() % 3)
    {
    case 1:
        temp = bytes[0] << 16;
        (*currEncoding++) = ALPHABET[tempBytes.fourth];
        (*currEncoding++) = ALPHABET[tempBytes.third];
        break;
    case 2:
        temp = bytes[0] << 16 | bytes[1] << 8;
        (*currEncoding++) = ALPHABET[tempBytes.fourth];
        (*currEncoding++) = ALPHABET[tempBytes.third];
        (*currEncoding++) = ALPHABET[tempBytes.second];
        break;
    }

    return encoded;
}

/**
 * Given a padded Base64 string, this function will decode it.
 */
std::string decodeBase64(const std::string& base64Text)
{
    if( base64Text.empty() )
        return "";

    assert((base64Text.size()&3) == 0 && "The base64 text to be decoded must have a length divisible by 4!");

    uint32_t numPadding =  (*std::prev(base64Text.end(),1) == PADDING_CHAR) + (*std::prev(base64Text.end(),2) == PADDING_CHAR);

    std::string decoded((base64Text.size()*3>>2) - numPadding, '.');

    union
    {
        uint32_t temp;
        char tempBytes[4];
    };
    const uint8_t* bytes = reinterpret_cast<const uint8_t*>(base64Text.data());

    std::string::iterator currDecoding = decoded.begin();

    for(uint32_t i = 0, lim = (base64Text.size() >> 2) - (numPadding!=0); i < lim; ++i, bytes+=4)
    {
        temp = DECODED_ALPHBET[bytes[0]] << 18 | DECODED_ALPHBET[bytes[1]] << 12 | DECODED_ALPHBET[bytes[2]] << 6 | DECODED_ALPHBET[bytes[3]];
        (*currDecoding++) = tempBytes[2];
        (*currDecoding++) = tempBytes[1];
        (*currDecoding++) = tempBytes[0];
    }

    switch (numPadding)
    {
    case 2:
        temp = DECODED_ALPHBET[bytes[0]] << 18 | DECODED_ALPHBET[bytes[1]] << 12;
        (*currDecoding++) = tempBytes[2];
        break;
    
    case 1:
        temp = DECODED_ALPHBET[bytes[0]] << 18 | DECODED_ALPHBET[bytes[1]] << 12 | DECODED_ALPHBET[bytes[2]] << 6;
        (*currDecoding++) = tempBytes[2];
        (*currDecoding++) = tempBytes[1];
        break;
    }

    return decoded;
}
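
One caveat on the union approach: the layout of bit fields is implementation-defined, and reading a union member other than the one last written is undefined behavior in standard C++, so the code above depends on the compiler and byte order in use. Here is a minimal sketch of the same 3-byte to 4-sextet split using only shifts and masks, with a hypothetical function name:

```cpp
#include <array>
#include <cstdint>

// Hypothetical helper: splits three input bytes into four 6-bit values,
// most significant first, without relying on union or bit-field layout.
inline std::array<std::uint8_t, 4> split_sextets(std::uint8_t b0,
                                                 std::uint8_t b1,
                                                 std::uint8_t b2) {
    // Pack the three bytes into the low 24 bits of one word.
    const std::uint32_t triple = (static_cast<std::uint32_t>(b0) << 16) |
                                 (static_cast<std::uint32_t>(b1) << 8) |
                                 static_cast<std::uint32_t>(b2);
    return {{
        static_cast<std::uint8_t>((triple >> 18) & 0x3f),
        static_cast<std::uint8_t>((triple >> 12) & 0x3f),
        static_cast<std::uint8_t>((triple >> 6) & 0x3f),
        static_cast<std::uint8_t>(triple & 0x3f),
    }};
}
```

Each returned value can be used directly as an index into ALPHABET; for the bytes 'M', 'a', 'n' the indices are 19, 22, 5 and 46, which spell "TWFu".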
Sandy answered 24/2, 2021 at 15:8 Comment(1)
This is shown in many places as performing well. Have you posted this anywhere, Alexander, and what is the license to use this implementation?Lining
C
0

An easy version in C++:

std::string process(std::string encod) 
{
    //The string we are going to decode is stored in the variable "encod"
    std::cout << "string to decode : " << encod << std::endl;
    
    //Decoding -- base64 alphabet
    std::string base64="ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    
    // Getting 6-bit values of each element of the encoded str
    std::string bit6;
    
    for (std::size_t i = 0; i < encod.size(); i++) 
    {
        //If we encounter a "=" it is padding, and we don't care about it
        if (encod[i] == '=') 
        {
            break;
        }
        
        //Adding each 6-bit value of each character to "bit6"
        std::bitset<6> b(base64.find(encod[i]));
        
        bit6 += b.to_string();
        
    }
    
    // Drop the leftover bits that do not fill a whole byte
    bit6.erase(bit6.size() - bit6.size() % 8);
    
    // Transforming the list of 6-bits to a 8-bit array
    std::string homedecod;
    
    while (bit6 != "") 
    {
        std::bitset<8> b(bit6.substr(0, 8));
        
        bit6.erase(0,8);
        
        //Add the 8-bit character to a result string
        homedecod += char(b.to_ulong());
    }
    //Print result
    std::cout << "string:" << homedecod << std::endl;
    
    homedecod += "\n";
    
    return homedecod;
}
Colostrum answered 26/1 at 15:14 Comment(1)
As it’s currently written, your answer is unclear. Please edit to add additional details that will help others understand how this addresses the question asked. You can find more information on how to write good answers in the help center.Amerigo
