Compile-Time Base64 Decoding in C++

Is it possible to decode base64 encoded data to binary data at compile-time?

I think of something that looks like this:

constexpr auto decoded = decodeBase64<"SGVsbG8=">();

or

constexpr auto decoded = decodeBase64("SGVsbG8=");

I have no special requirements for the resulting type of decoded.

Goerke answered 4/1, 2020 at 19:56 Comment(3)
constexpr auto decoded = decodeBase64<"SGVsbG8=">(); - no, const char[] cannot be a non-type template parameter as of C++17. constexpr auto decoded = decodeBase64("SGVsbG8="); - yes, if decodeBase64 takes const char* and is a constexpr function. - Hankering
Just try making a simple decoder that takes the string as a regular argument, and put constexpr in front of it. It should work. If you run into more specific problems, ask again on StackOverflow. - Palmirapalmistry
@Fureeish: It’s not that you can’t have a template parameter of that type (adjusted to const char* or via a pointer or reference to an array); you just can’t use a string literal as a template argument for it. - Retool
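A minimal sketch of the distinction made in the last comment, assuming C++14 or later: a const char* non-type template parameter is legal, and a named array with static storage duration can be passed to it, whereas a string literal cannot. The names kEncoded and encoded_length are hypothetical and only illustrate the mechanism; this is not a decoder.

```cpp
#include <cstddef>

// Hypothetical input: a named array with static storage duration.
constexpr char kEncoded[] = "SGVsbG8=";

// Hypothetical helper: counts the characters reachable through the
// pointer passed as a non-type template argument.
template <const char* S>
constexpr std::size_t encoded_length() {
    std::size_t n = 0;
    while (S[n] != '\0') ++n;
    return n;
}

static_assert(encoded_length<kEncoded>() == 8,
              "a named array is accepted as a template argument");
// encoded_length<"SGVsbG8=">() would be rejected: a string literal is
// not a valid argument for a pointer template parameter.
```

The cost is that every encoded string needs its own named variable, which is why the answers below take the string as an ordinary function argument instead.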

I found it surprisingly hard to google for a constexpr base64 decoder, so I adapted the one here: https://gist.github.com/tomykaira/f0fd86b6c73063283afe550bc5d77594

Since that's MIT licensed (sigh), be sure to slap this somewhere in the source file:

/**
 * The MIT License (MIT)
 * Copyright (c) 2016 tomykaira
 *
 * Permission is hereby granted, free of charge, to any person obtaining
 * a copy of this software and associated documentation files (the
 * "Software"), to deal in the Software without restriction, including
 * without limitation the rights to use, copy, modify, merge, publish,
 * distribute, sublicense, and/or sell copies of the Software, and to
 * permit persons to whom the Software is furnished to do so, subject to
 * the following conditions:
 *
 * The above copyright notice and this permission notice shall be
 * included in all copies or substantial portions of the Software.
 *
 * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
 * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
 * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
 * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
 * LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
 * OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
 * WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
 */

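To return a string from a constexpr function, you need to return a char array. Because a function can't return a built-in array, and std::string is not a literal type in C++14/17, std::array is the best option. But there is a problem: until C++17 the non-const operator[] of std::array is not constexpr (only the const overload is), so you can't fill a std::array element by element inside a constexpr function. You can work around that by building the output in a C array and copying it into a std::array through a small derived class that adds a constructor: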

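#include <array>    // std::array
#include <cstddef>  // size_t
#include <cstdint>  // uint32_t (used by decode below)
#include <cstdio>   // printf (used in the usage example)
#include <utility>  // std::index_sequence, std::make_index_sequence

template <size_t N>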
struct fixed_string : std::array<char, N> {
    constexpr fixed_string(const char (&input)[N]) : fixed_string(input, std::make_index_sequence<N>{}) {}
    template <size_t... Is>
    constexpr fixed_string(const char (&input)[N], std::index_sequence<Is...>) : std::array<char, N>{ input[Is]... } {}
};
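As one of the comments below suggests, the same copy can be done with a helper function instead of a derived class. The following is a minimal sketch under that assumption, valid in C++14; to_std_array and to_std_array_impl are hypothetical names, not part of the original answer.

```cpp
#include <array>
#include <cstddef>
#include <utility>

// Hypothetical helper: expands the C array into a brace-initializer,
// so no element of the std::array is assigned after construction.
template <std::size_t N, std::size_t... Is>
constexpr std::array<char, N> to_std_array_impl(const char (&in)[N],
                                                std::index_sequence<Is...>) {
    return {{ in[Is]... }};
}

// Hypothetical entry point: generates the indices 0..N-1 and forwards.
template <std::size_t N>
constexpr std::array<char, N> to_std_array(const char (&in)[N]) {
    return to_std_array_impl(in, std::make_index_sequence<N>{});
}

constexpr auto greeting = to_std_array("hi");
static_assert(greeting.size() == 3, "two characters plus the terminating NUL");
static_assert(greeting[0] == 'h' && greeting[2] == '\0',
              "reading through the const operator[] is constexpr in C++14");
```

Reading from the constexpr variable works in C++14 because only the const overload of operator[] is involved; the restriction applies to writing through the non-const overload.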

Change the decoder to use that instead of std::string, and it seems to work as constexpr. This requires C++14, because a C++11 constexpr function body is limited to essentially a single return statement (no loops or mutable local variables):

template <size_t N>
constexpr const std::array<char, ((((N-1) >> 2) * 3) + 1)> decode(const char(&input)[N]) {
    constexpr unsigned char kDecodingTable[] = {
        64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64,
        64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64,
        64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 62, 64, 64, 64, 63,
        52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 64, 64, 64, 64, 64, 64,
        64,  0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14,
        15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 64, 64, 64, 64, 64,
        64, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40,
        41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 64, 64, 64, 64, 64,
        64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64,
        64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64,
        64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64,
        64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64,
        64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64,
        64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64,
        64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64,
        64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64
    };

    static_assert(((N-1) & 3) == 0, "Input data size is not a multiple of 4");

    char out[(((N-1) >> 2) * 3) + 1] {0};

    size_t out_len = (N-1) / 4 * 3;
    if (input[(N-1) - 1] == '=') out_len--;
    if (input[(N-1) - 2] == '=') out_len--;

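    for (size_t i = 0, j = 0; i < N-1;) {
      // '=' padding contributes zero bits: (0 & i++) evaluates to 0 while still advancing i.
      // The table is indexed via unsigned char so that bytes above 127 cannot
      // produce a negative index on platforms where char is signed.
      uint32_t a = input[i] == '=' ? 0 & i++ : kDecodingTable[static_cast<unsigned char>(input[i++])];
      uint32_t b = input[i] == '=' ? 0 & i++ : kDecodingTable[static_cast<unsigned char>(input[i++])];
      uint32_t c = input[i] == '=' ? 0 & i++ : kDecodingTable[static_cast<unsigned char>(input[i++])];
      uint32_t d = input[i] == '=' ? 0 & i++ : kDecodingTable[static_cast<unsigned char>(input[i++])];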

      uint32_t triple = (a << 3 * 6) + (b << 2 * 6) + (c << 1 * 6) + (d << 0 * 6);

      if (j < out_len) out[j++] = (triple >> 2 * 8) & 0xFF;
      if (j < out_len) out[j++] = (triple >> 1 * 8) & 0xFF;
      if (j < out_len) out[j++] = (triple >> 0 * 8) & 0xFF;
    }
    return fixed_string<(((N-1) >> 2) * 3) + 1>(out);
}

Usage:

constexpr auto x = decode("aGVsbG8gd29ybGQ=");
/*...*/
printf("%s\n", x.data()); // prints: hello world

Demo: https://godbolt.org/z/HFdk6Z

Updated to address helpful feedback from Marek R and Frank.

Zawde answered 5/1, 2020 at 19:32 Comment(6)
IMO the return value should be std::array, not a custom class. It should also be pointed out that this code requires C++14. - Christianechristiania
Really nice, but I'm not sure if the size of the data in fixed_string is correct. It does not take into account whether there are zero, one or two '=' padding characters. - Goerke
@MarekR for whatever reason, until C++17, std::array's non-const index operator is not constexpr. You're right though, that would be much cleaner if you're using C++17 or above. - Zawde
@Goerke Good point about the output length! I'll update the answer, but it'll still have to allocate the buffer using the simpler calculation because template arguments can't use function parameter content (e.g. the string). - Zawde
@DavisHerring Ideally, I would declare an std::array, use it, and return it, but the restriction means I have to build the output in a C array and copy that into an std::array. std::array doesn't have a constructor for this, or any user constructor; rather, it's aggregate-initialized like a C array. So all that the fixed_string helper class does now is add a constructor that transfers the contents of the C array to a brace-initializer for the std::array. The constexpr function still outputs an std::array though. - Zawde
@parktomatomi: Right, of course; I might have used a helper function rather than a class, even though it does convert back as you said. - Retool
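The output-length concern raised in the comments above can also be handled by returning the decoded length alongside a buffer sized by the upper bound. The following is a minimal C++14 sketch of that idea, not the answer's code: decoded_bytes, sextet and decode_sized are hypothetical names, the input is assumed to be well-formed padded base64, and an ASCII-compatible character set is assumed.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical result type: fixed-capacity buffer plus the number of
// bytes that were actually decoded. A plain aggregate with a raw array
// can be filled element by element in a C++14 constexpr function.
template <std::size_t Cap>
struct decoded_bytes {
    unsigned char data[Cap];
    std::size_t size;
};

// Maps one base64 character to its 6-bit value. '=' padding and any
// other character contribute zero bits (no validation in this sketch).
constexpr std::uint32_t sextet(char c) {
    return (c >= 'A' && c <= 'Z') ? static_cast<std::uint32_t>(c - 'A')
         : (c >= 'a' && c <= 'z') ? static_cast<std::uint32_t>(c - 'a' + 26)
         : (c >= '0' && c <= '9') ? static_cast<std::uint32_t>(c - '0' + 52)
         : (c == '+') ? 62u
         : (c == '/') ? 63u
         : 0u;
}

template <std::size_t N>
constexpr decoded_bytes<((N - 1) / 4) * 3 + 1> decode_sized(const char (&input)[N]) {
    static_assert((N - 1) % 4 == 0, "Input length must be a multiple of 4");
    decoded_bytes<((N - 1) / 4) * 3 + 1> out{};  // zero-initialized
    std::size_t len = (N - 1) / 4 * 3;           // upper bound
    if (N >= 5) {                                // at least one 4-char group
        if (input[N - 2] == '=') --len;          // last character is padding
        if (input[N - 3] == '=') --len;          // second to last as well
    }
    for (std::size_t i = 0, j = 0; i < N - 1; i += 4) {
        const std::uint32_t triple = (sextet(input[i]) << 18) |
                                     (sextet(input[i + 1]) << 12) |
                                     (sextet(input[i + 2]) << 6) |
                                     sextet(input[i + 3]);
        if (j < len) out.data[j++] = static_cast<unsigned char>((triple >> 16) & 0xFF);
        if (j < len) out.data[j++] = static_cast<unsigned char>((triple >> 8) & 0xFF);
        if (j < len) out.data[j++] = static_cast<unsigned char>(triple & 0xFF);
    }
    out.size = len;
    return out;
}

static_assert(decode_sized("aGk=").size == 2, "one padding character removes one byte");
static_assert(decode_sized("aGk=").data[0] == 'h', "first decoded byte");
```

The buffer type still depends only on the input length, as the comment about template arguments explains, but callers can read size to find out how many bytes are meaningful.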

parktomatomi's answer helped a lot to find this solution. Using C++17 and std::array this seems to work.

The base64 decoder is based on the answer https://mcmap.net/q/151854/-base64-decode-snippet-in-c

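#include <array>    // std::array
#include <cstddef>  // size_t, std::byte
#include <string>   // std::char_traits

constexpr size_t decodeBase64Length(const char *s)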
{
    size_t len = std::char_traits<char>::length(s);
    if (len < 2)  // too short to carry padding; also avoids indexing before the start of s
        return 0;
    if (s[len - 2] == '=')
        return (len / 4) * 3 - 2;
    else if(s[len -1] == '=')
        return (len / 4) * 3 - 1;
    else
        return (len / 4) * 3 ;
}

constexpr std::array<int, 256> prepareBase64DecodeTable() {
    std::array<int, 256> T{ 0 }; // breaks constexpr: T.fill(-1) or missing initialization
    for (int i = 0; i < 256; i++)
        T[i] = -1;
    for (int i = 0; i < 64; i++)
        T["ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"[i]] = i;
    return T;
}

// based on https://mcmap.net/q/151854/-base64-decode-snippet-in-c
template<int N>
constexpr std::array<std::byte, N> decodeBase64(const char *b64Str)
{
    constexpr auto T = prepareBase64DecodeTable();
    std::array<std::byte, N> out = { std::byte(0) };
    int valb = -8;
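    // Index the table via unsigned char so that bytes above 127 cannot
    // produce a negative index on platforms where char is signed.
    for (size_t i = 0, val = 0, posOut = 0; i < std::char_traits<char>::length(b64Str) && T[static_cast<unsigned char>(b64Str[i])] != -1; i++) {
        val = (val << 6) + T[static_cast<unsigned char>(b64Str[i])];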
        valb += 6;
        if (valb >= 0) {
            out[posOut++] = std::byte((val >> valb) & 0xFF);
            valb -= 8;
        }
    } 
    return out;
}

Usage is not perfect, because the length of the resulting array cannot be deduced; it has to be passed explicitly as a template argument:

#define B64c "SGVsbG8xMg=="
constexpr auto b64 = decodeBase64<decodeBase64Length(B64c)>(B64c);  // array<byte,7>

Demo at https://godbolt.org/z/-DX2-m

Goerke answered 5/1, 2020 at 22:54 Comment(4)
Instead of const char *, use a reference to an array to deduce the length from the argument: const char (&b64Str)[N]. - Zawde
@Zawde the length of the argument can be deduced, but the length of the output cannot be deduced this way. - Oxytocin
If you're using macros you might as well do this: #define DECODE_B64(b64) decodeBase64<decodeBase64Length(b64)>(b64) and then: constexpr auto b64c = DECODE_B64("SGVsbG8xMg=="); - Oxytocin
I used your code in my answer: https://mcmap.net/q/1468607/-compile-time-initialization-of-uint8_t-array-with-base64-string-literal - Oxytocin
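One macro-free way to get an exactly sized result in C++17 is to pass the string through a captureless lambda, so that its contents are a constant expression inside the decoder. This is a sketch of that idiom and not part of the answer above: decoded_length, sextet and decode_base64 are hypothetical names, and the input is assumed to be well-formed padded base64 in an ASCII-compatible character set.

```cpp
#include <array>
#include <cstddef>
#include <string_view>

// Number of bytes encoded by a padded base64 string.
constexpr std::size_t decoded_length(std::string_view s) {
    if (s.size() < 4) return 0;
    std::size_t len = s.size() / 4 * 3;
    if (s[s.size() - 1] == '=') --len;
    if (s[s.size() - 2] == '=') --len;
    return len;
}

// 6-bit value of a base64 character, or -1 for '=' and anything else.
constexpr int sextet(char c) {
    if (c >= 'A' && c <= 'Z') return c - 'A';
    if (c >= 'a' && c <= 'z') return c - 'a' + 26;
    if (c >= '0' && c <= '9') return c - '0' + 52;
    if (c == '+') return 62;
    if (c == '/') return 63;
    return -1;
}

// F is expected to be a captureless lambda returning a string literal.
// Calling it does not read any state of the parameter, so the result
// can initialize a constexpr variable and size the output array.
template <typename F>
constexpr auto decode_base64(F get) {
    constexpr std::string_view s = get();
    constexpr std::size_t n = decoded_length(s);
    std::array<std::byte, n> out{};
    unsigned val = 0;
    int bits = -8;
    std::size_t pos = 0;
    for (char c : s) {
        const int v = sextet(c);
        if (v < 0) break;  // stop at padding or an invalid character
        val = (val << 6) + static_cast<unsigned>(v);
        bits += 6;
        if (bits >= 0) {
            if (pos < n) out[pos++] = static_cast<std::byte>((val >> bits) & 0xFF);
            bits -= 8;
        }
    }
    return out;
}

constexpr auto hello12 = decode_base64([] { return "SGVsbG8xMg=="; });
static_assert(hello12.size() == 7, "decodes to the 7 bytes of Hello12");
```

The lambda has to be passed by value; each distinct lambda produces its own instantiation of decode_base64, which is what lets the return type differ per string.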
