compile time initialization of uint8_t array with base64 string literal

I have a source file with a large byte array representing an image.

Below is an example (in reality, it can be any random data):

const uint8_t image_test_image[] = {
    0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, 0x08, 0x09, 0x0A, 0x0B,
    0x0C, 0x0D, 0x0E, 0x0F, 0x10, 0x11, 0x12, 0x13, 0x14, 0x15, 0x16, 0x17,
    0x18, 0x19, 0x1A, 0x1B, 0x1C, 0x1D, 0x1E, 0x1F, 0x20, 0x21, 0x22, 0x23,
    0x24, 0x25, 0x26, 0x27, 0x28, 0x29, 0x2A, 0x2B, 0x2C, 0x2D, 0x2E, 0x2F,
    0x30, 0x31, 0x32, 0x33, 0x34, 0x35, 0x36, 0x37, 0x38, 0x39, 0x3A, 0x3B,
    0x3C, 0x3D, 0x3E, 0x3F, 0x40, 0x41, 0x42, 0x43, 0x44, 0x45, 0x46, 0x47,
    // ...
};

I have several of these source files, and they take up a lot of disk space. Removing the whitespace would save only a little, so I want to try two different ways of initializing this array at compile time to reduce the source file size.

One way is to use base64 strings:

const uint8_t image_test_image[] =
base64decode(
  "AAECAwQFBgcICQoLDA0ODxAREhMUFRYXGBkaGxwdHh8gISIjJCUmJygpKissLS4vMDEyMzQ1Njc4OT"
  "o7PD0+P0BBQkNERUZH"...
);

I think it should be possible to write a header-only C++ constexpr base64 decoder that takes string literals. There is an example here: https://mcmap.net/q/1335860/-compile-time-base64-decoding-in-c, but it requires the decoded size to be passed explicitly as a template parameter, and I want to avoid that. Is it possible to deduce a template parameter value from an argument value?
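For context on the deduction question: a template parameter can be deduced from an argument's type, but not from its value. A string literal's type is const char[N], so its length N is deducible through a const char (&)[N] parameter, even in C++14. The exact decoded size, however, depends on how many '=' padding characters the literal ends with, which is a property of the value, so from N alone only an upper bound is available. The names literal_length and max_decoded_size below are hypothetical, chosen for illustration.

```cpp
#include <cstddef>

// N is deduced from the *type* of the argument: a string literal of k
// characters has type const char[k + 1] (the +1 is the terminating '\0').
template <std::size_t N>
constexpr std::size_t literal_length(const char (&)[N])
{
    return N - 1;
}

// Hypothetical helper: an upper bound on the decoded size, computed from the
// deduced N alone. Every 4 base64 characters encode at most 3 bytes.
template <std::size_t N>
constexpr std::size_t max_decoded_size(const char (&)[N])
{
    return (N - 1) / 4 * 3;
}

// Inside these functions the argument's *value* is not a constant expression,
// so it cannot select a template argument such as the exact output size;
// only N (part of the type) is available for that.
static_assert(literal_length("SGVsbG8=") == 8, "8 characters");
static_assert(max_decoded_size("SGVsbG8=") == 6, "upper bound; actual size is 5");
```

Getting the exact size into a template argument therefore requires the literal itself to appear in a constant expression, for example through a macro (C++17) or a class-type non-type template parameter (C++20).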

Another way is to use a uint64_t array or a uint64_t initializer list as the input instead. That requires choosing an endianness and handling byte arrays whose length is not a multiple of 8 bytes.

I'm using C++14 now, but C++17 is also possible, and C++20 may become an option later in this project. Using std::array is OK.

Edit2:

The uint64_t array approach works in C++14:

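#include <stdio.h>
#include <stddef.h>
#include <stdint.h>
#include <array>

// Deliberately not named LITTLE_ENDIAN: system headers (e.g. glibc's <endian.h>)
// may already define LITTLE_ENDIAN, which would silently select the wrong branch.
//#define USE_LITTLE_ENDIAN

// Expands each 64-bit argument into 8 bytes in the chosen byte order.
// Note: std::array's non-const operator[] is constexpr only since C++17, so in
// C++14 the result is not guaranteed to be computed at compile time.
template <typename... T>
constexpr std::array<uint8_t, sizeof...(T) * 8> u64_array_to_u8_array(T... t)
{
    std::array<uint8_t, sizeof...(T) * 8> out{};
    // Cast to uint64_t so literals >= 0x8000000000000000 (which have an
    // unsigned type) are not rejected as narrowing conversions.
    const std::array<uint64_t, sizeof...(T)> in = {static_cast<uint64_t>(t)...};

    for (size_t i = 0; i < out.size(); ++i) {
#ifdef USE_LITTLE_ENDIAN
        out[i] = static_cast<uint8_t>(in[i / 8] >> ((i % 8) * 8));      // little endian
#else
        out[i] = static_cast<uint8_t>(in[i / 8] >> ((7 - i % 8) * 8));  // big endian
#endif
    }
    return out;
}

#ifdef USE_LITTLE_ENDIAN
// little endian: the least significant byte of each word comes first
const auto image_test_image = u64_array_to_u8_array(
    0x0706050403020100, 0x0F0E0D0C0B0A0908,
    0x1716151413121110, 0x1F1E1D1C1B1A1918);
#else
// big endian: the most significant byte of each word comes first
const auto image_test_image = u64_array_to_u8_array(
    0x0001020304050607, 0x08090A0B0C0D0E0F,
    0x1011121314151617, 0x18191A1B1C1D1E1F);
#endif

int main() {
    for (size_t i = 0; i < image_test_image.size(); ++i)
    {
        printf("0x%02X, ", image_test_image[i]);
        if (i % 16 == 15)  // 16 bytes per output line
        {
            puts("");
        }
    }

    return 0;
}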
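The question also mentions byte arrays whose length is not a multiple of 8, which the code above does not handle: it always produces a multiple of 8 bytes. A 64-bit word cannot encode how many of its bytes are real, so one option is to zero-pad the last word and pass the true byte count explicitly. The sketch below does that; u64_words_to_bytes is a hypothetical name, it assumes C++17 (so std::array's non-const operator[] is usable in a constant expression), and it uses big-endian order only.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical helper (C++17): expands big-endian 64-bit words into exactly
// Bytes bytes. Bytes in the last word beyond Bytes are padding and are dropped.
template <std::size_t Bytes, typename... T>
constexpr std::array<std::uint8_t, Bytes> u64_words_to_bytes(T... words)
{
    static_assert(Bytes <= sizeof...(T) * 8, "not enough words for Bytes");
    static_assert(Bytes + 8 > sizeof...(T) * 8, "too many words for Bytes");

    const std::array<std::uint64_t, sizeof...(T)> in = {static_cast<std::uint64_t>(words)...};
    std::array<std::uint8_t, Bytes> out{};
    for (std::size_t i = 0; i < Bytes; ++i) {
        // byte 0 of each word is its most significant byte
        out[i] = static_cast<std::uint8_t>(in[i / 8] >> ((7 - i % 8) * 8));
    }
    return out;
}

// 10 bytes 0x00..0x09: the last word holds 2 real bytes and 6 padding bytes.
constexpr auto ten_bytes = u64_words_to_bytes<10>(0x0001020304050607, 0x0809000000000000);
static_assert(ten_bytes.size() == 10, "exact size");
static_assert(ten_bytes[9] == 0x09, "last real byte");
```

Only the size is explicit here; the word types are still deduced from the arguments.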

Now I just need to know how to implement constexpr base64 decoding with a string literal.

Spelt, 26/9 at 14:55. Comments:
What people often do is just keep the file in binary and have a script or tool convert it to source on the fly at compile time, then delete the temporary source file. Or keep the source file compressed. Trying to use language features to solve this seems like the wrong way to go. (Karnes)
Depending on your toolchain, or your willingness to use external tools, there can also be ways to link binary data into your program without generating C++ source first. For instance, the GNU assembler has the .incbin directive: sourceware.org/binutils/docs/as/Incbin.html (Karnes)
I am aware you can use build scripts; I have used CMake to do this in the past. The problem is that my project can be built with different compilers and tools, and I don't want to create different versions of build scripts or use third-party libraries or tools. A few days ago I actually looked at that exact library you just shared. I simply want more compact source files, so I'm looking for ways to do it in C++. (Spelt)
Also, the ability to convert strings to other types of objects at compile time is very useful in general. (Spelt)

Base64 string literal to byte array at compile time

C++17 (uses a macro)

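#include <array>
#include <cstddef>  // for std::size_t, std::byte
#include <string>   // for std::char_traits

//based on https://mcmap.net/q/1335860/-compile-time-base64-decoding-in-c

// Number of decoded bytes for a padded base64 string
// (no whitespace, length a multiple of 4).
constexpr std::size_t decodeBase64Length(const char *s)
{
    std::size_t len = std::char_traits<char>::length(s);
    if (len < 2)
        return 0;  // avoid reading before the start of the string
    if (s[len - 2] == '=')
        return (len / 4) * 3 - 2;
    else if (s[len - 1] == '=')
        return (len / 4) * 3 - 1;
    else
        return (len / 4) * 3;
}

// Maps each base64 character to its 6-bit value; every other byte maps to -1.
constexpr std::array<int, 256> prepareBase64DecodeTable() {
    std::array<int, 256> T{ 0 }; // std::array::fill is not constexpr before C++20, so use a loop
    for (int i = 0; i < 256; i++)
        T[i] = -1;
    for (int i = 0; i < 64; i++)
        T[static_cast<unsigned char>("ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"[i])] = i;
    return T;
}

// based on https://mcmap.net/q/151854/-base64-decode-snippet-in-c
template<std::size_t N>
constexpr std::array<std::byte, N> decodeBase64(const char *b64Str)
{
    constexpr auto T = prepareBase64DecodeTable();
    std::array<std::byte, N> out = { std::byte(0) };
    // compute the length once instead of on every loop iteration
    const std::size_t len = std::char_traits<char>::length(b64Str);
    int valb = -8;
    std::size_t val = 0, posOut = 0;
    for (std::size_t i = 0; i < len; i++) {
        // cast to unsigned char so non-ASCII input cannot produce a negative index
        const int d = T[static_cast<unsigned char>(b64Str[i])];
        if (d == -1)
            break;  // stop at '=' padding or any non-base64 character
        val = (val << 6) + d;
        valb += 6;
        if (valb >= 0) {
            out[posOut++] = std::byte((val >> valb) & 0xFF);
            valb -= 8;
        }
    }
    return out;
}

// The macro lets the literal be written inline once: it is used both to
// compute the output size and as the decoder's input.
#define DECODE_B64(b64) decodeBase64<decodeBase64Length(b64)>(b64)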

Usage:

constexpr auto b64a = DECODE_B64("SGVsbG8=");

Test here: https://godbolt.org/z/Esx77W1cG
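The question currently targets C++14, where std::char_traits<char>::length and std::array's non-const operator[] are not yet constexpr. As a sketch of a C++14-compatible alternative (not part of the original answer), the decoder below deduces the literal's size N from a const char (&)[N] parameter, so it needs neither a macro nor an explicit size. Since the exact size depends on the '=' padding, it returns a fixed-capacity buffer of (N - 1) / 4 * 3 bytes together with the actual decoded length. The names b64_buffer, b64_value and decode_b64 are hypothetical, and the input is assumed to be standard, padded base64 without whitespace.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical result type: Cap is an upper bound, size is the real length.
template <std::size_t Cap>
struct b64_buffer {
    std::uint8_t data[Cap > 0 ? Cap : 1];  // avoid a zero-length array
    std::size_t size;
};

// 6-bit value of a base64 character, or -1 for '=' and anything else.
constexpr int b64_value(char c)
{
    return (c >= 'A' && c <= 'Z') ? c - 'A'
         : (c >= 'a' && c <= 'z') ? c - 'a' + 26
         : (c >= '0' && c <= '9') ? c - '0' + 52
         : c == '+' ? 62
         : c == '/' ? 63
         : -1;
}

// N is deduced from the literal's type, so callers never pass a size.
template <std::size_t N>
constexpr b64_buffer<(N - 1) / 4 * 3> decode_b64(const char (&s)[N])
{
    b64_buffer<(N - 1) / 4 * 3> out{};  // zero-filled, size == 0
    std::uint32_t val = 0;
    int bits = 0;                        // number of not-yet-emitted bits in val
    for (std::size_t i = 0; i + 1 < N; ++i) {  // i + 1 < N skips the '\0'
        const int d = b64_value(s[i]);
        if (d < 0)
            break;                       // stop at '=' padding
        val = ((val << 6) | static_cast<std::uint32_t>(d)) & 0xFFFFFFu;
        bits += 6;
        if (bits >= 8) {
            bits -= 8;
            out.data[out.size++] = static_cast<std::uint8_t>(val >> bits);
        }
    }
    return out;
}
```

With constexpr auto img = decode_b64("SGVsbG8=");, the decoded bytes are img.data[0] to img.data[img.size - 1]; for valid padded input, at most the last 2 bytes of data are unused zero padding.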

C++20 (uses a user-defined string literal)

#include <array>
#include <algorithm> // for std::copy_n
#include <cstddef>   // for std::size_t, std::byte
#include <string>    // for std::char_traits

//from https://yongweiwu.wordpress.com/2022/06/19/compile-time-strings/
// Structural wrapper that lets a string literal be used as a template argument.
template <std::size_t N>
struct compile_time_string {
    constexpr compile_time_string(const char (&str)[N])
    {
        std::copy_n(str, N, value);
    }
    char value[N]{};
};
template <compile_time_string cts>
constexpr auto operator""_cts()
{
    return cts;
}

//based on https://mcmap.net/q/1335860/-compile-time-base64-decoding-in-c

// Number of decoded bytes for a padded base64 string (no whitespace, length a
// multiple of 4). A const char[N] argument converts to const char*, so a
// separate std::string_view overload is not needed.
constexpr std::size_t decodeBase64Length(const char *s)
{
    std::size_t len = std::char_traits<char>::length(s);
    if (len < 2)
        return 0;  // avoid reading before the start of the string
    if (s[len - 2] == '=')
        return (len / 4) * 3 - 2;
    else if (s[len - 1] == '=')
        return (len / 4) * 3 - 1;
    else
        return (len / 4) * 3;
}

// Maps each base64 character to its 6-bit value; every other byte maps to -1.
constexpr std::array<int, 256> prepareBase64DecodeTable() {
    std::array<int, 256> T{ 0 }; // std::array::fill is not constexpr before C++20, so use a loop
    for (int i = 0; i < 256; i++)
        T[i] = -1;
    for (int i = 0; i < 64; i++)
        T[static_cast<unsigned char>("ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"[i])] = i;
    return T;
}

// based on https://mcmap.net/q/151854/-base64-decode-snippet-in-c
// s is a compile_time_string, so the output size can be computed from its value.
template<auto s>
constexpr std::array<std::byte, decodeBase64Length(s.value)> decodeBase64()
{
    constexpr auto T = prepareBase64DecodeTable();
    std::array<std::byte, decodeBase64Length(s.value)> out = { std::byte(0) };
    constexpr std::size_t len = std::char_traits<char>::length(s.value);
    int valb = -8;
    std::size_t val = 0, posOut = 0;
    for (std::size_t i = 0; i < len; i++) {
        // cast to unsigned char so non-ASCII input cannot produce a negative index
        const int d = T[static_cast<unsigned char>(s.value[i])];
        if (d == -1)
            break;  // stop at '=' padding or any non-base64 character
        val = (val << 6) + d;
        valb += 6;
        if (valb >= 0) {
            out[posOut++] = std::byte((val >> valb) & 0xFF);
            valb -= 8;
        }
    }
    return out;
}

template <compile_time_string b64>
constexpr auto operator""_b64()
{
    return decodeBase64<b64>();
}

Usage:

constexpr auto b64a = decodeBase64<"SGVsbG8="_cts>();

or

constexpr auto b64c = "SGVsbG8xMg=="_b64;
constexpr auto b64d = "AAECAwQFBgcICQoLDA0ODxAREhMUFRYXGBkaGxwdHh8gISIjJCUmJygp"
                      "KissLS4vMDEyMzQ1Njc4OTo7PD0+P0BBQkNERUZH"_b64;

Test here: https://godbolt.org/z/YPc5fEaKn
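Both versions return std::array<std::byte, N>, while the array in the question holds uint8_t. If uint8_t elements are needed, a small compile-time conversion can be layered on top, for example constexpr auto img = to_u8_array(DECODE_B64("SGVsbG8="));. This is a sketch: to_u8_array is a hypothetical name, and it assumes C++17 (for std::byte, std::to_integer, and a constexpr non-const std::array::operator[]).

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: copies std::byte elements into a uint8_t array,
// usable in constant expressions.
template <std::size_t N>
constexpr std::array<std::uint8_t, N> to_u8_array(const std::array<std::byte, N> &in)
{
    std::array<std::uint8_t, N> out{};
    for (std::size_t i = 0; i < N; ++i)
        out[i] = std::to_integer<std::uint8_t>(in[i]);
    return out;
}

// Example input standing in for a decoded result such as "SGk=" ("Hi").
constexpr std::array<std::byte, 2> decoded = {std::byte{0x48}, std::byte{0x69}};
constexpr auto image_bytes = to_u8_array(decoded);
static_assert(image_bytes[0] == 0x48 && image_bytes[1] == 0x69, "copied");
```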

Spelt answered 1/10 at 9:23
