Compile time text to number translation (atoi)
Asked Answered
P

1

8

I want to implement atoi() function at compile time (in C++ language, by using C++11 or C++14 standard). So it should be able to parse text enclosed in double quotes as number, or repor an error. More specifically, it is a part of bigger system, which is able to parse printf-like format at compile time. And I want to split format strings on words and if some particular word can be represented by number -- output number instead of the string (behind the scene is serializer class, which can serialize numbers more effectively, than the strings, and which is more important, deserializer then not should try to parse every string as a number, because all numebers printed inside format string is always represented as numbers, but not as strings)...

As I know two there can be two approaches to solve the task:

1) by using constexpr functions;

2) by template metaprogramming.

Which way can be better? I have tried first way, and I can see there is many obstacles in this way: especially few limitations related to c++11. Looks like second might be preferable, but it requires some tricks (you need to split c-string to separate chars by using of operator"", which supported in gcc starting from c++14, and in clangs starting from c++11). Also solution based completely on TMP might be too large and too tangled.

Below is my solution, I glad to hear some suggestions about it.

http://coliru.stacked-crooked.com/a/0b8f1fae9d9b714b


#include <stdio.h>

template <typename T> struct Result
{
    T value;
    bool valid;

    constexpr Result(T v) : value(v), valid(true) {}
    constexpr Result() : value(), valid(false) {}
};

template <typename T>
constexpr Result<T> _atoi_oct(const char *s, size_t n, T val, int sign)
{
    return n == 0 ? Result<T>(sign < 0 ? -val : val)
        : *s >= '0' && *s <= '7' 
            ? _atoi_oct(s+1, n-1, val*T(010) + *s - '0', sign)
            : Result<T>();
}

template <typename T>
constexpr Result<T> _atoi_dec(const char *s, size_t n, T val, int sign)
{
    return n == 0 ? Result<T>(sign < 0 ? -val : val)
        : *s >= '0' && *s <= '9'
            ? _atoi_dec(s+1, n-1, val*T(10) + *s - '0', sign)
            : Result<T>();
}

template <typename T>
constexpr Result<T> _atoi_hex(const char *s, size_t n, T val, int sign)
{
    return n == 0 ? Result<T>(sign < 0 ? -val : val)
        : *s >= '0' && *s <= '9'
            ? _atoi_hex(s+1, n-1, val*T(0x10) + *s - '0', sign)
            : *s >= 'a' && *s <= 'f'
                ? _atoi_hex(s+1, n-1, val*T(0x10) + *s - 'a' + 10, sign)
                : *s >= 'A' && *s <= 'F'
                    ? _atoi_hex(s+1, n-1, val*T(0x10) + *s - 'A' + 10, sign)
                    : Result<T>();
}

template <typename T>
constexpr Result<T> _atoi_zero(const char *s, size_t n, int sign = 1)
{
    return n == 0 ? Result<T>()
        : *s >= '0' && *s <= '7'
            ? _atoi_oct(s+1, n-1, T(*s - '0'), sign)
            : *s == 'x' || *s == 'X'
                ? _atoi_hex(s+1, n-1, T(0), sign)
                : Result<T>();
}

template <typename T>
constexpr Result<T> _atoi_sign(const char *s, size_t n, int sign = 1)
{
    return n == 0 ? Result<T>()
        : *s == '0'
            ? _atoi_zero<T>(s+1, n-1, sign)
            : *s > '0' && *s <= '9'
                ? _atoi_dec(s+1, n-1, T(*s - '0'), sign)
                : Result<T>();
}

template <typename T>
constexpr Result<T> _atoi_space(const char *s, size_t n)
{
    return n == 0 ? Result<T>()
        : (*s == ' ' || *s == '\t' || *s == '\n' || *s == '\r' || *s == '\v')
            ? _atoi_space<T>(s+1, n-1)
            : *s == '-'
                ? _atoi_sign<T>(s+1, n-1, -1)
                : *s == '+'
                    ? _atoi_sign<T>(s+1, n-1)
                    : *s == '0'
                        ? _atoi_zero<T>(s+1, n-1)
                        : _atoi_dec(s, n, T(0), 1);
}

template <size_t N> void pstr(const char (&s)[N])
{
    printf("s '%.*s'\n", int(N-1), s);
}

template <typename Str>
__attribute__((always_inline))
void _atoi(Str s)
{
    constexpr auto result = _atoi_space<long>(s.cstr(), sizeof(s.cstr())-1);
    if (result.valid)
        printf("i %ld\n", result.value);
    else
        pstr(reinterpret_cast<const char (&)[sizeof(s.cstr())]>(s.cstr()));
}

#define atoi(STR) _atoi([]() { \
                        struct S { \
                            static constexpr const char (&cstr())[sizeof(STR)] { return STR; } \
                        }; \
                        return S();  \
                    }())

int main()
{
    atoi("42");
    atoi("-1");
    atoi("+1");
    atoi("010");
    atoi("-0x10");
    atoi("--1");
    atoi("x");
    atoi("3x");
    return 0;   
}

Basically I want to ask, how can I transform at compile time number (like "42") written in double quotes in the value of integral type. My solution look too cumbersome.

Pyroconductivity answered 30/12, 2019 at 18:26 Comment(6)
Actually I’ve been looking for something like this. It may have its uses with hashing strings at compile time or sth. Also another approach could be compiler api programming but that would be compiler specific.Kay
C++17 is not an option right?Bluhm
So, what's wrong with your solution? And what do you want to know exactly? Better in what sense?Wallack
@MarcinPoloczek as you mentioned compile-time string hashing: perhaps you like this post I wrote some time ago here on SO: https://mcmap.net/q/16726/-conveniently-declaring-compile-time-strings-in-cWallack
Updated version, with error checking: coliru.stacked-crooked.com/a/9f67878a7be4310cPyroconductivity
1. Why do you need to do this? 2. Your solution does not work at compile-time, but at run-time. Your _atoi(Str s) function is not even constexpr. 3. Consider removing your solution and just keeping the question. 4. Is the C++14 solution you got unsatisfactory?Denary
B
1

With C++14 we can get rid of the macro and some ternary operators. Here is how I would do it, the code should be selfexplanatory (I've also added some comments). The code below can also be found here (with some examples) for compiler comparison.

#include <cstdint>
#include <iostream>

template <typename T>
struct Result
{
    T value{};
    bool valid = false;

    constexpr explicit Result(T v) : value(v), valid(true) {}
    constexpr Result() = default;
};

// converts upper case chars to lower case
constexpr char to_lower(char c)
{
    return c >= 'A' && c <= 'Z'
        ? c - 'A' + 'a'
        : c;
}

// converts a digit char to its numeric value (eg. 'F' -> 15)
constexpr int to_digit(char c)
{
    c = to_lower(c);
    return c >= 'a'
        ? c - 'a' + 10
        : c - '0';
}

// checks whether the given digit fits in the given base (eg. 'A' in 16 (hex) -> true, but '9' in 8 (oct) -> false)
constexpr bool is_digit(char c, int base)
{
    int digit = to_digit(c);
    return 0 <= digit && digit < base;
}

namespace detail
{
    // returns true if c is a sign character (+ or -), sign will hold a valid factor (1 or -1) regardless of the return value
    constexpr bool get_sign(char c, int& sign)
    {
        if (c == '-')
        {
            sign = -1;
            return true;
        }
        else
        {
            sign = 1;
            return c == '+';
        }
    }

    // adds a digit to the right side of the a number
    template <typename T>
    constexpr T append_digit(T value, int base, int digit)
    {
        return value * base + digit;
    }

    // create the actual number from the given string
    template <typename T>
    constexpr T construct_integral(const char* str, std::size_t size, int base)
    {
        T value = 0;
        for (std::size_t i = 0; i < size; i++)        
            value = append_digit(value, base, to_digit(str[i]));

        return value;
    }

    // how many chars are necessary to specify the base (ex. hex -> 0x -> 2) 
    constexpr std::size_t get_base_offset(int base)
    {
        if (base == 8) return 1;
        if (base == 16) return 2;
        return 0;
    }

    // gets the base value according to the number prefix (eg. 0x -> 16 (hex))
    constexpr int get_base(const char* str, std::size_t size)
    {
        return str[0] == '0'
            ? size > 2 && to_lower(str[1]) == 'x'
                ? 16
                : 8
            : 10;
    }

    // checks whether all digits in the string can fit in the given base
    constexpr bool verify_base(const char* str, std::size_t size, int base)
    {
        for (std::size_t i = 0; i < size; i++)
            if (!is_digit(str[i], base))
                return false;

        return true;
    }
}

template <typename T = int>
constexpr Result<T> to_integral(const char* str, std::size_t size)
{
    using namespace detail;

    // remove the sign from the string
    auto sign = 0;    
    if (get_sign(str[0], sign)) 
    {
        ++str;
        --size;
    }

    // get the base and remove its prefix from the string
    auto base = get_base(str, size);
    auto offset = get_base_offset(base);
    str += offset;
    size -= offset;

    // check if the string holds a valid number with respect to its base
    if (!verify_base(str, size, base))
        return {};

    // create the number and apply the sign
    auto unsigned_value = construct_integral<T>(str, size, base);
    return Result<T>(unsigned_value * sign);
}

template <typename T = int, std::size_t N>
constexpr Result<T> to_integral(const char(&str)[N])
{
    static_assert(N > 1, "Empty strings are not allowed");
    return to_integral<T>(str, N - 1);
}

C++17 could reduce the amount of code even more by using std::string_view. Your Result<T> could also be replaced by std::optional.

Bluhm answered 30/12, 2019 at 22:33 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.