Use string literals in a "char type" templated class

I have a C++ class template that takes the character type as a char_type template parameter (char, wchar_t, char32_t, etc.). The class then uses std::basic_string<char_type> in its code.

Then somewhere in the class I fill a table of escape sequences such as "&amp;". This does not work because, depending on the template character type, the literal would have to be written "&amp;", L"&amp;", U"&amp;", and so on.

Is there a way to avoid specializing the template functions that initialize the table, for instance with some standard function for converting string literals?

As these are escape sequences, they contain nothing but ASCII characters.

Yocum answered 29/9, 2015 at 9:22 Comment(17)
So you essentially need a polymorphic string literal? Mmmm...Subsistence
No, I want this at compile timeYocum
And I only want to convert 8bit char to other wider char formats, not the reverse operation.Yocum
C++ has compile-time polymorphism as well, but I'm not sure whether it's feasible here.Subsistence
Isn't compile time polymorphism called templates?Yocum
@templates No. Templates are a mechanism that allows for polymorphism, but I digress. Let me think about a possible solution for a while.Subsistence
I think I will try declaring a templated constexpr and fully specialize it for every char typeYocum
How do you want to deal with the fact that u8"A" and "A" are indistinguishable by type, but not necessarily the same value? Before this can be done you need to think properly about your requirements. (A is merely an example, picked for familiarity. There are maybe about one hundred other examples)Victorie
@galinette: Compile time polymorphism is still polymorphism.Trehalose
Don't know why this has a downvote. It's a good question and (surprisingly) I can't find a dupe.Trehalose
If this is only used for some table, just use macros and go home.Victorie
@R.MartinhoFernandes : macros can't be used here as this has to work in a template, the template parameter being the character typeYocum
@R.MartinhoFernandes I made this mistake in my answer...Arlie
@R.MartinhoFernandes : if you use proper encoding for your source file (UTF-8), "Ἆ" and u8"Ἆ" have the same value.Yocum
@R.MartinhoFernandes Requirements addedYocum
@Yocum Good. But FWIW, notice how my example also uses only ASCII characters (though granted, it requires somewhat unusual environments/compiler settings)Victorie
Also, as another nitpick, your claim about the encoding of your source file is not correct. The conditions under which "A" will not be the same as u8"A" are indifferent to the source code encoding (see, e.g., GCC's -fexec-charset flag).Victorie
U
4

I would do the following:

template <typename char_type, size_t LENGTH>
constexpr std::basic_string<char_type> literal(const char (&value)[LENGTH])
{
    using string = std::basic_string<char_type>;

    string result{};
    result.reserve(LENGTH);

    std::copy(std::begin(value), std::end(value), std::back_inserter(result));

    return result; // rvo
}

You can use it this way:

// Table of escaping sequences
std::basic_string<char_type> escaping_sequences[] =
{
    literal<char_type>("&amp"),
    literal<char_type>("&foo"),
    literal<char_type>("&bar"),
    // ...
};

I've tested it in Ideone:

literal<  char  >("test") // result: std::string
literal<char32_t>("test") // result: std::basic_string<char32_t, std::char_traits<char32_t>, std::allocator<char32_t> >
literal<char16_t>("test") // result: std::basic_string<char16_t, std::char_traits<char16_t>, std::allocator<char16_t> >

It is not tested with every char type, but I hope it helps.

Edit 1

My bad, I just noticed that galinette gave almost the same answer before I did. The only difference between my code and galinette's is that I allocate the resulting string once with reserve instead of relying on the automatic allocation of push_back, and that the number of characters is counted at compile time, because LENGTH is a template parameter.

Edit 2

It is possible to avoid the final null character issue by subtracting 1 from the end iterator:

template <typename char_type, size_t LENGTH>
constexpr std::basic_string<char_type> literal(const char (&value)[LENGTH])
{
    using string = std::basic_string<char_type>;

    string result{};
    result.reserve(LENGTH - 1);

    std::copy(std::begin(value), std::end(value) - 1, std::back_inserter(result));

    return result; // rvo
}

Or, using std::copy_n instead of std::copy:

template <typename char_type, size_t LENGTH>
constexpr std::basic_string<char_type> literal(const char (&value)[LENGTH])
{
    using string = std::basic_string<char_type>;

    string result{};
    result.reserve(LENGTH - 1);

    std::copy_n(std::begin(value), LENGTH - 1, std::back_inserter(result));

    return result; // rvo
}
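
For reference, here is a self-contained sketch of the same idea that should compile as C++11. It assumes ASCII-only input, as stated in the question, and it leaves out constexpr because std::basic_string is not a literal type before C++20 (which matches the compiler error reported in the comments). The names widen_literal, escape_table and check_escape_table are illustrative and not part of the original answer.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical helper: converts an ASCII narrow literal to basic_string<CharT>.
// N is deduced from the array type and includes the terminating '\0'.
template <typename CharT, std::size_t N>
std::basic_string<CharT> widen_literal(const char (&value)[N])
{
    std::basic_string<CharT> result;
    result.reserve(N - 1);  // one allocation, terminator not counted

    // Copy N - 1 characters so the trailing '\0' is left out.
    // Each char is implicitly converted to CharT on insertion.
    std::copy_n(value, N - 1, std::back_inserter(result));
    return result;
}

// Hypothetical table builder, usable from any code templated on CharT.
template <typename CharT>
std::vector<std::basic_string<CharT>> escape_table()
{
    return {
        widen_literal<CharT>("&amp;"),
        widen_literal<CharT>("&lt;"),
        widen_literal<CharT>("&gt;"),
    };
}

// Small self-check: the length excludes the terminator for every char type.
inline void check_escape_table()
{
    assert(widen_literal<char>("test").size() == 4);
    assert(escape_table<char16_t>()[0] == u"&amp;");
}
```

The table is built at run time here; whether that matters depends on how often the class is instantiated and constructed.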
Underworld answered 29/9, 2015 at 13:20 Comment(9)
I also used reserve! But your solution is better as it does not count the number of characters at run time, due to the use of LENGTH as a template parameter which is a good idea.Yocum
@Yocum what a day I'm having... you're right, you are also using reserve, but I am having a day full of misreads :'(Underworld
Many thanks for this elegant solution! Just a minor comment after using this code: I ran into the issue that the above code also includes the final null character into the result (i.e. try to evaluate literal<char>("test").length(), this gives 5 instead of the expected 4). I'm not sure what's the best way to solve this. The solution by @Yocum does not suffer from this problem as in that code, the variable s is set to the length excluding the null character.Mordy
@MatthiasC.M.Troffaes it should be easy to fix this null character issue reserving LENGTH - 1 characters and using std::copy_n instead of std::copy (I'll edit the answer).Underworld
That's brilliant! Also many thanks for the quick response - wasn't expecting that on such an old answer!!Mordy
@MatthiasC.M.Troffaes notifications exist for this reason :)Underworld
Is this a valid use of constexpr? I get compiler errors ("'result' declaration is not allowed in 'constexpr' function body"), but I'm not sure if this was added in a later standard.Theocracy
@BTownTKD Which compiler and C++ standard version are you using?Underworld
MSVC 2015, which apparently has "some" c++14 features, but not "Extended constexpr." Perhaps that's why. msdn.microsoft.com/en-us/library/hh567368.aspxTheocracy
Y
2

The best way is perhaps to define the conversion function ourselves, since converting ASCII to UTF-8/16/32 is a straightforward cast on the char types:

template<typename char_type>
std::basic_string<char_type> cvtASCIItoUTFX(const char * litteral)
{
    //We could define a faster specialization in case char_type is char

    size_t s = strlen(litteral);

    std::basic_string<char_type> result;
    result.reserve(s);
    for(size_t i=0;i<s;++i)
    {
        result.push_back((char_type)litteral[i]);
    }

    return result;
}
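
A possible shorter variant of the loop above, as a sketch only: the iterator-range constructor of std::basic_string performs the same per-character conversion. It assumes the input is 7-bit ASCII, as in the question. The names ascii_to_basic_string and check_ascii_to_basic_string are hypothetical.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Hypothetical helper: builds basic_string<CharT> from a narrow ASCII string.
template <typename CharT>
std::basic_string<CharT> ascii_to_basic_string(const char* ascii)
{
    // Each char in [ascii, ascii + length) is converted to CharT.
    // This is a numeric conversion, not a transcoding step, so it is
    // only meaningful for 7-bit ASCII input.
    return std::basic_string<CharT>(ascii, ascii + std::strlen(ascii));
}

// Small self-check against the prefixed literals.
inline void check_ascii_to_basic_string()
{
    assert(ascii_to_basic_string<char16_t>("&amp;") == u"&amp;");
    assert(ascii_to_basic_string<wchar_t>("&quot;").size() == 6);
}
```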
Yocum answered 29/9, 2015 at 10:49 Comment(0)
V
1

As these are escape sequences, they contain nothing but ASCII characters.

Is there a way to avoid specializing the template functions that initialize the table, for instance with some standard function for converting string literals?

No, because the standard doesn't have any conversion functions that stick to such specific subsets.

I'd recommend just using an external generator for the table, or if you really want to stay within C++, to use macros.
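
One way the macro suggestion could look, as a sketch and not necessarily what was meant above: the macro pastes each encoding prefix onto the same literal, and a small selector picks the one matching the character type at compile time. The names literal_picker, POLY_LITERAL, ampersand_entity and check_poly_literal are hypothetical.

```cpp
#include <cassert>
#include <string>

// Hypothetical selector: each specialization returns the literal whose
// character type matches CharT and ignores the other three.
template <typename CharT> struct literal_picker;

template <> struct literal_picker<char> {
    static const char* pick(const char* n, const wchar_t*, const char16_t*, const char32_t*) { return n; }
};
template <> struct literal_picker<wchar_t> {
    static const wchar_t* pick(const char*, const wchar_t* w, const char16_t*, const char32_t*) { return w; }
};
template <> struct literal_picker<char16_t> {
    static const char16_t* pick(const char*, const wchar_t*, const char16_t* u, const char32_t*) { return u; }
};
template <> struct literal_picker<char32_t> {
    static const char32_t* pick(const char*, const wchar_t*, const char16_t*, const char32_t* U) { return U; }
};

// Pastes the L, u and U prefixes onto the same literal, then selects one.
// The second argument must be a plain string literal, not a variable.
#define POLY_LITERAL(CharT, str) \
    (literal_picker<CharT>::pick(str, L##str, u##str, U##str))

// Example of use inside code templated on the character type.
template <typename CharT>
std::basic_string<CharT> ampersand_entity()
{
    return POLY_LITERAL(CharT, "&amp;");
}

// Small self-check against the prefixed literals.
inline void check_poly_literal()
{
    assert(ampersand_entity<char>() == "&amp;");
    assert(ampersand_entity<char32_t>() == U"&amp;");
}
```

Because the selection happens when the template is instantiated, the macro can appear in code where the character type is still a template parameter.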

Victorie answered 29/9, 2015 at 10:15 Comment(2)
Since an ASCII string is a valid UTF8-string, and since c++11 has utf8 to utf16 conversions, that's not completely true.Yocum
If you read my claim properly, you'll see that it makes no sense to use a superset to refute it. But that aside, there are not enough conversion functions for this in the standard. You'll have to do it by hand.Victorie
S
0

This answer only works for non-string (i.e. number) literals

... because only those are expanded to template<char...> by the language.

Since I've spent a while on this, I figured I might as well post it here. Doesn't work with actual character literals because herp derp C++.

template<char16_t... str>
struct Literal16 {
    // Trailing 0 added so the array is a null-terminated string
    static constexpr char16_t arr[] = {str..., 0};
    
    constexpr operator const char16_t*() { 
        return arr;
    }
};

template<char... str>
struct Literal8 {
    // Trailing '\0' added so the array is a null-terminated string
    static constexpr char arr[] = {str..., '\0'};
    
    constexpr operator const char*() { 
        return arr;
    }
};

template<char... str>
struct PolyLiteral {
    operator const char*() {
        return Literal8<str...>();
    }
    operator const char16_t*() {
        return Literal16<str...>();
    }  
};

template<char... str> PolyLiteral<str...> operator"" _poly() { return PolyLiteral<str...>(); }

int main() {
    const char* test = 123_poly;
    const char16_t* test2 = 123_poly;
}
Subsistence answered 29/9, 2015 at 10:25 Comment(1)
Doh, that's a nice operator hacking... This might make the code hard to read, though.Yocum
