I'm trying to parse LaTeX escape codes (e.g. \alpha) into the corresponding Unicode (Mathematical) characters (i.e. U+1D6FC). Right now I am using this symbols parser (rule):
struct greek_lower_case_letters_ : x3::symbols<char32_t>
{
    greek_lower_case_letters_()
    {
        add("alpha", U'\u03B1');  // \alpha -> U+03B1 GREEK SMALL LETTER ALPHA
    }
} greek_lower_case_letter;
This works fine but means I'm getting a std::u32string as a result.
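To illustrate, a minimal driver along these lines ends up with a std::u32string (the parse_greek wrapper here is only a sketch, not my actual code):

#include <boost/spirit/home/x3.hpp>
#include <string>

namespace x3 = boost::spirit::x3;

// greek_lower_case_letter is the symbols table from above.
std::u32string parse_greek(std::string const& input)
{
    std::u32string out;
    // Every "\name" escape contributes one char32_t to the container attribute.
    x3::parse(input.begin(), input.end(),
              +(x3::lit('\\') >> greek_lower_case_letter), out);
    return out;
}

// parse_greek("\\alpha\\alpha") yields U"\u03B1\u03B1"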
I'd like an elegant way to keep the Unicode code points in the code (for possible future automation and for maintenance reasons). Is there a way to get this kind of parser to parse into a UTF-8 std::string?
I thought of making the symbols struct parse to a std::string, but that would be highly inefficient (I know, premature optimization bla bla).
I was hoping there was some elegant way instead of jumping through a bunch of hoops to get this working (symbols appending strings to the result).
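The most direct version of that idea I can think of would store the UTF-8 bytes as the symbol value (sketch only; greek_lower_utf8_ is a made-up name and the byte sequence is hard-coded by hand):

// Sketch: map each escape name straight to its UTF-8 byte sequence.
struct greek_lower_utf8_ : x3::symbols<std::string>
{
    greek_lower_utf8_()
    {
        add("alpha", "\xCE\xB1");  // U+03B1 encoded as UTF-8
    }
} greek_lower_utf8;

But that loses exactly what I want to keep: the code points themselves in the source.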
I do fear, though, that using the code point values and wanting UTF-8 will incur the runtime cost of a conversion (or is a constexpr UTF-32 to UTF-8 conversion possible?).
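As a sanity check, a hand-rolled encoder for a single code point does seem small enough to be constexpr (sketch only; the Utf8Bytes / encode_utf8 names are mine, not from any library):

#include <array>
#include <cstddef>

// Sketch (C++17 or later): encode one code point as UTF-8 at compile time.
// No validation of surrogates or out-of-range values is done here.
struct Utf8Bytes
{
    std::array<char, 4> bytes{};
    std::size_t size = 0;
};

constexpr Utf8Bytes encode_utf8(char32_t cp)
{
    Utf8Bytes out{};
    if (cp < 0x80) {
        out.bytes[0] = static_cast<char>(cp);
        out.size = 1;
    } else if (cp < 0x800) {
        out.bytes[0] = static_cast<char>(0xC0 | (cp >> 6));
        out.bytes[1] = static_cast<char>(0x80 | (cp & 0x3F));
        out.size = 2;
    } else if (cp < 0x10000) {
        out.bytes[0] = static_cast<char>(0xE0 | (cp >> 12));
        out.bytes[1] = static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out.bytes[2] = static_cast<char>(0x80 | (cp & 0x3F));
        out.size = 3;
    } else {
        out.bytes[0] = static_cast<char>(0xF0 | (cp >> 18));
        out.bytes[1] = static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out.bytes[2] = static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out.bytes[3] = static_cast<char>(0x80 | (cp & 0x3F));
        out.size = 4;
    }
    return out;
}

static_assert(encode_utf8(U'\u03B1').size == 2, "alpha is a two-byte UTF-8 sequence");

If something like this works, the table could store UTF-8 strings generated at compile time from the code points, so the conversion wouldn't have to happen at parse time.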
std::string as symbol key/value, and I'm trying to get the char_ rule to work as a sequence using the repeat directive. Comparison of the UTF8 and UTF32 version here. I don't understand why the second version fails after the first \alpha. – Illuse