I think your problem is just a misunderstanding.
Prelude> print "Ёжик лижет мёд."
"\1025\1078\1080\1082 \1083\1080\1078\1077\1090 \1084\1105\1076."
Prelude> putStrLn "\1025\1078\1080\1082 \1083\1080\1078\1077\1090 \1084\1105\1076."
Ёжик лижет мёд.
Prelude> "{\"a\": \"Ёжик лижет мёд.\"}"
"{\"a\": \"\1025\1078\1080\1082 \1083\1080\1078\1077\1090 \1084\1105\1076.\"}"
When you print a value containing a String, the Show instance for Char is used, and that escapes all characters with code points above 127. To get the glyphs you want, you need to putStr[Ln] the String.
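You can see that the escaping already happens at the Char level (a quick GHCi check of my own, not part of the session above):
Prelude> print 'Ё'
'\1025'
Prelude> putStrLn ['Ё']
Ё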
So aeson properly decoded the UTF-8 encoded input, as should be expected, because it UTF-8 encodes the values itself:
encode = {-# SCC "encode" #-} encodeUtf8 . toLazyText . fromValue .
{-# SCC "toJSON" #-} toJSON
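A minimal round-trip sketch (mine, assuming the usual aeson and text modules; the example value mirrors the one from the question) showing that encode produces UTF-8 octets and decode accepts them back:

{-# LANGUAGE OverloadedStrings #-}
import Data.Aeson (Value, decode, encode, object, (.=))
import qualified Data.Text.Lazy.Encoding as TLE
import qualified Data.Text.Lazy.IO as TLIO

main :: IO ()
main = do
  let bytes = encode (object ["a" .= ("Ёжик лижет мёд." :: String)])
  TLIO.putStrLn (TLE.decodeUtf8 bytes)  -- the UTF-8 octets, decoded only for display
  print (decode bytes :: Maybe Value)   -- aeson reads its own output back

The last line shows the escaped form again, for exactly the Show reason described above.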
Now, to the question of why aeson uses ByteString and not Text as the final target of encoding and the starting point of decoding.
Because that is the appropriate type. The encoded values are intended to be transferred portably between machines, and that happens as a stream of bytes (octets, if we're in a pedantic mood). That is exactly what a ByteString provides: a sequence of bytes that then has to be treated in an application-specific way. For the purposes of aeson, that stream of bytes shall be encoded in UTF-8: aeson assumes the input of the decode function is valid UTF-8, and it encodes its output as valid UTF-8.
Transferring e.g. Text would run into portability problems, since a 16-bit encoding depends on endianness, so Text is not an appropriate format for the interchange of data between machines. Note that aeson uses Text as an intermediate type when encoding (and presumably also when decoding), because that is an appropriate type to use at intermediate stages.
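A small illustration of the endianness point, using the encoding functions from Data.Text.Encoding: the same Text serialises to different byte sequences depending on the 16-bit byte order, while UTF-8 gives a single unambiguous sequence.

import qualified Data.ByteString as B
import qualified Data.Text as T
import Data.Text.Encoding (encodeUtf16BE, encodeUtf16LE, encodeUtf8)

main :: IO ()
main = do
  let t = T.pack "мёд"
  print (B.unpack (encodeUtf16LE t))  -- [60,4,81,4,52,4]
  print (B.unpack (encodeUtf16BE t))  -- [4,60,4,81,4,52]
  print (B.unpack (encodeUtf8 t))     -- [208,188,209,145,208,180], independent of byte order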