Store UTF-8 encoding of a String in a ByteString

So I want to access the individual bytes of the UTF-8 encoding of a string.

I tried using Data.ByteString.Char8.pack, but that seems to truncate each character to a single byte (the low 8 bits of its code point):

ghci> Char8.pack "\945\946\947" 
"\177\178\179"

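The truncation can be reproduced without ByteString at all. The sketch below assumes that Char8.pack keeps only the low 8 bits of each code point (945 mod 256 = 177), which here happens to coincide with the final byte of the UTF-8 encoding. The name lowBytes is made up for illustration.

```haskell
import Data.Char (ord)
import Data.Word (Word8)

-- Hypothetical helper: keep only the low 8 bits of each code point.
-- fromIntegral from Int to Word8 wraps modulo 256.
lowBytes :: String -> [Word8]
lowBytes = map (fromIntegral . ord)

main :: IO ()
main = print (lowBytes "\945\946\947")  -- prints [177,178,179]
```
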
This isn't a problem if I can read the string from a file:

ghci> Prelude.writeFile "temp.txt" "\945\946\947" >> Char8.readFile "temp.txt"
"\206\177\206\178\206\179"

But I'd like a pure way to convert String -> ByteString without truncation, and Hoogle isn't very helpful.

Silva answered 26/12, 2012 at 22:44 Comment(1)
I remembered reading that Hayoo includes more packages in its search than Hoogle, so I tried your search there, and it gave me the right answer as the second result. – Oilbird

You can use Data.ByteString.UTF8.fromString from the utf8-string package:

ghci> import Data.ByteString.UTF8 as BSUTF8
ghci> :t BSUTF8.fromString
BSUTF8.fromString :: String -> ByteString
ghci> BSUTF8.fromString "\945\946\947"
"\206\177\206\178\206\179"

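Since the question asks for the individual bytes, the same call can be combined with Data.ByteString.unpack in a compiled program. This is a minimal sketch that assumes the utf8-string package is installed; utf8Bytes is a name chosen here for illustration, not part of either library.

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.UTF8 as BSUTF8
import Data.Word (Word8)

-- Hypothetical helper: UTF-8 encode a String, then expose the raw bytes.
utf8Bytes :: String -> [Word8]
utf8Bytes = B.unpack . BSUTF8.fromString

main :: IO ()
main = print (utf8Bytes "\945\946\947")  -- prints [206,177,206,178,206,179]
```
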
Alternatively, you can use encode{Strict,Lazy}ByteString from the encoding package, which offers a lot more encodings than just UTF-8:

ghci> import Data.Encoding as E
ghci> import Data.Encoding.UTF8
ghci> E.encodeStrictByteString UTF8 "\945\946\947"
"\206\177\206\178\206\179"
Milner answered 26/12, 2012 at 23:26 Comment(1)
The encoding package is rather nice! I wish it didn't have so many high-level dependencies... – Lindstrom

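If the dependency footprint is a concern, one alternative that neither answer above mentions is to go through the text package: pack the String into a Text and encode that. This is only a sketch of that swapped-in technique; encodeUtf8String is a hypothetical name, and note that Data.Text.pack replaces invalid scalar values such as lone surrogates with U+FFFD before encoding.

```haskell
import qualified Data.ByteString as B
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE

-- Hypothetical helper: String -> Text -> UTF-8 encoded strict ByteString.
encodeUtf8String :: String -> B.ByteString
encodeUtf8String = TE.encodeUtf8 . T.pack

main :: IO ()
main = print (B.unpack (encodeUtf8String "\945\946\947"))
-- prints [206,177,206,178,206,179]
```
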