Using Haskell to output a UTF-8-encoded ByteString

I'm going out of my mind trying to simply output UTF-8-encoded data to the console.

I've managed to accomplish this using String, but now I'd like to do the same with ByteString. Is there a nice and fast way to do this?

This is what I've got so far, and it's not working:

import Prelude hiding (putStr)
import Data.ByteString.Char8 (putStr, pack)

main :: IO ()
main = putStr $ pack "čušpajž日本語"

It prints out uapaj~�,�, ugh.

Ideally I'd like an answer for the newest GHC, 6.12.1, although I'd like to hear answers for previous versions as well.

Thanks!

Update: Simply reading and outputting the same UTF-8-encoded line of text seems to work correctly. (Using Data.ByteString.Char8, I just do a putStr =<< getLine.) But packed values from inside the .hs file, as in the above example, refuse to output properly... I must be doing something wrong?

Swinge answered 18/1, 2010 at 14:57 Comment(3)
What platform are you on? Unicode on UNIX-like platforms works quite well now; Windows support is lagging a bit. See the documentation for System.IO: "(GHC note: on Windows, we currently do not support double-byte encodings; if the console's code page is unsupported, then localeEncoding will be latin1.)" - Margravine
64-bit Linux. Doesn't System.IO work only with String? - Swinge
You should not use BS.Char8, because it assumes an 8-bit encoding and truncates multi-byte Unicode characters. Use normal ByteStrings unless you absolutely know that BS.Char8 is the right data type (that includes knowing why normal ByteStrings are explicitly not the right type for that use case). - Malapropism

utf8-string supports bytestrings.

import Prelude hiding (putStr)
import Data.ByteString.Char8 (putStr)
import Data.ByteString.UTF8 (fromString)

main :: IO ()
main = putStr $ fromString "čušpajž日本語"
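
For comparison, here is a sketch of the same program without Data.ByteString.Char8, assuming the text package is available alongside bytestring. It encodes explicitly with Data.Text.Encoding.encodeUtf8 and writes the raw bytes with Data.ByteString.putStr. The name toUtf8 is a made-up helper for illustration, not part of either library.

```haskell
import qualified Data.ByteString    as B
import qualified Data.Text          as T
import qualified Data.Text.Encoding as TE

-- Hypothetical helper: encode a String as a UTF-8 ByteString.
toUtf8 :: String -> B.ByteString
toUtf8 = TE.encodeUtf8 . T.pack

main :: IO ()
main = B.putStr (toUtf8 "čušpajž日本語\n")  -- the bytes are written unchanged
```

Whether the result displays correctly still depends on the terminal expecting UTF-8, since the bytes go out exactly as encoded.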
Awildaawkward answered 18/1, 2010 at 21:3 Comment(0)

Bytestrings are strings of bytes. When you pack a String with Data.ByteString.Char8, each character is truncated to 8 bits, as described in the documentation for Data.ByteString.Char8. You'll need to explicitly encode the text as UTF-8, for example via the utf8-string package on Hackage, which contains support for bytestrings.
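
A minimal sketch of why the question's program prints uapaj~ plus a few stray bytes: Data.ByteString.Char8.pack keeps only the low 8 bits of each character, so č (U+010D) becomes byte 0x0D, š (U+0161) becomes 0x61 ('a'), and ž (U+017E) becomes 0x7E ('~'). The helper names char8Bytes and utf8Bytes are made up for illustration, and the UTF-8 side assumes the text package is installed.

```haskell
import qualified Data.ByteString       as B
import qualified Data.ByteString.Char8 as C
import qualified Data.Text             as T
import qualified Data.Text.Encoding    as TE
import           Data.Word             (Word8)

-- Hypothetical helper: bytes produced by Char8.pack (low 8 bits of each Char).
char8Bytes :: String -> [Word8]
char8Bytes = B.unpack . C.pack

-- Hypothetical helper: bytes of the proper UTF-8 encoding.
utf8Bytes :: String -> [Word8]
utf8Bytes = B.unpack . TE.encodeUtf8 . T.pack

main :: IO ()
main = do
  print (char8Bytes "čušpajž")  -- one byte per character, high bits dropped
  print (utf8Bytes  "čušpajž")  -- two bytes for each of č, š and ž
```

Comparing the two lists shows that the truncated form loses information, so it cannot be repaired after packing; the encoding has to be chosen when the ByteString is built.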


However, as of 2011, you should use the text package for fast, packed Unicode output. See also the related question "GHC truncating Unicode character output".

Your example becomes a lot simpler:

{-# LANGUAGE OverloadedStrings #-}

import qualified Data.Text    as T
import qualified Data.Text.IO as T

main = T.putStrLn "čušpajž日本語"

Like so:

$ runhaskell A.hs
čušpajž日本語
Cryostat answered 18/1, 2010 at 17:20 Comment(2)
Doesn't utf8-string work only with Strings, and not ByteStrings? - Swinge
No, it also works with bytestrings. See #2087342 - Cryostat

This is a known ghc bug, marked "wontfix".

Paratuberculosis answered 18/1, 2010 at 16:31 Comment(3)
Noooooooo. :( But, I'm puzzled... it seems to work fine with regular Strings? - Swinge
Whatever this is, it's fixed now. Executing the example given on your linked page works as expected. The difference is that I'm trying to output UTF-8-encoded ByteStrings, and not UTF-8-encoded Strings, which is supposed to be more efficient. Keep in mind I'm currently using GHC 6.12.1, although I know the problem doesn't exist in GHC 6.10.4 either. - Swinge
No, that's not actually the problem. GHC 6.12 does utf8 String IO, if the locale is set to that. Which in fact solves the above bug, which isn't the problem the OP is asking about. - Cryostat
