Difference between Data.ByteString and Data.ByteString.Char8

I read that Data.ByteString.Char8 only supports ASCII characters and is dangerous to use with other Unicode characters. However, this program:

{-# LANGUAGE OverloadedStrings #-}

--import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as BC
import qualified Data.Text.IO as TIO
import qualified Data.Text.Encoding as E
import qualified Data.Text as T

name :: T.Text
name = "{ \"name\": \"哈时刻\" }"

nameB :: BC.ByteString
nameB = E.encodeUtf8 name

main :: IO ()
main = do
  BC.writeFile "test.json" nameB
  putStrLn "done"

produces the same result as

{-# LANGUAGE OverloadedStrings #-}

import qualified Data.ByteString as B
--import qualified Data.ByteString.Char8 as BC
import qualified Data.Text.IO as TIO
import qualified Data.Text.Encoding as E
import qualified Data.Text as T

name :: T.Text
name = "{ \"name\": \"哈时刻\" }"

nameB :: B.ByteString
nameB = E.encodeUtf8 name

main :: IO ()
main = do
  B.writeFile "test.json" nameB
  putStrLn "done"

So what is the difference between using Data.ByteString.Char8 and Data.ByteString?

Entozoic answered 23/11, 2017 at 2:19 Comment(1)
Notice your two programs are actually identical. The type BC.ByteString is a re-export of Data.ByteString.ByteString, which you use as B.ByteString, so these literally refer to the same type and all the code is identical so... - Predominance

If you compare Data.ByteString and Data.ByteString.Char8, you'll notice that a bunch of functions that reference Word8 in the former reference Char in the latter.

-- Data.ByteString
map :: (Word8 -> Word8) -> ByteString -> ByteString
cons :: Word8 -> ByteString -> ByteString
snoc :: ByteString -> Word8 -> ByteString
head :: ByteString -> Word8
uncons :: ByteString -> Maybe (Word8, ByteString) 
{- and so on... -}


-- Data.ByteString.Char8
map :: (Char -> Char) -> ByteString -> ByteString
cons :: Char -> ByteString -> ByteString
snoc :: ByteString -> Char -> ByteString
head :: ByteString -> Char
uncons :: ByteString -> Maybe (Char, ByteString) 
{- and so on... -}

For these functions, and these functions only, Data.ByteString.Char8 provides the convenience of not having to constantly convert Word8 values into and out of Char values. writeFile does exactly the same thing in both modules.
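The "danger" mentioned in the question comes from that Word8/Char conversion: as far as I know, the Char8 functions keep only the low 8 bits of each Char. A minimal sketch of the effect, where truncatedBytes and asciiBytes are names made up for this illustration:

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as BC
import Data.Word (Word8)

-- BC.pack keeps only the low 8 bits of each Char.
-- '哈' is U+54C8, so only the byte 0xC8 (200) survives.
truncatedBytes :: [Word8]
truncatedBytes = B.unpack (BC.pack "哈")

-- ASCII text is unaffected, since every code point fits in one byte.
asciiBytes :: [Word8]
asciiBytes = B.unpack (BC.pack "name")

main :: IO ()
main = do
  print truncatedBytes  -- [200]
  print asciiBytes      -- [110,97,109,101]
```

The programs in the question never hit this, because the bytes come from encodeUtf8 rather than from BC.pack.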

Here is a nice way of seeing the different behaviours of similar functions in Text, ByteString, and ByteString.Char8:

{-# LANGUAGE OverloadedStrings #-}

import Data.Text.Encoding

import qualified Data.Text as T
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as BC

nameText :: T.Text
nameText = "哈时刻"

nameByteString :: B.ByteString
nameByteString = encodeUtf8 nameText

main :: IO ()
main = do
  print $ T.head nameText               -- '\21704'     actual first character
  print $ B.head nameByteString         -- 229          first byte
  print $ BC.head nameByteString        -- '\229'       first byte as character

  putStrLn [ T.head nameText ]          -- 哈           actual first character
  putStrLn [ BC.head nameByteString ]   -- å            first byte as character
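
The same contrast shows up in lengths and in converting back to Text. The following is only a sketch, assuming the bytes are valid UTF-8 (decodeUtf8 throws an exception otherwise); roundTrip and byteWise are names invented for this example:

```haskell
{-# LANGUAGE OverloadedStrings #-}

import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as BC
import qualified Data.Text as T
import Data.Text.Encoding (decodeUtf8, encodeUtf8)

nameText :: T.Text
nameText = "哈时刻"

nameBytes :: B.ByteString
nameBytes = encodeUtf8 nameText

-- Decoding with the same encoding recovers the original text.
roundTrip :: T.Text
roundTrip = decodeUtf8 nameBytes

-- Treating each byte as its own Char yields nine unrelated characters.
byteWise :: T.Text
byteWise = T.pack (BC.unpack nameBytes)

main :: IO ()
main = do
  print (T.length nameText)      -- 3  characters
  print (B.length nameBytes)     -- 9  bytes (3 per character in UTF-8)
  print (BC.length nameBytes)    -- 9  still bytes, not characters
  print (roundTrip == nameText)  -- True
  print (byteWise == nameText)   -- False
```

So Char8 never decodes anything; to get characters back out of a ByteString you have to pick an encoding and decode explicitly.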
Hashim answered 23/11, 2017 at 2:42 Comment(3)
Thank you so much for spending time to show me the differences. It seems that we should stick to T.Text if we do not want to lose the context of Unicode characters. So what is the use case of ByteString then? - Entozoic
@Entozoic ByteString is for (surprise!) strings of bytes. Operations like file reads/writes, network communication, most cryptographic operations, and so on are typically most naturally represented in terms of ByteString. Operations on text, like preparing HTML for a web page, modifying text documents, or building a UI, should use Text. Occasionally it is necessary to convert between the two, e.g. to send some of your HTML along the network or to display your UI on stdout; in such cases you must choose an encoding to convert correctly. - Roundabout
@Entozoic ByteString is for raw binary data. Text is for... well... text. It depends on whether you're trying to deal with textual data, or just raw binary files / network protocols / etc. - Quotable
