Haskell: How to Create a Word8?

I want to write a simple function which splits a ByteString into [ByteString] using '\n' as the delimiter. My attempt:

import Data.ByteString

listize :: ByteString -> [ByteString]
listize xs = Data.ByteString.splitWith (=='\n') xs

This fails to compile because '\n' is a Char, whereas Data.ByteString.splitWith expects a predicate on Word8 (its type is (Word8 -> Bool) -> ByteString -> [ByteString]).

How do I turn this simple character into a Word8 that ByteString will play with?

Oxidation answered 23/1, 2012 at 1:47

You could just use the numeric literal 10 (the ASCII code for '\n'); numeric literals are overloaded, so it will be typed as a Word8 here. If you'd rather convert the character literal, use fromIntegral (ord '\n'): ord returns an Int, and fromIntegral converts that Int into a Word8. You'll have to import Data.Char for ord.
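As a rough sketch of how that fits into the function from the question (the name newline and the qualified alias B are my own choices, not anything the library requires):

```haskell
import qualified Data.ByteString as B
import Data.ByteString (ByteString)
import Data.Char (ord)
import Data.Word (Word8)

-- The newline byte, converted from the Char literal.
-- Writing `newline = 10` would give the same value.
newline :: Word8
newline = fromIntegral (ord '\n')

-- Split the input at every newline byte. The predicate now
-- compares Word8 with Word8, so it type-checks.
listize :: ByteString -> [ByteString]
listize = B.splitWith (== newline)
```

Importing Data.ByteString qualified also avoids name clashes with Prelude functions such as length and filter. Note that splitWith keeps empty pieces, so input ending in a newline produces a trailing empty ByteString.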

You could also import Data.ByteString.Char8, which offers functions for using Char instead of Word8 on the same ByteString data type. (Indeed, it has a lines function that does exactly what you want.) However, this is generally not recommended, as ByteStrings don't store Unicode codepoints (which is what Char represents) but instead raw octets (i.e. Word8s).
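For comparison, a minimal sketch of the Char8 route (the alias BC and the name listizeSplit are just illustrative). Keep in mind that Char8 functions truncate any Char to 8 bits, so this is only reasonable if the data really is ASCII or Latin-1:

```haskell
import qualified Data.ByteString.Char8 as BC
import Data.ByteString (ByteString)

-- Uses the ready-made lines function from Data.ByteString.Char8.
listize :: ByteString -> [ByteString]
listize = BC.lines

-- Alternative: split on a Char delimiter directly.
-- Unlike lines, this keeps a trailing empty piece if the
-- input ends with the delimiter.
listizeSplit :: ByteString -> [ByteString]
listizeSplit = BC.split '\n'
```

Both functions operate on the same ByteString type as Data.ByteString, so they can be mixed freely with the Word8-based API.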

If you're processing textual data, you should consider using Text instead of ByteString.
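If you go the Text route, something along these lines could work. This is only a sketch: it assumes the bytes are UTF-8 encoded (which covers plain ASCII) and that the text package is available; listizeText is a made-up name.

```haskell
import qualified Data.ByteString as B
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE
import Data.Text.Encoding.Error (UnicodeException)

-- Decode the raw bytes as UTF-8 (an assumption about the input),
-- then split into lines. Invalid input yields a Left instead of
-- a runtime exception.
listizeText :: B.ByteString -> Either UnicodeException [T.Text]
listizeText = fmap T.lines . TE.decodeUtf8'
```

Once the data is Text, you work with Char directly, so '\n' needs no conversion at all.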

Endearment answered 23/1, 2012 at 1:48

Comments (6):
Oxidation: Oh, wow. Excellent. I will have to dig into character representations, I guess. I have no idea what the numerical literals for the characters are. Is there a list of them somewhere?
Oxidation: I am writing a program that will parse protein database files, which contain strings, integers, and doubles. The strings will mostly be used to identify the right items out of a list, whereas the ints and doubles will be used in math operations. I am not sure what class I should use for this.
Endearment: You could use ord in GHCi to find out the codepoint numbers of characters :) I generally get Unicode data from fileformat.info; the Basic Latin block contains the 128 codepoints inherited from ASCII.
Endearment: As for the appropriate type for your program, it depends on the specific format and what you're doing, but if they don't contain any binary data, then Text would work fine. However, if the strings are always pure ASCII, and you're processing a large amount of data, then ByteString is likely to be faster.
Oxidation: Yes, the files are strictly ASCII, and performance is the goal. Thank you.
Tharp: How do I create a Word8 now?
