Haskell How to Create a Word8?

Asked 23/1, 2012 at 1:47 Answered 23/1, 2012 at 1:48

I want to write a simple function which splits a ByteString into [ByteString] using '\n' as the delimiter. My attempt:

import Data.ByteString

listize :: ByteString -> [ByteString]
listize xs = Data.ByteString.splitWith (=='\n') xs

This fails to compile with a type error, because '\n' is a Char rather than a Word8, which is what Data.ByteString.splitWith expects.

How do I turn this simple character into a Word8 that ByteString will play with?

Oxidation answered 23/1, 2012 at 1:47 Comment(0)

You could just use the numeric literal 10, but if you want to convert the character literal you can use fromIntegral (ord '\n') (the fromIntegral is required to convert the Int that ord returns into a Word8). You'll have to import Data.Char for ord.
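As a minimal sketch of the conversion described above, assuming qualified imports to avoid clashes with Prelude names (the name newline is introduced here only for illustration), the function from the question could look like this:

```haskell
import qualified Data.ByteString as B
import Data.ByteString (ByteString)
import Data.Char (ord)
import Data.Word (Word8)

-- The newline byte, built from the Char literal.
-- ord '\n' is the Int 10; fromIntegral narrows it to a Word8.
-- "newline" is an illustrative name, not part of any library.
newline :: Word8
newline = fromIntegral (ord '\n')

-- Split on every newline byte, as in the question.
listize :: ByteString -> [ByteString]
listize = B.splitWith (== newline)
```

Writing B.splitWith (== 10) would behave the same way; the named constant only makes the intent easier to read.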

You could also import Data.ByteString.Char8, which offers functions for using Char instead of Word8 on the same ByteString data type. (Indeed, it has a lines function that does exactly what you want.) However, this is generally not recommended, as ByteStrings don't store Unicode codepoints (which is what Char represents) but instead raw octets (i.e. Word8s).
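For comparison, here is a rough sketch of the Data.ByteString.Char8 route. The names listizeLines and listizeSplit are made up for this example. Note that the two variants differ when the input ends in a newline: lines drops the trailing empty chunk, while split keeps it.

```haskell
import qualified Data.ByteString.Char8 as BC
import Data.ByteString (ByteString)

-- Uses the ready-made lines function from the Char8 module.
listizeLines :: ByteString -> [ByteString]
listizeLines = BC.lines

-- Same idea with an explicit Char delimiter.
-- Unlike lines, this keeps a trailing empty chunk if the input
-- ends with '\n'.
listizeSplit :: ByteString -> [ByteString]
listizeSplit = BC.split '\n'
```

This is only reasonable when the data really is ASCII or Latin-1, since the Char8 functions truncate any Char above 255 to 8 bits.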

If you're processing textual data, you should consider using Text instead of ByteString.
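If you go the Text route, a small sketch might look like the following. It assumes the input is UTF-8 encoded (the question does not say so) and uses the text package; textLines is a hypothetical name for this example.

```haskell
import qualified Data.ByteString as B
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE
import Data.Text.Encoding.Error (UnicodeException)

-- Decode the raw bytes as UTF-8 (an assumption about the input),
-- then split the resulting Text into lines.
-- Invalid UTF-8 yields a Left instead of throwing an exception.
textLines :: B.ByteString -> Either UnicodeException [T.Text]
textLines = fmap T.lines . TE.decodeUtf8'
```

The decoding step is where bytes become characters, so a Char delimiter such as '\n' is meaningful from that point on.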

Endearment answered 23/1, 2012 at 1:48 Comment(6)

Oh, wow. Excellent. I will have to dig into character representations, I guess. I have no idea what the numerical literals for the characters are. Is there a list of them somewhere? – Oxidation 23/1, 2012 at 1:52

I am writing a program that will parse protein database files, which contain strings, integers, and doubles. The strings will mostly be used to identify the right items out of a list, whereas the ints and doubles will be used in math operations. I am not sure what class I should use for this. – Oxidation 23/1, 2012 at 1:55

You could use ord in GHCi to find out the codepoint numbers of characters :) I generally get Unicode data from fileformat.info; the Basic Latin block contains the 128 codepoints inherited from ASCII. – Endearment 23/1, 2012 at 1:59

As for the appropriate type for your program, it depends on the specific format and what you're doing, but if they don't contain any binary data, then Text would work fine. However, if the strings are always pure ASCII, and you're processing a large amount of data, then ByteString is likely to be faster. – Endearment 23/1, 2012 at 2:1

Yes, the files are strictly ASCII, and performance is the goal. Thank you. – Oxidation 23/1, 2012 at 2:3

How do I create a Word8 now? – Tharp 13/2, 2016 at 20:8
