Parse command-line arguments as ByteString

Is there a cross-platform way to parse program arguments into a list of ByteString (instead of a list of String, as in System.Environment.getArgs)?

I am aware of System.Posix.Env.ByteString.getArgs from the unix package, but I would like to be able to run my program on Windows without Cygwin. I am also aware of Data.ByteString.Char8.pack, but that truncates characters to 8 bits, and I would like to be able to process any sequence of bytes, including any encoded Unicode character.

EDIT: My program is a simple cipher which XORs the bits of the key against the bits of the message. For that reason, I would prefer to process the exact bits that were provided to the program instead of translating them into UTF-8 and back first.
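For context, here is a minimal sketch of the kind of cipher described above, working directly on ByteString values. The name xorWithKey is hypothetical, and repeating the key when it is shorter than the message is an assumption of this sketch, not something stated in the question.

```haskell
import qualified Data.ByteString as B
import Data.Bits (xor)

-- | XOR each message byte against the corresponding key byte.
-- Assumption: the key is repeated when it is shorter than the message,
-- and an empty key leaves the message unchanged.
xorWithKey :: B.ByteString -> B.ByteString -> B.ByteString
xorWithKey key msg
  | B.null key = msg
  | otherwise  = B.pack (zipWith xor (B.unpack msg) (cycle (B.unpack key)))
```

Applying xorWithKey twice with the same key should give back the original message, which is why the exact input bytes matter here.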

Klaipeda answered 3/1, 2016 at 0:1 Comment(11)
fromString function in Data.ByteString.UTF8 "converts a Haskell string into a UTF8 encoded bytestring." - see hackage.haskell.org/package/utf8-string-0.3.8/docs/…Sela
@Sela That may work, but ideally I want to process the argument in the character-encoding scheme in which it was provided (e.g. UTF-16).Klaipeda
If you want to process Unicode, why don't you use Text? ByteString isn't really that Unicode friendly. Apart from that issue, you could transform the String to a CStringLen and create a ByteString from that. After all, you're already in IO.Inmesh
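The CStringLen route suggested in the comment above could look roughly like the following sketch. The function names are hypothetical. It uses GHC.Foreign.withCStringLen from base so that the encoding is explicit; Foreign.C.String.withCStringLen would use the current locale instead. Note that a String obtained from System.Environment.getArgs has already been decoded, so this re-encodes the characters and does not necessarily recover the original bytes.

```haskell
import qualified Data.ByteString as B
import qualified GHC.Foreign as GF
import System.IO (TextEncoding, utf8)

-- | Encode a String with the given encoding into a temporary C buffer,
-- then copy that buffer into a ByteString.
stringToBytesWith :: TextEncoding -> String -> IO B.ByteString
stringToBytesWith enc s = GF.withCStringLen enc s B.packCStringLen

-- | Convenience wrapper that fixes the encoding to UTF-8.
stringToUtf8Bytes :: String -> IO B.ByteString
stringToUtf8Bytes = stringToBytesWith utf8
```

Passing a different TextEncoding (for example utf16le from System.IO) changes the bytes produced for the same characters, which is the ambiguity the question is trying to avoid.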
"process the argument in the character-encoding scheme in which it was provided". Why? Normally one wants to forget about the encoding ASAP, and work with characters rather than their codes. Anyway, Unix arguments may be sequences of bytes, but Windows arguments are not, they are sequences of characters.Beckman
I am not necessarily processing Unicode; what's important for my purposes is processing the straight bits. I'll edit the question to clarify.Klaipeda
@n.m. So are Windows arguments guaranteed to be in a certain character-encoding scheme?Klaipeda
Why? Why encoding schemes? Who needs them? The users certainly don't type any encoding schemes as they run your program. They type characters. You get characters they type. Who needs to know how these characters travel from the keyboard to your program?Beckman
As for Windows, yes, it operates in UTF-16 internally, or maybe in UCS-2, who knows. They probably can't tell the difference themselves. Why anyone should care, I have no idea.Beckman
How about using a String with ord and chr?Finagle
@DavidYoung Unfortunately no; I need to compare bytes from one source to bytes from another source, and ord/chr would yield Unicode codepoints above 255, which I would need to translate into bytes, and there's no certainty that the method I might use to do so would match the character-encoding scheme in which it was provided (if it is indeed provided in a valid character-encoding scheme).Klaipeda
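To illustrate the problem with ord and with Data.ByteString.Char8.pack, here is a small sketch; the helper names codePoints and truncatedBytes are hypothetical. It only shows that code points can exceed 255 and that Char8.pack keeps just the low 8 bits of each character.

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as BC
import Data.Char (ord)

-- | Unicode code points of a String, as returned by ord.
-- These are not bytes and can be larger than 255.
codePoints :: String -> [Int]
codePoints = map ord

-- | Byte values produced by Char8.pack, which truncates each Char to 8 bits.
truncatedBytes :: String -> [Int]
truncatedBytes = map fromIntegral . B.unpack . BC.pack
```

For the character U+03BB (code point 955), codePoints gives [955] while truncatedBytes gives [187], so neither matches any byte sequence a real encoding would have produced for it.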
I think that you're going to have to copy the sources from System.Environment. If you look at these sources, getArgs converts to the system encoding at the last moment; before that, it reads the data as "raw bytes". It seems that the code you want already exists; the piece you want is simply not exported as a function.Contrabandist
