Partial decoding of ByteStrings to Text

I need to decode ByteStrings from various encodings into Text, but the ByteStrings might be incomplete fragments. Ideally, I would like a function with a signature something like:

decodeFragment :: Encoding -> ByteString -> (Text, ByteString)

which returns the successfully decoded Text as well as any remaining bytes that didn't form a complete Unicode character (so I can reuse those bytes when I get the next fragment).

Does this sort of function already exist in some Haskell library, or do I need to roll my own? For now, I could even get started with something that doesn't support encodings beyond UTF-8.

Kibe answered 22/7, 2011 at 6:17 Comment(1)
Have you searched hackage.haskell.org/packages/archive/pkg-list.html? - Persevering

Tricky. Usually, the encoding package is my go-to suggestion for encoding and decoding text, but I don't believe it offers the exact thing you're asking for. It comes close, in that it offers

decodeChar :: (Encoding enc, ByteSource m) => enc -> m Char

which you can iterate to get an m String. Catching the errors thrown by decodeChar will tell you when you've reached the end of a fragment. A cursory look at some of the other encoding packages on Hackage suggests that they will either require the same approach or need a patch to expose the internal function they use that is analogous to the one above.
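For the UTF-8-only case the question mentions, here is a minimal sketch that does not use the encoding package at all. It assumes a more recent version of the text package, one that exports streamDecodeUtf8 and the Decoding type from Data.Text.Encoding. The names decodeFragmentUtf8 and decodeFragments are made up for illustration and are not part of any library.

```haskell
import qualified Data.ByteString as B
import qualified Data.Text as T
import Data.Text.Encoding (Decoding (..), streamDecodeUtf8)

-- | Hypothetical UTF-8-only stand-in for the question's decodeFragment.
-- Returns the text decoded so far plus any trailing bytes that do not
-- yet form a complete character.
decodeFragmentUtf8 :: B.ByteString -> (T.Text, B.ByteString)
decodeFragmentUtf8 bytes =
  case streamDecodeUtf8 bytes of
    -- The third field is a continuation for feeding the next chunk;
    -- it is ignored here because the leftover bytes are returned instead.
    Some decoded leftover _continue -> (decoded, leftover)

-- | Decode a sequence of fragments, prepending the leftover bytes of
-- each fragment to the next one (hypothetical helper).
decodeFragments :: [B.ByteString] -> (T.Text, B.ByteString)
decodeFragments = go T.empty B.empty
  where
    go acc leftover [] = (acc, leftover)
    go acc leftover (chunk : rest) =
      let (decoded, leftover') = decodeFragmentUtf8 (B.append leftover chunk)
       in go (T.append acc decoded) leftover' rest
```

Feeding it the bytes 0x68 0xC3 should give back the text "h" and the single leftover byte 0xC3, which then completes to "é" once a fragment starting with 0xA9 arrives. Note that streamDecodeUtf8 throws an exception on input that is invalid rather than merely incomplete, so this sketch does no error recovery, and it does nothing for encodings other than UTF-8.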

Rawboned answered 22/7, 2011 at 7:56 Comment(2)
Getting an m String doesn't sound very good for getting Text. - Distiller
It's about the best Hackage has to offer at the moment. - Rawboned
