What's Haskell's attitude towards Unicode in XML?

I want to know what the official solution to processing Unicode XML in Haskell is. I notice that HXT uses a simple String representation (a list of Unicode characters!!!) for text.

http://hackage.haskell.org/packages/archive/hxt/9.3.1.0/doc/html/Text-XML-HXT-DOM-TypeDefs.html#t:XNode

Constructors
XText String    ordinary text (leaf)
XBlob Blob          text represented more space efficient as bytestring (leaf)

How do you choose between the two representations when parsing? Forcing the user into using lists of characters doesn't sound like a particularly attractive feature, especially if the XML document has a lot of text content.

Also, I found http://hackage.haskell.org/package/hxt-unicode on Google, but I am not sure how it is intended to be used with parsing. Support for Unicode used to be much more explicit as well: http://hackage.haskell.org/packages/archive/hxt/8.5.2/doc/html/Text-XML-HXT-DOM-Unicode.html but this module has been removed in the latest version (9.3.1.0 at the time of writing) without a clear reason. What was the motivation behind that?

Could somebody also give some example code showing how HXT is intended to be used, please? The wiki pages are seriously lacking in this respect. Thank you.

Kirkham answered 5/10, 2012 at 16:32 Comment(1)
Which office would announce the "official solution to processing Unicode XML in Haskell"? About HXT, which I don't mean to recommend, there are example files all over the source: github.com/UweSchmidt/hxt .Dare

The xml-conduit package uses the Text datatype for storing textual data. Text has become the standard representation for textual data in Haskell over the past few years. xml-conduit is a well-maintained package, and I've personally used it for a huge amount of both open source and commercial code.
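
As a rough illustration, here is a minimal sketch of reading Unicode text out of an XML document with xml-conduit. The sample document, the `title` element name, and the helper names `sampleXml` and `titles` are made up for this example. It assumes the `xml-conduit`, `text`, and `bytestring` packages are installed and that the source file is saved as UTF-8.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main (main) where

import qualified Data.ByteString.Lazy as BL
import           Data.Text (Text)
import qualified Data.Text.Encoding as TE
import qualified Data.Text.IO as TIO
import           Text.XML (Document, def, parseLBS)
import           Text.XML.Cursor (content, element, fromDocument, ($//), (&/))

-- Hypothetical sample input: a UTF-8 encoded document with non-ASCII text.
sampleXml :: BL.ByteString
sampleXml = BL.fromStrict . TE.encodeUtf8 $
  "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\
  \<doc><title>Grüße</title><title>日本語</title></doc>"

-- Collect the text content of every <title> element as Text values.
-- No String (list of Char) is involved at any point.
titles :: Document -> [Text]
titles doc = fromDocument doc $// element "title" &/ content

main :: IO ()
main =
  case parseLBS def sampleXml of
    Left err  -> putStrLn ("parse error: " ++ show err)
    Right doc -> mapM_ TIO.putStrLn (titles doc)
```

The parser takes bytes in and hands back `Text` for all character data, so decoding happens once at the boundary and the rest of the program works with `Text` directly.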

Unbalance answered 6/10, 2012 at 17:36 Comment(0)
