HXT: Left-Factoring Nondeterministic Arrows?
Asked Answered
C

1

7

I'm trying to come to terms with Haskell's XML Toolbox (HXT) and I'm hitting a wall somewhere, because I don't seem to fully grasp arrows as a computational tool.

Here's my problem, which I hoped to illustrate a little better using a GHCi session:

> let parse p = runLA (xread >>> p) "<root><a>foo</a><b>bar</b><c>baz</c></root>"
> :t parse
parse :: LA XmlTree b -> [b]

So Parse is a small helper function that applies whatever arrow I give it to the trivial XML document

<root>
  <a>foo</a>
  <b>bar</b>
  <c>baz</c>
</root>

I define another helper function, this time to extract the text below a node with a given name:

> let extract s = getChildren >>> isElem >>> hasName s >>> getChildren >>> getText 
> :t extract
extract :: (ArrowXml cat) =>
   String -> cat (Data.Tree.NTree.TypeDefs.NTree XNode) String
> parse (extract "a" &&& extract "b") -- extract two nodes' content.
[("foo","bar")]

With the help of this function, it's easy to use the &&& combinator to pair up the text of two different nodes, and then, say, pass it to a constructor, like this:

> parse (extract "a" &&& extract "b" >>^ arr (\(a,b) -> (b,a))) 
[("bar","foo")]

Now comes the part I don't understand: I want to left-factor! extract calls getChildren on the root-node twice. Instead, I'd like it to only call it once! So I first get the child of the root node

> let extract' s = hasName s >>> getChildren >>> getText
> :t extract'
extract' :: (ArrowXml cat) => String -> cat XmlTree String
> parse (getChildren >>> isElem >>> (extract' "a" &&& extract' "b"))
[]

Note, that I've tried to re-order the calls to, say, isElem, etc. in order to find out if that's the issue. But as it stands, I just don't have any idea why this isn't working. There is an arrow 'tutorial' on the Haskell wiki and the way I understood it, it should be possible to do what I want to do that way — namely use &&& in order to pair up the results of two computations.

It does work, too — but only at the start of the arrow-chain, not mid-way trough, when I have some results already, that I want to keep 'shared.' I have the feeling that I'm just not being able to wrap my head around a difference in ideas between normal function composition and arrow notation. I'd be very appreciative of any pointers! (Even if it is just to some generic arrow-tutorial that goes a little more in-depth than the on the Haskell-wiki.)

Thank you!

Co answered 18/11, 2010 at 22:34 Comment(6)
I think you mean xread instead of readx in the definition of parse?Crumley
Yes, thank you. I always make that mistake, dunno why. I'm gonna edit it.Co
I don't have a definitive answer (and am myself somewhat interested to see whatever it turns out to be), but I suspect the extract' "a" &&& extract' "b" is constraining both branches - i.e., the set of acceptable parses is the set for which both branches has a result, which is the empty set.Gorse
@mokus: Yes, that's what's happening here (try arr show &&& extract' "b", for example). What I'd be curious to see is an explanation of why that's what (&&&) does for list arrows, when the behavior described by @Aleksandar Dimitrov seems more natural.Crumley
See my update below. This seems to be specifically a problem (maybe a bug?) with runLA and xread and doesn't happen when using runX.Crumley
I would actually call this a bug. I'll see if I can contact the maintainer and ask him is thinking. Thanks for clearing it up, Travis!Co
C
2

If you convert the arrow to (and then from) a deterministic version this works as expected:

> let extract' s = unlistA >>> hasName s >>> getChildren >>> getText
> parse (listA (getChildren >>> isElem) >>> (extract' "a" &&& extract' "b"))
[("foo","bar")]

This isn't really satisfactory, though, and I can't remember off the top of my head why (&&&) behaves this way with a nondeterministic arrow (I'd personally use the proc/do notation for anything much more complicated than this).


UPDATE: There seems to be something weird going on here with runLA and xread. If you use runX and readString everything works as expected:

> let xml = "<root><a>foo</a><b>bar</b><c>baz</c></root>"
> let parse p = runX (readString [] xml >>> p)
> let extract' s = getChildren >>> hasName s >>> getChildren >>> getText
> parse (getChildren >>> isElem >>> (extract' "a" &&& extract' "b"))
[("foo","bar")]

This means you have to run the parser in the IO monad, but there are advantages to using runX anyway (better error messages, etc.).

Crumley answered 18/11, 2010 at 23:32 Comment(1)
WHAT? That is interesting. Thank you for reply! I'm a little disconcerted, but I can definitely live with it. Come to think of it, I do think this might be a bug. Ideally, the different run* functions should really just affect the Arrow something runs in (and thus its capabilities,) and not the way the arrow combinators work within the computation!Co

© 2022 - 2024 — McMap. All rights reserved.