Is factoring an arrow out of arrow do notation a valid transformation?
Asked Answered
S

2

2

I'm trying to get my head around HXT, a Haskell library for parsing XML that uses arrows. For my specific use case I'd rather not use deep as there are cases where <outer_tag><payload_tag>value</payload_tag></outer_tag> is distinct from <outer_tag><inner_tag><payload_tag>value</payload_tag></inner_tag></outer_tag> but I ran into some weirdness that felt like it should work but doesn't.

I've managed to come up with a test case based on this example from the docs:

{-# LANGUAGE Arrows, NoMonomorphismRestriction #-}
module Main where

import Text.XML.HXT.Core

data Guest = Guest { firstName, lastName :: String }
  deriving (Show, Eq)


getGuest = deep (isElem >>> hasName "guest") >>> 
  proc x -> do
    fname <- getText <<< getChildren <<< deep (hasName "fname") -< x
    lname <- getText <<< getChildren <<< deep (hasName "lname") -< x
    returnA -< Guest { firstName = fname, lastName = lname }

getGuest' = deep (isElem >>> hasName "guest") >>> 
  proc x -> do
    fname <- getText <<< getChildren <<< (hasName "fname") <<< getChildren -< x
    lname <- getText <<< getChildren <<< (hasName "lname") <<< getChildren -< x
    returnA -< Guest { firstName = fname, lastName = lname }

getGuest'' = deep (isElem >>> hasName "guest") >>> getChildren >>>
  proc x -> do
    fname <- getText <<< getChildren <<< (hasName "fname") -< x
    lname <- getText <<< getChildren <<< (hasName "lname") -< x
    returnA -< Guest { firstName = fname, lastName = lname }


driver finalArrow = runX (readDocument [withValidate no] "guestbook.xml" >>> finalArrow)

main = do 
  guests <- driver getGuest
  print "getGuest"
  print guests

  guests' <- driver getGuest'
  print "getGuest'"
  print guests'

  guests'' <- driver getGuest''
  print "getGuest''"
  print guests''

Between getGuest and getGuest' I expand deep into the correct number of getChildren. The resulting function still works. I then factor the getChildren outside of the do block but this causes the resulting function to fail. The output is:

"getGuest"
[Guest {firstName = "John", lastName = "Steinbeck"},Guest {firstName = "Henry", lastName = "Ford"},Guest {firstName = "Andrew", lastName = "Carnegie"},Guest {firstName = "Anton", lastName = "Chekhov"},Guest {firstName = "George", lastName = "Washington"},Guest {firstName = "William", lastName = "Shakespeare"},Guest {firstName = "Nathaniel", lastName = "Hawthorne"}]
"getGuest'"
[Guest {firstName = "John", lastName = "Steinbeck"},Guest {firstName = "Henry", lastName = "Ford"},Guest {firstName = "Andrew", lastName = "Carnegie"},Guest {firstName = "Anton", lastName = "Chekhov"},Guest {firstName = "George", lastName = "Washington"},Guest {firstName = "William", lastName = "Shakespeare"},Guest {firstName = "Nathaniel", lastName = "Hawthorne"}]
"getGuest''"
[]

I feel like this should be a valid transformation to perform, but my understanding of arrows is a little shaky. Am I doing something wrong? Is this a bug that I should report?

I'm using HXT version 9.3.1.3 (the latest at the time of writing). ghc --version prints "The Glorious Glasgow Haskell Compilation System, version 7.4.1". I've also tested on a box with ghc 7.6.3 and got the same result.

The XML file had the following repetitive structure (the full file can be found here)

<guestbook>
  <guest>
    <fname>John</fname>
    <lname>Steinbeck</lname>
  </guest>
  <guest>
    <fname>Henry</fname>
    <lname>Ford</lname>
  </guest>
  <guest>
    <fname>Andrew</fname>
    <lname>Carnegie</lname>
  </guest>
</guestbook>
Seattle answered 24/2, 2014 at 18:24 Comment(2)
Could you post an example XML file to go with this?Apodosis
@Apodosis Okay, did that.Seattle
S
3

In getGuest'' you have

... (hasName "fname") -< x
... (hasName "lname") -< x

That is, you are restricting to the case where x is "fname" and x is "lname", which isn't satisfied by any x!

Spittoon answered 24/2, 2014 at 21:43 Comment(4)
So, not a valid factoring then? I'm going to go read the documentation on exactly how arrow do notation translates into regular Haskell.Seattle
Indeed certainly not valid in general, nor it turns out, in this particular case.Spittoon
If you want to think about the translation, the important point is the difference between f >>> (g1 &&& g2) and (f >>> g1) &&& (f >>> g2).Spittoon
I can now see that hose are different. If f had a side effect for example, the first "launches ze missiles" once, the second twice. My problem appears to be mentally translating proc/do notation into underlying symbols. It's easy to find the translation algorithm for monad do notation and internalise it, but the only source I've found so far is Ross Paterson's paper which is really hard to understand. Every other tutorial seems to give you an example and a translation without any explanation of how one became the other.Seattle
S
2

I've managed to work out the specific reason that the construction is interpreted the way it is. The following arrow translation found here provides a base to work from

addA :: Arrow a => a b Int -> a b Int -> a b Int
addA f g = proc x -> do
                y <- f -< x
                z <- g -< x
                returnA -< y + z

Becomes:

addA :: Arrow a => a b Int -> a b Int -> a b Int
addA f g = arr (\ x -> (x, x)) >>>
           first f >>> arr (\ (y, x) -> (x, y)) >>>
           first g >>> arr (\ (z, y) -> y + z)

From this we can, by analogy, derive:

getGuest''' = preproc >>>
           arr (\ x -> (x, x)) >>>
           first f >>> arr (\ (y, x) -> (x, y)) >>>
           first g >>> arr (\ (z, y) -> Guest {firstName = z, lastName = y})

    where preproc = deep (isElem >>> hasName "guest") >>> getChildren
        f = getText <<< getChildren <<< (hasName "fname")
        g = getText <<< getChildren <<< (hasName "lname")

In HXT, the arrows can be imagined as streams of values flowing through filters. arr (\x->(x,x)) does not "split the stream", as I'd hoped. Instead it creates a stream of tuples that are filtered by f and survivors are filtered by g. As f and g are mutually exclusive, there are no survivors.

Examples with getChildren inside miraculously worked because the tuple stream contained values from further up the XML document looking something like

<guest>
    <fname>John</fname>
    <lname>Steinbeck</lname>
</guest>

and so were not mutually exclusive.

Seattle answered 25/2, 2014 at 0:31 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.