xml-tree parser (Haskell) for graph-library
Asked Answered
G

2

2

I'm writing a library for working with graphs. The primary task - parsing xml-tree. The tree looks like

<graph nodes=4 arcs=5>
    <node id=1 />
    <node id=2 />
    <node id=3 />
    <node id=4 />
    <arc from=1 to=2 />
    <arc from=1 to=3 />
    <arc from=1 to=4 />
    <arc from=2 to=4 />
    <arc from=3 to=4 />
</graph>

Structure for storing:

type Id = Int

data Node = Node Id deriving (Show)
data Arc = Arc Id Id deriving (Show)

data Graph = Graph { nodes :: [Node],
             arcs  :: [Arc]}

How to write data from the xml file into this structure? I can not write a parser for xml tree of this kind (HXT library)

Gabriellegabrielli answered 11/4, 2011 at 13:46 Comment(0)
P
1

Assuming that you convert that into proper XML (surround all the attribute values with quotes), the following code will work (using xml-enumerator):

{-# LANGUAGE OverloadedStrings #-}
import Text.XML.Enumerator.Parse
import Control.Monad
import Data.Text (unpack)
import Control.Applicative

type Id = Int

data Node = Node Id deriving (Show)
data Arc = Arc Id Id deriving (Show)

data Graph = Graph { nodes :: [Node],
             arcs  :: [Arc]}
  deriving Show

main = parseFile_ "graph.xml" decodeEntities $ force "graph required" parseGraph

parseGraph = tagName "graph" getCounts $ \(nodeCount, arcCount) -> do
    nodes <- replicateM nodeCount parseNode
    arcs <- replicateM arcCount parseArc
    return $ Graph nodes arcs
  where
    requireNum name = do
        x <- requireAttr name
        case reads $ unpack x of
            (i, _):_ -> return i
            _ -> fail $ "Invalid integer: " ++ unpack x
    getCounts = do
        n <- requireNum "nodes"
        a <- requireNum "arcs"
        return (n, a)
    parseNode = force "node required" $ tagName "node"
        (Node <$> requireNum "id") return
    parseArc = force "arc required" $ tagName "arc"
        (Arc <$> requireNum "from" <*> requireNum "to") return

Outputs:

Graph {nodes = [Node 1,Node 2,Node 3,Node 4], arcs = [Arc 1 2,Arc 1 3,Arc 1 4,Arc 2 4,Arc 3 4]}
Pantile answered 11/4, 2011 at 16:46 Comment(3)
One recommendation I would give you is to remove the nodes and arcs attributes from graph; they are redundant and only serve to make the parse code more difficult (that's one of the reasons Don's code is neater). Also, this code requires all the nodes before the arcs; I'm not sure if that was your intent.Pantile
Ou, this error: Couldn't match expected type Data.XML.Types.Name' against inferred type [Char]' In the first argument of tagName', namely "graph"' In the first argument of ($)', namely tagName "graph" getCounts' In the expression: tagName "graph" getCounts $ \ (nodeCount, arcCount) -> do { nodes <- replicateM nodeCount parseNode; arcs <- replicateM arcCount parseArc; .... } Maybe I'm not connecting to any library...?Gabriellegabrielli
Looks like you didn't include the OverloadedStrings pragmaPantile
T
2

Do you need to use an XML library? The 'tagsoup' library might be just as effective for not-really-xml like this:

import Text.HTML.TagSoup
import Data.Maybe

main = do
    s <- readFile "A.dat"

    -- get a list of nodes and arcs
    let g' = catMaybes
                [ case n of
                    TagOpen "node" [(_,n)]        -> Just (Left  $ Node (read n)) 
                    TagOpen "arc"  [(_,n), (_,m)] -> Just (Right $ Arc (read n) (read m))
                    _ -> Nothing

                | n <- parseTags s ]

    -- collapse them into a graph
    let g = foldr (\n g -> case n of
                                Left  n -> g { nodes = n : nodes g }
                                Right a -> g { arcs  = a : arcs  g }
                        ) (Graph [] []) g'

    print g

Running this:

> main
Graph {nodes = [Node 1,Node 2,Node 3,Node 4], arcs = [Arc 1 2,Arc 1 3,Arc 1 4,Arc 2 4,Arc 3 4]}
Terence answered 11/4, 2011 at 16:18 Comment(0)
P
1

Assuming that you convert that into proper XML (surround all the attribute values with quotes), the following code will work (using xml-enumerator):

{-# LANGUAGE OverloadedStrings #-}
import Text.XML.Enumerator.Parse
import Control.Monad
import Data.Text (unpack)
import Control.Applicative

type Id = Int

data Node = Node Id deriving (Show)
data Arc = Arc Id Id deriving (Show)

data Graph = Graph { nodes :: [Node],
             arcs  :: [Arc]}
  deriving Show

main = parseFile_ "graph.xml" decodeEntities $ force "graph required" parseGraph

parseGraph = tagName "graph" getCounts $ \(nodeCount, arcCount) -> do
    nodes <- replicateM nodeCount parseNode
    arcs <- replicateM arcCount parseArc
    return $ Graph nodes arcs
  where
    requireNum name = do
        x <- requireAttr name
        case reads $ unpack x of
            (i, _):_ -> return i
            _ -> fail $ "Invalid integer: " ++ unpack x
    getCounts = do
        n <- requireNum "nodes"
        a <- requireNum "arcs"
        return (n, a)
    parseNode = force "node required" $ tagName "node"
        (Node <$> requireNum "id") return
    parseArc = force "arc required" $ tagName "arc"
        (Arc <$> requireNum "from" <*> requireNum "to") return

Outputs:

Graph {nodes = [Node 1,Node 2,Node 3,Node 4], arcs = [Arc 1 2,Arc 1 3,Arc 1 4,Arc 2 4,Arc 3 4]}
Pantile answered 11/4, 2011 at 16:46 Comment(3)
One recommendation I would give you is to remove the nodes and arcs attributes from graph; they are redundant and only serve to make the parse code more difficult (that's one of the reasons Don's code is neater). Also, this code requires all the nodes before the arcs; I'm not sure if that was your intent.Pantile
Ou, this error: Couldn't match expected type Data.XML.Types.Name' against inferred type [Char]' In the first argument of tagName', namely "graph"' In the first argument of ($)', namely tagName "graph" getCounts' In the expression: tagName "graph" getCounts $ \ (nodeCount, arcCount) -> do { nodes <- replicateM nodeCount parseNode; arcs <- replicateM arcCount parseArc; .... } Maybe I'm not connecting to any library...?Gabriellegabrielli
Looks like you didn't include the OverloadedStrings pragmaPantile

© 2022 - 2024 — McMap. All rights reserved.