Converting normal attoparsec parser code to conduit/pipe based

I have written the following parsing code using attoparsec:

data Test = Test {
  a :: Int,
  b :: Int
  } deriving (Show)

testParser :: Parser Test
testParser = do
  a <- decimal
  tab
  b <- decimal
  return $ Test a b

tParser :: Parser [Test]
tParser =  many' $ testParser <* endOfLine

This works fine for small files. I run it like this:

main :: IO ()
main = do
  text <- TL.readFile "./testFile"
  let (Right a) = parseOnly (manyTill anyChar endOfLine *> tParser) text
  print a  

But when the file is larger than 70 MB, it consumes a huge amount of memory. As a solution, I thought I would use attoparsec-conduit. After going through its API, I'm not sure how to make the two work together. My parser has the type Parser Test, but its sinkParser accepts a parser of type Parser a b. How can I run this parser in constant memory? (A pipes-based solution is also acceptable, but I'm not used to the Pipes API.)

Norword asked 5/6, 2014 at 11:21

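The first type parameter to Parser is just the data type of the input (either Text or ByteString), so you can pass your testParser to sinkParser as it is and it will work fine. sinkParser runs the parser once; to parse a stream of records, use conduitParser, which applies the parser repeatedly and yields each result together with its position. Here's a short example: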

{-# LANGUAGE OverloadedStrings #-}
import           Conduit                 (liftIO, mapM_C, runResourceT,
                                          sourceFile, ($$), (=$))
import           Data.Attoparsec.Text    (Parser, decimal, endOfLine, space)
import           Data.Conduit.Attoparsec (conduitParser)

data Test = Test {
  a :: Int,
  b :: Int
  } deriving (Show)

testParser :: Parser Test
testParser = do
  a <- decimal
  space
  b <- decimal
  endOfLine
  return $ Test a b

main :: IO ()
main = runResourceT
     $ sourceFile "foo.txt"
    $$ conduitParser testParser
    =$ mapM_C (liftIO . print)
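
The example above was written against the 2014 API. As a rough sketch only, the same pipeline might look like this with a more recent conduit release, assuming that `sourceFile` now yields `ByteString` (so a `decodeUtf8C` step is needed) and that `.|` and `runConduitRes` replace `$$`, `=$` and `runResourceT`. The names `records` and `demo` are hypothetical helpers introduced here for illustration, and `Test` is simplified to positional fields.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import           Conduit                 (ConduitT, MonadThrow, decodeUtf8C,
                                          mapC, mapM_C, runConduit,
                                          runConduitRes, sinkList, sourceFile,
                                          yieldMany, (.|))
import           Control.Monad.IO.Class  (liftIO)
import           Data.Attoparsec.Text    (Parser, decimal, endOfLine, space)
import           Data.Conduit.Attoparsec (conduitParser)
import           Data.Text               (Text)

data Test = Test Int Int deriving (Eq, Show)

-- One record per line: two decimals separated by a whitespace character.
testParser :: Parser Test
testParser = Test <$> decimal <* space <*> decimal <* endOfLine

-- Hypothetical helper: apply the parser repeatedly to a stream of Text
-- chunks and drop the position information that conduitParser attaches.
records :: MonadThrow m => ConduitT Text Test m ()
records = conduitParser testParser .| mapC snd

-- Hypothetical in-memory check: the second record is split across chunks.
demo :: IO [Test]
demo = runConduit (yieldMany chunks .| records .| sinkList)
  where
    chunks :: [Text]
    chunks = ["1\t2\n3", "\t4\n"]

-- Stream the file chunk by chunk and print each record as it is parsed.
main :: IO ()
main = runConduitRes
     $ sourceFile "foo.txt"
    .| decodeUtf8C
    .| records
    .| mapM_C (liftIO . print)
```

`demo` is only there to show that a record spanning two chunks is still parsed as one value; the file-based `main` relies on the same behaviour. A parse failure is thrown as an exception in this sketch.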
Mongolism answered 5/6, 2014 at 16:1

Here is the pipes solution (assuming that you are using a Text-based parser):

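import Data.Attoparsec.Text (endOfLine)
import Pipes
import Pipes.Attoparsec (parsed)
import Pipes.Text.IO (fromHandle)
import qualified System.IO as IO

-- testParser is the single-record parser from the question.
-- parsed applies it repeatedly to the chunks read from the handle,
-- and each record is printed as soon as it has been parsed.
main = IO.withFile "./testFile" IO.ReadMode $ \handle -> runEffect $
    for (parsed (testParser <* endOfLine) (fromHandle handle)) (lift . print)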
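
The snippet above discards the value returned by `runEffect`. As a minimal sketch, assuming a pipes-attoparsec version in which `parsed` returns `Either (ParsingError, Producer a m r) r`, a parse failure could be reported like this. The `testParser` below is a stand-in for the question's parser, with `char '\t'` assumed in place of its `tab`, and `Test` is simplified to positional fields.

```haskell
import           Data.Attoparsec.Text (Parser, char, decimal, endOfLine)
import           Pipes
import           Pipes.Attoparsec     (parsed)
import qualified Pipes.Text.IO        as PT
import qualified System.IO            as IO

data Test = Test Int Int deriving (Show)

-- Stand-in for the question's parser: two decimals separated by a tab.
testParser :: Parser Test
testParser = Test <$> decimal <* char '\t' <*> decimal

main :: IO ()
main = IO.withFile "./testFile" IO.ReadMode $ \handle -> do
    -- Records are printed one at a time while the file is being read.
    result <- runEffect $
        for (parsed (testParser <* endOfLine) (PT.fromHandle handle))
            (lift . print)
    case result of
        -- Parsing stopped early; the second component is the unconsumed input.
        Left (err, _rest) ->
            IO.hPutStrLn IO.stderr ("parse failed: " ++ show err)
        -- The whole input was consumed.
        Right () -> return ()
```

Records parsed before the failure have already been printed by the time the `Left` case is reached, which is usually what you want for large files.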
Gunman answered 5/6, 2014 at 16:11