How to use the conduit drop function in a pipeline?

I have a simple task: read a bunch of lines out of a file and do something with each one of them, except the first one, which contains headings to be ignored.

So I thought I'd try out conduits.

printFile src = runResourceT $ CB.sourceFile src =$= 
    CT.decode CT.utf8 =$= CT.lines =$= CL.mapM_ putStrLn

Cool.

So now I just want to drop the first line off ... and there seems to be a function for that -

printFile src = runResourceT $ CB.sourceFile src =$= 
    CT.decode CT.utf8 =$= CT.lines =$= CL.drop 1 =$= CL.mapM_ putStrLn

Hmm - but now I notice that drop has the type signature Sink a m (), so it can't sit in the middle of the pipeline. Someone suggested that I could use the Monad instance for pipes and use drop to effectfully drop some elements, so I tried this:

drop' :: Int -> Pipe a a m ()
drop' n = do
  CL.drop n
  x <- await
  case x of 
    Just v -> yield v
    Nothing -> return ()

This doesn't type check, because the Monad instance for pipes only applies to pipes of the same type. Sinks have Void as their output, so I can't use it like this.

I took a quick look at pipes and pipes-core. I notice that pipes-core has the function as I expected it to be, whereas pipes is a minimal library whose documentation shows how it would be implemented.

So I'm confused - maybe there's a key concept I'm missing .. I saw the function

sequence ::  Sink input m output -> Conduit input m output

But that doesn't seem to be the right idea, as the output value is ()

CL.sequence (CL.drop 1) :: Conduit a m ()    

I'll probably just go back and use lazy-io as I don't really need any streaming - but I'd be interested to see the proper way to do it.

Trawl answered 31/5, 2012 at 13:37 Comment(0)

Firstly, the simple answer:

... =$= CT.lines =$= (CL.drop 1 >> CL.mapM_ putStrLn)
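For readers on a current conduit release (1.3 or later, which is an assumption; this answer targets 0.4), the same idea can be sketched with `.|` and `runConduitPure`. The name `skipFirst` and the in-memory source are hypothetical, used only to keep the example self-contained:

```haskell
import Data.Conduit (runConduitPure, (.|))
import qualified Data.Conduit.List as CL

-- Hypothetical helper: discard the first element, then collect the rest.
-- CL.drop and CL.consume run one after the other on the same input stream.
skipFirst :: [a] -> [a]
skipFirst xs = runConduitPure $ CL.sourceList xs .| (CL.drop 1 >> CL.consume)

main :: IO ()
main = print (skipFirst ["heading", "row 1", "row 2"])
-- prints ["row 1","row 2"]
```

Swapping `CL.consume` for `CL.mapM_` with a printing action (and `runConduit` for `runConduitPure`) should give the line-by-line behaviour from the question.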

The longer explanation: there are really two different ways you can implement drop. Either way, it will first drop n elements from the input. There are two choices about what it does next:

  • Say it's done
  • Start outputting all of the remaining items from the input stream

The former behavior is what a Sink would perform (and what our drop actually does) while the latter is the behavior of a Conduit. You can in fact generate the latter from the former through monadic composition:

dropConduit n = CL.drop n >> CL.map id

Then you can use dropConduit as you describe at the beginning. This is a good way of demonstrating the difference between monadic composition and fusing; the former allows two functions to operate on the same input stream, while the latter allows one function to feed a stream to the other.

I haven't benchmarked, but I'm fairly certain that monadic composition will be a bit more efficient.
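The two approaches can be compared side by side. This is only a sketch against a current conduit release (1.3 or later, which is an assumption; the types in conduit 0.4 are more restrictive), and it uses `awaitForever yield` in place of `CL.map id` to forward the remaining items. The names `viaFusion` and `viaMonad` are made up for illustration:

```haskell
import Data.Conduit (ConduitT, awaitForever, runConduitPure, yield, (.|))
import qualified Data.Conduit.List as CL

-- Drop n items, then pass every remaining item downstream unchanged.
dropConduit :: Monad m => Int -> ConduitT a a m ()
dropConduit n = CL.drop n >> awaitForever yield

-- Fusing: dropConduit feeds its output stream into the next stage.
viaFusion :: [Int]
viaFusion = runConduitPure $ CL.sourceList [1 .. 5] .| dropConduit 2 .| CL.consume

-- Monadic composition: both consumers read from the same input stream.
viaMonad :: [Int]
viaMonad = runConduitPure $ CL.sourceList [1 .. 5] .| (CL.drop 2 >> CL.consume)

main :: IO ()
main = print (viaFusion, viaMonad)
-- prints ([3,4,5],[3,4,5])
```

Both give the same result here. The difference is that `viaFusion` adds an extra pipeline stage, while `viaMonad` sequences two consumers within a single stage.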

Jovita answered 31/5, 2012 at 15:21 Comment(2)
Hmm - the simple answer works well, thanks. dropConduit is Monad m => Int -> Pipe Void Void m (), which I think makes it rather hard to use for anything? - Trawl
Sorry, I'm working on a different version of the codebase where that wouldn't apply. In conduit 0.4, you'd need to have sinkToPipe (CL.drop n) >> CL.map id. The issue is that the types in Data.Conduit.List are overly restrictive. conduit 0.5 will be relaxing them. - Jovita
