Haskell streaming download

The two resources I found that suggested recipes for streaming downloads using popular Haskell libraries were:

How would I modify the code in the former to (a) save to file, and (b) print only a (take 5) of the byte response, rather than the whole response to stdout?

My attempt at (b) is:

#!/usr/bin/env stack
{- stack --install-ghc --resolver lts-5.13 runghc
   --package http-conduit
 -}
{-# LANGUAGE OverloadedStrings #-}
import           Control.Monad.IO.Class (liftIO)
import qualified Data.ByteString        as S
import qualified Data.Conduit.List      as CL
import           Network.HTTP.Simple
import           System.IO              (stdout)

main :: IO ()
main = httpSink "http://httpbin.org/get" $ \response -> do
    liftIO $ putStrLn
           $ "The status code was: "
          ++ show (getResponseStatusCode response)

    CL.mapM_ (take 5) (S.hPut stdout)

This fails to apply the (take 5), which suggests to me, among other things, that I still don't understand how mapping over monads works, or liftIO.

Also, this resource:

http://haskelliseasy.readthedocs.io/en/latest/#note-on-streaming

...gave me a warning, under "I know what I'm doing and I'd like more fine-grained control over resources, such as streaming", that this is not easily or generally supported.

Other places I looked:

If there's anything in the Haskellverse that makes this easier, more like Python's requests:

response = requests.get(URL, stream=True)
for i,chunk in enumerate(response.iter_content(BLOCK)):
  f.write(chunk)

I'd appreciate the tip there, too, or pointers towards the 2016 state of the art.

Platinotype answered 28/11, 2016 at 3:33 Comment(3)
Do you actually need to stream this? (So are you getting a sufficiently large amount of data at once that having it all in memory is unacceptable?) – Interscholastic
Yes please, I am. – Platinotype
I probably should have chosen a more sensible number of bytes, like some 2^n... – Platinotype

You are probably looking for httpSource from the latest version of http-conduit. It behaves pretty much exactly like Python's requests: you get back a stream of chunks.

save to file

This is easy: pipe the source straight into a file sink.

#!/usr/bin/env stack
{- stack --install-ghc --resolver nightly-2016-11-26 runghc --package http-conduit -}

{-# LANGUAGE OverloadedStrings #-}
import Network.HTTP.Simple (httpSource, getResponseBody)
import Conduit

main = runConduitRes $ httpSource "http://httpbin.org/get" getResponseBody
                    .| sinkFile "data_file"

print only a (take 5) of the byte response

Once we have the source, we take the first 5 bytes with takeCE 5 and then print these via printC.

#!/usr/bin/env stack
{- stack --install-ghc --resolver nightly-2016-11-26 runghc --package http-conduit -}

{-# LANGUAGE OverloadedStrings #-}
import Network.HTTP.Simple (httpSource, getResponseBody)
import Conduit

main = runConduitRes $ httpSource "http://httpbin.org/get" getResponseBody
                    .| takeCE 5
                    .| printC
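To see how takeCE counts without touching the network, here is a minimal sketch that swaps the HTTP source for a hypothetical in-memory list of chunks (the chunk contents are made up for illustration). takeCE 5 counts bytes rather than chunks, so the first chunk passes through whole and the second is truncated:

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Conduit
import Data.ByteString (ByteString)

-- Hypothetical stand-in for the chunks an HTTP body might arrive in.
chunks :: [ByteString]
chunks = ["abc", "defg", "hi"]

main :: IO ()
main = do
  -- takeCE limits the total number of bytes, not the number of chunks.
  let firstFive = runConduitPure $ yieldMany chunks .| takeCE 5 .| sinkList
  print firstFive  -- prints ["abc","de"]
```

This is also why printC in the example above may print more than one line: it prints each chunk it receives, and the five bytes can be split across chunks.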

save to file and print only a (take 5) of the byte response

To do this, you want zipSinks or, for the more general case of combining several sinks, ZipSink:

#!/usr/bin/env stack
{- stack --install-ghc --resolver nightly-2016-11-26 runghc --package http-conduit -}

{-# LANGUAGE OverloadedStrings #-}
import Network.HTTP.Simple (httpSource, getResponseBody)
import Data.Conduit.Internal (zipSinks)
import Conduit

main = runConduitRes $ httpSource "http://httpbin.org/get" getResponseBody
                    .| zipSinks (takeCE 5 .| printC)
                                (sinkFile "data_file")
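For the more general ZipSink route, a rough sketch might look like the following. It assumes an in-memory source in place of httpSource (the chunks are invented) and, instead of printing and writing a file, collects the first five bytes and counts the total length, so it can run offline. Both sinks see every chunk; the first one simply stops consuming early.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Conduit
import Data.ByteString (ByteString)
import Data.Conduit (ZipSink (..))

main :: IO ()
main = do
  (firstFive, total) <- runConduit $
    -- Hypothetical chunks standing in for a streamed HTTP body.
    yieldMany ["abc", "defg", "hi" :: ByteString]
      .| getZipSink
           ((,) <$> ZipSink (takeCE 5 .| foldC)  -- first five bytes, concatenated
                <*> ZipSink lengthCE)            -- total number of bytes seen
  print firstFive       -- prints "abcde"
  print (total :: Int)  -- prints 9
```

Because ZipSink is an Applicative, adding a third or fourth sink is a matter of adding another <*> ZipSink ... line, which is where it becomes more convenient than nesting zipSinks.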
Interscholastic answered 28/11, 2016 at 5:11 Comment(6)
Actually, takeCE will take exactly five elements of the stream, aka five bytes. – Woodson
Sorry, still learning conduit and this whole way of thinking: what modification would tee this, so that it /both/ streams the takeCE 5 output to stdout and also sinks to file? – Platinotype
@Platinotype Updated the answer. ZipSource, ZipSink, and ZipConduit are your friends for forking/joining. – Interscholastic
Thanks, @Alec. I was a little confused about how to get the resolver set up, but I got this running, and the forking of multiple sinks makes a lot of sense to me, more than the liftIO and mapM_ in my original example. This is very helpful! – Platinotype
The type signature on main here is: main :: IO ((), ()) ? – Platinotype
@Platinotype Yeah, I left it that way because it looked nicer (and technically main can have type IO a). If you want to force main :: IO (), just add () <$ right after the main = and add parens around the whole runConduitRes ... "data_file") body. – Interscholastic
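As a sketch of that last suggestion, here is the () <$ pattern applied to a pipeline of the same shape. It assumes an in-memory source and sinkNull as stand-ins for httpSource and sinkFile, so it runs without network or file access; those substitutions are illustrative only.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Conduit
import Data.ByteString (ByteString)
import Data.Conduit.Internal (zipSinks)

-- Without the `() <$`, this would have type IO ((), ()).
main :: IO ()
main = () <$ runConduit
  (  yieldMany ["abc", "defg" :: ByteString]      -- stand-in for httpSource
  .| zipSinks (takeCE 5 .| printC) sinkNull )     -- sinkNull stands in for sinkFile
```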
