Downloading large files from the Internet in Haskell
Are there any suggestions about how to download large files in Haskell? I figure http-conduit is a good library for this. However, how does it solve the problem? There is an example in its documentation, but it is not geared toward downloading large files; it just downloads a file:

import Data.Conduit.Binary (sinkFile)
import Network.HTTP.Conduit
import qualified Data.Conduit as C

main :: IO ()
main = do
    request <- parseUrl "http://google.com/"
    withManager $ \manager -> do
        response <- http request manager
        responseBody response C.$$+- sinkFile "google.html"

What I want is to be able to download large files without running out of RAM, i.e. to do it efficiently in terms of memory use. Preferably, I would also like to be able to continue downloading them "later", meaning "some part now, another part later".

I also found the download-curl package on Hackage, but I'm not positive it's a good fit, or even that it downloads files chunk by chunk as I need.
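
For the "continue later" requirement, one possible approach is an HTTP Range request: ask the server for the bytes from a given offset onward and append them to the partially downloaded file. A minimal sketch using Network.HTTP.Simple and Conduit, assuming the server honours Range requests (responds with 206 Partial Content); resumeDownload and its parameters are illustrative names, not part of any library:

{-# LANGUAGE OverloadedStrings #-}
import Conduit (runConduitRes, sinkIOHandle, (.|))
import qualified Data.ByteString.Char8 as BS
import Network.HTTP.Simple (getResponseBody, httpSource, parseRequest, setRequestHeader)
import System.IO (IOMode (AppendMode), openBinaryFile)

-- Resume a download from byte `offset` by sending a Range header
-- and appending the remaining body to the existing file.
resumeDownload :: String -> FilePath -> Integer -> IO ()
resumeDownload url path offset = do
  request <- parseRequest url
  let ranged = setRequestHeader "Range"
                 [BS.pack ("bytes=" ++ show offset ++ "-")] request
  runConduitRes $
    httpSource ranged getResponseBody
      .| sinkIOHandle (openBinaryFile path AppendMode)

The offset would typically be the current size of the partial file on disk.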

Dobb answered 13/7, 2014 at 1:48 Comment(8)
Why do you think that example doesn't stream data?Cichocki
@Carl, if it does, why do you think so?Dobb
It uses conduit. Conduit is all about streaming data.Cichocki
@Carl, why does http-client library exist then? http-client is about streaming data. hackage.haskell.org/package/http-client-0.3.4/docs/…Dobb
@AlexanderSupertramp http-client is one of the dependencies for http-conduit.Losel
@Sibi, so http-conduit is built on http-client? But what makes you positive that http-conduit uses streaming data? What proves that? There is no such proof in the documentation, is there?Dobb
All right, Data.Conduit is (its documentation proves that), but there is no evidence about Network.HTTP.Conduit.Dobb
How would this be rewritten now that withManager is deprecated?Faceless
Network.HTTP.Conduit provides three functions for performing a request: simpleHttp, httpLbs, and http.

Of those three, the first two load the entire response body into memory. If you want to operate in constant memory, use the http function. The http function gives you access to a streaming interface through ResumableSource.

The example you provided uses interleaved IO to write the response body to a file in constant memory. So you will not run out of memory when downloading a large file.
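
For contrast, a minimal sketch (not part of the answer itself) of the non-streaming path, where simpleHttp accumulates the whole body in memory before anything reaches the disk:

import qualified Data.ByteString.Lazy as L
import Network.HTTP.Conduit (simpleHttp)

-- The entire response body is held in memory before writeFile runs,
-- so this approach is unsuitable for very large files.
main :: IO ()
main = do
  body <- simpleHttp "http://google.com/"
  L.writeFile "google.html" body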

Losel answered 13/7, 2014 at 6:45 Comment(2)
but withManager is not the http function you mentioned. Does it read the file chunk by chunk?Dobb
@AlexanderSupertramp withManager has got nothing to do with reading a file. It just keeps track of open connections.Losel
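
Since withManager is deprecated (as asked in the comments above), here is a sketch of the same streaming download against the current http-conduit API, assuming http-conduit >= 2.3, where the response body is an ordinary conduit rather than a ResumableSource:

import Conduit (runConduit, sinkFile, (.|))
import Control.Monad.Trans.Resource (runResourceT)
import Network.HTTP.Client.TLS (newTlsManager)
import Network.HTTP.Conduit (http, parseRequest, responseBody)

main :: IO ()
main = do
  manager <- newTlsManager          -- replaces the deprecated withManager
  request <- parseRequest "http://google.com/"
  runResourceT $ do
    response <- http request manager
    -- Stream the body chunk by chunk straight into the file.
    runConduit $ responseBody response .| sinkFile "google.html"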
This works for me:

import           Control.Monad.Trans.Resource (runResourceT)
import           Data.Conduit.Combinators     (sinkFile)
import           Network.HTTP.Conduit         (parseRequest)
import           Network.HTTP.Simple          (httpSink)


-- Stream the response body straight to disk in constant memory.
downloadFile :: String -> IO ()
downloadFile url = do
  request <- parseRequest url
  runResourceT $ httpSink request $ \_ -> sinkFile "tmpfile"

I agree that it's a bit weird that it takes four different modules (and from three packages: conduit, resourcet and http-conduit) for such a task.

Mewl answered 27/3, 2021 at 23:49 Comment(1)
I think the last line can be runConduitRes $ httpSource request getResponseBody .| sinkFile filename, and the entire thing will only need two imports (Conduit and Network.HTTP.Simple).Housebound
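
Spelled out, the two-import version from the comment above would look roughly like this (url and filename are illustrative parameter names):

import Conduit (runConduitRes, sinkFile, (.|))
import Network.HTTP.Simple (getResponseBody, httpSource, parseRequest)

downloadFile :: String -> FilePath -> IO ()
downloadFile url filename = do
  request <- parseRequest url
  -- httpSource streams the body; sinkFile writes it in constant memory.
  runConduitRes $ httpSource request getResponseBody .| sinkFile filename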
