IO over big files in haskell: Performance issue

Asked 24/3, 2011 at 13:8 Answered 27/3, 2011 at 13:44

Solved io lazy-evaluation bytestring haskell

I'm trying to work over big files using Haskell. I'd like to browse an input file byte after byte, and to generate an output byte after byte. Of course I need the IO to be buffered with blocks of reasonable size (a few KB). I can't do it, and I need your help please.

import System 
import qualified Data.ByteString.Lazy as BL 
import Data.Word  
import Data.List

main :: IO () 
main =     
    do         
        args <- System.getArgs         
        let filename = head args         
        byteString <- BL.readFile filename         
        let wordsList = BL.unpack byteString         
        let foldFun acc word = doSomeStuff word : acc
        let wordsListCopy = foldl' foldFun [] wordsList
        let byteStringCopy = BL.pack (reverse wordsListCopy)
        BL.writeFile (filename ++ ".cpy") byteStringCopy
    where
        doSomeStuff = id

I name this file TestCopy.hs, then do the following:

$ ls -l *MB
-rwxrwxrwx 1 root root 10000000 2011-03-24 13:11 10MB
-rwxrwxrwx 1 root root  5000000 2011-03-24 13:31 5MB
$ ghc --make -O TestCopy.hs 
[1 of 1] Compiling Main             ( TestCopy.hs, TestCopy.o )
Linking TestCopy ...
$ time ./TestCopy 5MB

real    0m5.631s
user    0m1.972s
sys 0m2.488s
$ diff 5MB 5MB.cpy
$ time ./TestCopy 10MB 

real    3m6.671s
user    0m3.404s
sys 1m21.649s
$ diff 10MB 10MB.cpy 
$ time ./TestCopy 10MB +RTS -K500M -RTS

real    2m50.261s
user    0m3.808s
sys 1m13.849s
$ diff 10MB 10MB.cpy 
$

My problem: There is a huge difference between a 5MB and a 10 MB file. I'd like the performances to be linear in the size of the input file. Please what am i doing wrong, and how can I achieve this? I don't mind using lazy bytestrings or anything else as long as it works, but it has to be a standard ghc library.

Precision: It's for a university project. And I'm not trying to copy files. The doSomeStuff function shall perform compression/decompression actions that I have to customize.

Wesla answered 24/3, 2011 at 13:8 Comment(12)

ByteString's pack and unpack are very costly operations. Can't you doSomeStuff directly with ByteString? NB: lazy ByteString is 'buffered' internally which might be adequate for your task – Leap 24/3, 2011 at 13:12

I just tried it without pack and unpack, working directly on bytestrings, and the result is even longer, and still i have this big difference between 5MB and 10MB. – Wesla 24/3, 2011 at 13:17

Maybe you could post somewhere a complete working example of your code demonstrating the problem? – Leap 24/3, 2011 at 13:23

@Ed'ka But this is what i did. The working code of TestCopy.hs above is demonstrating the problem, since the 10MB file is copied in 3 minutes. – Wesla 24/3, 2011 at 13:28

@Ed'ka Do you mean without pack and unpack ? here it is: import System import qualified Data.ByteString.Lazy as BL import Data.Word import Data.List main :: IO () main = do args <- System.getArgs let filename = head args byteString <- BL.readFile filename let foldFun acc word = (doSomeStuff word) BL.cons acc let byteStringCopy = BL.foldl' foldFun BL.empty byteString BL.writeFile (filename ++ ".cpy") (byteStringCopy) where doSomeStuff = id – Wesla 24/3, 2011 at 13:29

@Joel, my apologies - I've missed that fact that your code was perfectly 'buildable'. Well, now the reason for it being so slow is that ByteString.cons is very costly too :) (it does memcpy internally and here it's executed repeatedly for every byte in your file). So you should avoid cons as well. Here you just copy the entire file - try writing ByteString that will show you the actual (maximum) speed of lazy ByteString reading/writing.What actual transformation do you intend to do with the content of the file? – Leap 24/3, 2011 at 13:55

In fact, it's not so slow since for a 5MB file the speed is ok. It's when i reach a 10MB limit that the IO becomes extremely slow. The transformations i need to perform are bitwise operations specific to the project i'm doing. I can't use a library for them since it's a specific university project. I have to perform bitwise operations on the bytes myself. – Wesla 24/3, 2011 at 13:59

5s for 5MB is not ok. With no work at all it should process it in a spit of a second even with 10Mb. I bet you can employ ByteString.mapAccumL to code your bitwise operations btw. – Leap 24/3, 2011 at 14:16

I know it's slow for 5MB, and i would like another solution with acceptable performances. I don't know any other. And for the mapAccumL function, i can't use it since i don't map bytes to bytes. I'm compressing and decompressing, and 1 byte in the input file is mapped to a certain number of bits in the output. All of the compression stuff is working fine already, and extremely fast. My problem is the problem i wrote explicitely in the question : To browse bytes from an input file and write bytes in an output file. Thanks. – Wesla 24/3, 2011 at 14:21

You shouldn't be building bytestring output directly. Use blaze-builder instead. It should be much more efficient. – Fablan 24/3, 2011 at 14:29

But i can't use any library which is not standard, because the program will be running on the university's system. – Wesla 24/3, 2011 at 14:43

cabal unpack blaze-builder -- now you have the source and can move any files you need directly into your repo. – Fablan 24/3, 2011 at 16:33

I take it this is a follow on to Reading large file in haskell? from yesterday.

Try compiling with "-rtsopts -O2" instead of just "-O".

You claim "I'd like to browse an input file byte after byte, and to generate an output byte after byte." but your code reads the entire input before trying to create any output. This is just not very representative of the goal.

With my system I see "ghc -rtsopts --make -O2 b.hs" giving

(! 741)-> time ./b five
real 0m2.789s user 0m2.235s sys 0m0.518s
(! 742)-> time ./b ten
real 0m5.830s user 0m4.623s sys 0m1.027s

Which now looks linear to me.

Beria answered 24/3, 2011 at 15:9 Comment(4)

Yes but the ByteString.Lazy package is supposed to be lazy, at least I think. Anyway i don't mind using another technique or library as long as it's standard. The options you proposed me are not recognized by my compiler. My personal version is 6.12.1, and the one on the university's server, which i must use, is 6.8.2. Do you think i would get a normal result if I uninstalled ghc and reinstalled the latest version ? PS: My question yesterday was about the same project, but another issue, that you solved, thanks. – Wesla 24/3, 2011 at 15:27

@Wesla - the -rtsopts flag is only useful with ghc-7, so leave that off. -O2 should be standard though. – Irradiant 24/3, 2011 at 16:8

Once again, you were right. I followed your explaination and re-wrote my entire IO library with internal buffers. Indeed, i wasn't buffering anything at all. I will probably never understand what -Lazy IO- means... But i will succeed in my exam because my haskell compression utility is only 5 times slower than a C program doing the same thing. You have been extremely helpful Chris Kuklewicz, thank you very much for sharing your great knowledge. – Wesla 24/3, 2011 at 20:38

Hm, according to GHC documentation: "At the moment, -O2 is unlikely to produce better code than -O" (c) haskell.org/ghc/docs/7.0-latest/html/users_guide/… – Leap 24/3, 2011 at 21:57

For chunked input processing I would use the enumerator package.

import Data.Enumerator
import Data.Enumerator.Binary (enumFile)

We use bytestrings

import Data.ByteString as BS

and IO

import Control.Monad.Trans (liftIO)
import Control.Monad (mapM_)
import System (getArgs)

Your main function could look like following:

main =
  do (filepath:_) <- getArgs
     let destination
     run_ $ enumFile filepath $$ writeFile (filepath ++ ".cpy")

enumFile reads 4096 bytes per chunk and passes these to writeFile, which writes it down.

enumWrite is defined as:

enumWrite :: FilePath -> Iteratee BS.ByteString IO ()
enumWrite filepath =
  do liftIO (BS.writeFile filepath BS.empty)   -- ensure the destination is empty
     continue step
  where
  step (Chunks xs) =
    do liftIO (mapM_ (BS.appendFile filepath) xs)
       continue step
  step EOF         = yield () EOF

As you can see, the step function takes chunks of bytestrings and appends them to the destination file. These chunks have the type Stream BS.Bytestring, where Stream is defined as:

data Stream a = Chunks [a] | EOF

On an EOF step terminates, yielding ().

To have a much more elaborate read on this I personally recommend Michael Snoymans tutorial

The numbers

$ time ./TestCopy 5MB
./TestCopy 5MB  2,91s user 0,32s system 96% cpu 3,356 total

$ time ./TestCopy2 5MB
./TestCopy2 5MB  0,04s user 0,03s system 93% cpu 0,075 total

That's quite an improvement. Now in order to implement your fold you probably want to write an Enumeratee, which is used to transform a input stream. Fortunately there is already a map function defined in the enumerator package, which can be modified for your need, i.e. it can be modified to carry over state.

On the construction of the intermediate result

You construct wordsList in reverse order and reverse it afterwards. I think difference lists do a better job, because appends take only O(1) time due to the fact that appending is only a function composition. I'm not sure whether they takes more space though. Here's a rough sketch of difference lists:

type DList a = [a] -> [a]

emptyList :: DList a
emptyList = id

snoc :: DList a -> a -> DList a
snoc dlist a = dlist . (a:)

toList :: DList a -> [a]
toList dlist = dlist []

This answer is probably not needed anymore, but I added it for completeness.

Candidacandidacy answered 27/3, 2011 at 13:44 Comment(1)

writeFile and enumWrite are two names for the same function, right? – Bronez 18/9, 2015 at 12:12