What's so bad about Lazy I/O?
I've generally heard that production code should avoid using Lazy I/O. My question is, why? Is it ever OK to use Lazy I/O outside of just toying around? And what makes the alternatives (e.g. enumerators) better?

Consubstantiate answered 5/5, 2011 at 4:22 Comment(0)

Lazy IO has the problem that releasing whatever resource you have acquired is somewhat unpredictable, as it depends on how your program consumes the data -- its "demand pattern". Once your program drops the last reference to the resource, the GC will eventually run and release that resource.
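For example (a minimal self-contained sketch; the file name and contents are invented for illustration), with hGetContents nothing is read until the string is demanded, and the handle stays open until the string is fully forced or the GC finalizes it:

```haskell
import System.IO

main :: IO ()
main = do
  -- set up a sample file so the example runs on its own
  writeFile "data.txt" (unlines (replicate 50 "a line of sample text"))
  h <- openFile "data.txt" ReadMode
  s <- hGetContents h   -- nothing is read yet; s is a lazy stream
  putStr (take 20 s)    -- demands only the first chunk of the file
  -- h is still open here: it would be closed only once s is fully
  -- forced, or eventually when the GC finalizes the abandoned handle
  hClose h              -- release eagerly rather than waiting on the GC
```

Closing the handle explicitly, as in the last line, is exactly the kind of eager release the GC cannot be relied upon to do promptly.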

Lazy streams are a very convenient style to program in. This is why shell pipes are so fun and popular.

However, if resources are constrained (as in high-performance scenarios, or production environments that expect to scale to the limits of the machine) relying on the GC to clean up can be an insufficient guarantee.

Sometimes you have to release resources eagerly, in order to improve scalability.

So what are the alternatives to lazy IO that don't mean giving up on incremental processing (which in turn would consume too many resources)? Well, we have foldl-based processing, aka iteratees or enumerators, introduced by Oleg Kiselyov in the late 2000s, and since popularized by a number of networking-based projects.

Instead of processing data as lazy streams, or in one huge batch, we instead abstract over chunk-based strict processing, with guaranteed finalization of the resource once the last chunk is read. That's the essence of iteratee-based programming, and one that offers very nice resource constraints.
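The essence can be sketched in a few lines. This is a toy model, not any particular library's API; the names Step, enumFile, and countLines are invented here. A consumer is a step function fed strict chunks one at a time, and the enumerator owns the handle, so withFile guarantees it is closed no matter how the consumer behaves:

```haskell
import System.IO

-- A consumer either wants another chunk (Nothing signals EOF) or is done.
data Step a = Continue (Maybe String -> Step a) | Done a

-- The enumerator owns the resource: withFile guarantees the handle is
-- closed as soon as the consumer finishes or the file is exhausted.
enumFile :: FilePath -> Step a -> IO a
enumFile path step0 = withFile path ReadMode (go step0)
  where
    go (Done a)     _ = return a
    go (Continue k) h = do
      eof <- hIsEOF h
      if eof
        then finish (k Nothing)
        else do c <- hGetLine h
                go (k (Just c)) h
    finish (Done a)     = return a
    finish (Continue _) = error "consumer demanded input past end of file"

-- A strict, constant-space consumer: count the lines.
countLines :: Step Int
countLines = loop 0
  where
    loop n = Continue $ \mc -> case mc of
      Nothing -> Done n
      Just _  -> let n' = n + 1 in n' `seq` loop n'

main :: IO ()
main = do
  writeFile "input.txt" (unlines ["one", "two", "three"])
  n <- enumFile "input.txt" countLines
  print n  -- prints 3
```

Real iteratee/enumerator libraries generalize this over monads, chunk types, and composition, but the resource guarantee works the same way: the handle's lifetime is tied to the enumerator, not to the GC.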

The downside of iteratee-based IO is that it has a somewhat awkward programming model (roughly analogous to event-based programming, versus nice thread-based control). It is definitely an advanced technique, in any programming language. And for the vast majority of programming problems, lazy IO is entirely satisfactory. However, if you will be opening many files, or talking on many sockets, or otherwise using many simultaneous resources, an iteratee (or enumerator) approach might make sense.

Emilemile answered 5/5, 2011 at 4:33 Comment(2)
Since I just followed a link to this old question from a discussion of lazy I/O, I thought I'd add a note that since then, much of the awkwardness of iteratees has been superseded by new streaming libraries like pipes and conduit.Nisse
In 2019, SWI-Prolog acquired its own implementation featuring the use of lazy lists, like those already in Haskell.Cowart

Dons has provided a very good answer, but he's left out what is (for me) one of the most compelling features of iteratees: they make it easier to reason about space management because old data must be explicitly retained. Consider:

average :: [Float] -> Float
average xs = sum xs / length xs

This is a well-known space leak, because the entire list xs must be retained in memory to calculate both sum and length. It's possible to make an efficient consumer by creating a fold:

average2 :: [Float] -> Float
average2 xs = uncurry (/) $ foldl (\(sumT, n) x -> (sumT+x, n+1)) (0,0) xs
-- N.B. this will build up thunks as written; use a strict pair and foldl'

But it's somewhat inconvenient to have to do this for every stream processor. There are some generalizations (Conal Elliott - Beautiful Fold Zipping), but they don't seem to have caught on. However, iteratees can get you a similar level of expression.

aveIter = uncurry (/) <$> I.zip I.sum I.length

This isn't as efficient as a fold because the list is still iterated over multiple times, but it's collected in chunks, so old data can be efficiently garbage collected. To break that property, it's necessary to explicitly retain the entire input, such as with stream2list:

badAveIter = (\xs -> sum xs / length xs) <$> I.stream2list

The state of iteratees as a programming model is a work in progress, but it's much better than it was even a year ago. We're learning which combinators are useful (e.g. zip, breakE, enumWith) and which are less so, with the result that built-in iteratees and combinators provide continually more expressivity.

That said, Dons is correct that they're an advanced technique; I certainly wouldn't use them for every I/O problem.

Dosimeter answered 5/5, 2011 at 14:16 Comment(0)

I use lazy I/O in production code all the time. It's only a problem in certain circumstances, like Don mentioned. But for just reading a few files it works fine.
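A sketch of that benign case (the file name and contents are invented): a small file, read lazily but consumed promptly and in full, so the handle is released almost immediately:

```haskell
main :: IO ()
main = do
  -- create a hypothetical small config file so the example runs on its own
  writeFile "settings.txt" "verbose = true\n"
  s <- readFile "settings.txt"  -- lazy I/O under the hood
  putStr s                      -- the whole file is consumed immediately,
                                -- so the handle is closed promptly
```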

Dubenko answered 5/5, 2011 at 8:34 Comment(1)
I use lazy I/O too. I turn to iteratees when I want more control over resource management.Dosimeter

Update: Recently on haskell-cafe, Oleg Kiselyov showed that unsafeInterleaveST (which is used for implementing lazy IO within the ST monad) is very unsafe: it breaks equational reasoning. He shows that it makes it possible to construct bad_ctx :: ((Bool,Bool) -> Bool) -> Bool such that

> bad_ctx (\(x,y) -> x == y)
True
> bad_ctx (\(x,y) -> y == x)
False

even though == is commutative.
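The construction is roughly along the following lines (a reconstruction of the idea, not necessarily Kiselyov's exact code): two unsafeInterleaveST actions share an STRef, so whichever component of the pair is forced first determines what the other one sees:

```haskell
import Control.Monad.ST
import Control.Monad.ST.Unsafe (unsafeInterleaveST)
import Data.STRef

bad_ctx :: ((Bool, Bool) -> Bool) -> Bool
bad_ctx body = body (runST makePair)
  where
    makePair :: ST s (Bool, Bool)
    makePair = do
      r <- newSTRef False
      x <- unsafeInterleaveST (writeSTRef r True >> return True)
      y <- unsafeInterleaveST (readSTRef r)
      -- forcing x first performs the write, so y then reads True;
      -- forcing y first reads False before x's write has happened
      return (x, y)

main :: IO ()
main = do
  print (bad_ctx (\(x, y) -> x == y))
  print (bad_ctx (\(x, y) -> y == x))
  -- True then False when compiled without optimisation; with -O,
  -- full laziness may share the pair between calls and change this
```

The "pure" result of runST now depends on demand order, which is exactly the loss of equational reasoning being described.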


Another problem with lazy IO: The actual IO operation can be deferred until it's too late, for example after the file is closed. Quoting from Haskell Wiki - Problems with lazy IO:

For example, a common beginner mistake is to close a file before one has finished reading it:

wrong = do
    fileData <- withFile "test.txt" ReadMode hGetContents
    putStr fileData

The problem is that withFile closes the handle before fileData is forced. The correct way is to pass all the code to withFile:

right = withFile "test.txt" ReadMode $ \handle -> do
    fileData <- hGetContents handle
    putStr fileData

Here, the data is consumed before withFile finishes.

This is often unexpected and an easy-to-make error.
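If the shape of the wrong version is still wanted, a common workaround (a sketch; the file setup here is added so the example runs on its own) is to force the entire contents before withFile returns, at the cost of holding the whole file in memory:

```haskell
import System.IO

main :: IO ()
main = do
  writeFile "test.txt" "some sample contents\n"  -- set up the demo file
  fileData <- withFile "test.txt" ReadMode $ \handle -> do
    s <- hGetContents handle
    length s `seq` return s  -- force the whole string while the handle is open
  putStr fileData
```

Strict I/O functions such as Data.Text.IO.readFile achieve the same effect directly, without the manual seq.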


See also: Three examples of problems with Lazy I/O.

Thessalonian answered 29/10, 2012 at 20:37 Comment(2)
Actually combining hGetContents and withFile is pointless because the former puts the handle in a "pseudo-closed" state and will handle closing for you (lazily) so the code is exactly equivalent to readFile, or even openFile without hClose. That's basically what lazy I/O is. If you don't use readFile, getContents or hGetContents you're not using lazy I/O. For example line <- withFile "test.txt" ReadMode hGetLine works fine.Oberland
@Dag: although hGetContents will handle closing the file for you, it is also permissible to close it yourself "early", and helps to ensure resources are released predictably.Jugglery

Another problem with lazy IO that hasn't been mentioned so far is that it has surprising behaviour. In a normal Haskell program, it can sometimes be difficult to predict when each part of your program is evaluated, but fortunately due to purity it really doesn't matter unless you have performance problems. When lazy IO is introduced, the evaluation order of your code actually has an effect on its meaning, so changes that you're used to thinking of as harmless can cause you genuine problems.

As an example, here's a question about code that looks reasonable but is made more confusing by deferred IO: withFile vs. openFile

These problems aren't invariably fatal, but it's another thing to think about, and a sufficiently severe headache that I personally avoid lazy IO unless there's a real problem with doing all the work upfront.

Jugglery answered 1/3, 2012 at 14:22 Comment(0)

What's so bad about lazy I/O is that you, the programmer, have to micro-manage certain resources instead of leaving that to the implementation. For example, which of the following is "different"?

  • freeSTRef :: STRef s a -> ST s ()
  • closeIORef :: IORef a -> IO ()
  • endMVar :: MVar a -> IO ()
  • discardTVar :: TVar a -> STM ()
  • hClose :: Handle -> IO ()
  • finalizeForeignPtr :: ForeignPtr a -> IO ()

...out of all these resource-releasing definitions, only the last two, hClose and finalizeForeignPtr, actually exist. As for the rest, whatever service they could provide in the language is much more reliably performed by the implementation!

So if releasing resources like file handles and foreign references were also left to the implementation, lazy I/O would probably be no worse than lazy evaluation.

Cowart answered 15/7, 2022 at 4:2 Comment(0)
