Keeping IO lazy under append
Asked Answered
P

2

5

I may have been under the false impression that Haskell is lazier than it is, but I wonder if there's a way to get the best of both worlds...

Data.Monoid and Data.Semigroup define two variations of First. The monoidal version models the leftmost, non-empty value, whereas the semigroup version simply models the leftmost value.

This works fine for pure value values, but consider impure values:

x = putStrLn "x" >> return 42
y = putStrLn "y" >> return 1337

Both of these values have the type Num a => IO a. IO a is a Semigroup instance when a is:

instance Semigroup a => Semigroup (IO a)
  -- Defined in `Data.Orphans'

This means that it's possible to combine two IO (First a) values:

Prelude Data.Semigroup Data.Orphans> fmap First x <> fmap First y
x
y
First {getFirst = 42}

As we can see, though, both x and y produce their respective side-effects, even though y is never required.

The same applies for Data.Monoid:

Prelude Data.Monoid> fmap (First . Just) x <> fmap (First . Just) y
x
y
First {getFirst = Just 42}

I think I understand why this happens, given that both the Semigroup and Monoid instances use liftA2, which seems to ultimately be based on IO bind, which is strict, as far as I understand it.

If I dispense with the First abstraction(s), however, I can get lazier evaluation:

first x _ = x

mfirst x y = do
  x' <- x
  case x' of
    (Just _) -> return x'
    Nothing -> y

Using both of those ignores y:

Prelude> first x y
x
42
Prelude> mfirst (fmap Just x) (fmap Just y)
x
Just 42

In both of these cases, y isn't printed.

My question is, then:

Can I get the best of both worlds? Is there a way that I can retain the Semigroup or Monoid abstraction, while still get lazy IO?

Is there, for example, some sort of LazyIO container that I can wrap First values in, so that I get the lazy IO I'd like to have?

The actual scenario I'm after is that I'd like to query a prioritised list of IO resources for data, and use the first one that gives me a useful response. I don't, however, want to perform redundant queries (for performance reasons).

Pless answered 5/11, 2017 at 10:13 Comment(12)
Do you need the semigroup/monoid abstraction, or the First abstraction (aka early exit from IO functions)?Nikianikita
@Nikianikita Ideally, I'd prefer to keep the semigroup/monoid abstraction, but I'm open to suggestions.Pless
You could try to write your own instance for Semigroup a => Semigroup (IO a) that satisfies the behavior you're looking for.Oslo
@AsadSaeeduddin Would FirstIO (launchMissiles *> return Nothing) <> FirstIO (return (Just 'x')) be equal to FirstIO (return Nothing) <> FirstIO (return (Just 'x'))?Stanislaw
@Stanislaw Since the IO packed inside FirstIO (launchMissiles *> return Nothing) is not the same as the IO packed inside FirstIO (return Nothing), I would say no, they are not equal. The bit after the <> never factors into the evaluation.Oslo
@AsadSaeeduddin I think you are right, and it would be a valid Monoid. In fact, the Alternative instance for MaybeT in transformers is already very similar: hackage.haskell.org/package/transformers-0.5.5.0/docs/…Stanislaw
@MarkSeemann Thinking about this a bit more, what exactly is enforcing laziness in this monoid abstraction pushed through IO buying you? In other words, what use-cases would not be covered by the trivial instance Semigroup (IO (First a)) where (<>) ia ib = ia (with FlexibleInstances enabled)? See demo.Oslo
Relevant: parsonsmatt.org/2016/11/18/clean_alternatives_with_maybet.html and also hackage.haskell.org/package/base-4.10.0.0/docs/…Stanislaw
@AsadSaeeduddin A couple of quick tests with your suggestion seems to indicate that this actually solves my problem. If it's really that simple, I'd be happy. You're welcome to add that as an answer. Thank you.Pless
@AsadSaeeduddin I'm not sure I follow... With the Semigroup instance for IO in Data.Orphans, getFirst <$> (First <$> x) <> (First <$> y) executes both x and y...Pless
@MarkSeemann Sorry, made a mistake there. I meant to write getFirst $ (First x) <> (First y). This is basically the same thing as what that instance from the earlier comment is doing: just picking the first IO. Here's a demo.Oslo
@AsadSaeeduddin I see... In reality, my question is an attempt at simplifying my actual situation. What I really have is a series of functions that return IO (Maybe MyData), and I map those to IO (Option (First MyData)) in order to choose the leftmost, non-empty value.Pless
S
3

The Alternative instance for the MaybeT monad transformer returns the first successful result, and does not execute the rest of the operations. In combination with the asum function, we can write something like:

import Data.Foldable (asum)
import Control.Applicative
import Control.Monad.Trans.Maybe

action :: Char -> IO Char
action c = putChar c *> return c

main :: IO ()
main = do
    result <- runMaybeT $ asum $ [ empty
                                 , MaybeT $ action 'x' *> return Nothing
                                 , liftIO $ action 'v'
                                 , liftIO $ action 'z'
                                 ]
    print result

where the final action 'z' won't be executed.

We can also write a newtype wrapper with a Monoid instance which mimics the Alternative:

newtype FirstIO a = FirstIO (MaybeT IO a)

firstIO :: IO (Maybe a) -> FirstIO a
firstIO ioma = FirstIO (MaybeT ioma)

getFirstIO :: FirstIO a -> IO (Maybe a)
getFirstIO (FirstIO (MaybeT ioma)) = ioma

instance Monoid (FirstIO a) where
    mempty = FirstIO empty
    FirstIO m1 `mappend` FirstIO m2 = FirstIO $ m1 <|> m2

The relationship between Alternative and Monoid is explained in this other SO question.

Stanislaw answered 5/11, 2017 at 20:13 Comment(1)
Perfect, the newtype-based instance of Monoid (and Semigroup) was just what I was looking for! Thank you!Pless
N
1

Is there a way that I can retain the Semigroup or Monoid abstraction, while still get lazy IO?

Somewhat, but there are drawbacks. The udnerlying problem for our instances is that a generic instance for an Applicative will look like

instance Semigroup a => Semigroup (SomeApplicative a) where
    x <> y = (<>) <$> x <*> y

We're here at the mercy of (<*>), and usually the second argument y will be at least in WHNF. For example in Maybe's implementation the first line will work fine and the second line will error:

liftA2 (<>) Just (First 10) <> Just (error "never shown")
liftA2 (<>) Just (First 10) <> error "fire!"

IO's (<*>) is implemented in terms of ap, so the second action will always be executed before <> is applied.

A First-like variant is possible with ExceptT or similar, essentially any data type that has a Left k >>= _ = Left k like case so that we can stop the computation at that point. Although ExceptT is meant for exceptions, it may work well for your use-case. Alternatively, one of the Alternative transformers (MaybeT, ExceptT) together with <|> instead of <> might suffice.


A almost completely lazy IO type is also possible, but must be handled with care:

import Control.Applicative (liftA2)
import System.IO.Unsafe (unsafeInterleaveIO)  

newtype LazyIO a = LazyIO { runLazyIO :: IO a }

instance Functor LazyIO where
  fmap f = LazyIO . fmap f . runLazyIO

instance Applicative LazyIO where
  pure    = LazyIO . pure
  f <*> x = LazyIO $ do
              f' <- unsafeInterleaveIO (runLazyIO f)
              x' <- unsafeInterleaveIO (runLazyIO x)
              return $ f' x'

instance Monad LazyIO where
  return  = pure
  f >>= k = LazyIO $ runLazyIO f >>= runLazyIO . k

instance Semigroup a => Semigroup (LazyIO a) where
  (<>) = liftA2 (<>)

instance Monoid a => Monoid (LazyIO a) where
  mempty  = pure mempty
  mappend = liftA2 mappend

unsafeInterleaveIO will enable the behaviour you want (and is used in getContents and other lazy IO Prelude functions), but it must be used with care. The order of IO operations is completely off at that point. Only when we inspect the values we will trigger the original IO:

ghci> :module +Data.Monoid Control.Monad
ghci> let example = fmap (First . Just) . LazyIO . putStrLn $ "example"
ghci> runLazyIO $ fmap mconcat $ replicateM 100 example
First {getFirst = example
Just ()}

Note that we only got our example once in the output, but at a completely random place, since the putStrLn "example" and print result got interleaved, since

print (First x) = putStrLn (show (First x))
                = putStrLn ("First {getFirst = " ++ show x ++ "}")

and show x will finally put the IO necessary to get x in action. The action will get called only once if we use the result multiple times:

ghci> :module +Data.Monoid Control.Monad
ghci> let example = fmap (First . Just) . LazyIO . putStrLn $ "example"
ghci> result <- runLazyIO $ fmap mconcat $ replicateM 100 example
ghci> result
First {getFirst = example
Just ()}
ghci> result
First {getFirst = Just ()}

You could write a finalizeLazyIO function that either evaluates or seq's x though:

finalizeLazyIO :: LazyIO a -> IO a
finalizeLazyIO k = do
  x <- runLazyIO k
  x `seq` return x

If you were to publish a module with this functions, I'd recommend to only export the type constructor LazyIO, liftIO :: IO a -> LazyIO a and finalizeLazyIO.

Nikianikita answered 5/11, 2017 at 14:48 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.