Collecting errors (instead of short-circuiting) until value is actually being used
Asked Answered
P

2

5

What's the simplest yet most elegant way to NOT short-circuit and instead collect errors until their values are used?

What's so hard in accumulating errors? Short circuit only if a function call receives an error value as value. But then return all errors accumulated since.

Insights:

  • Monad short-circuits on any error because >>= relies on there being an argument to apply the function to.
  • Applicative <*> can gather up errors from both of its arguments.

The code does not compile (see compiling error below) due to missing Monad instance as I don’t know how it needs to be designed. The code is intended to show the desired behaviour that a Monad instance or any other implementation should provide.

This is a request for a specific code solution (be it Monad instance or an entirely different approach) and NOT for a package recommendation.

Still the approaches used in following language extensions and packages might give some inspiration (parts emerged in #haskell IRC):

The following code is inspired by: How is it possible to collect all error messages in the Either Monad? https://blog.ploeh.dk/2018/11/05/applicative-validation/

{-# LANGUAGE DeriveFunctor, RecordWildCards, OverloadedStrings #-}

import Data.Text (Text)

newtype Errors e r = Errors (Either e r) deriving (Show, Eq, Functor)

instance Semigroup m => Applicative (Errors m) where
  pure = Errors . pure
  Errors (Left x) <*> Errors (Left y) = Errors (Left (x <> y))
  Errors f <*> Errors r = Errors (f <*> r)

data Result = Result {r1 :: !Int, rg :: !Int} deriving (Show)

f1 :: Errors [Text] Int
f1 = Errors $ Left ["f1 failed"]

f2 :: Errors [Text] Int
f2 = pure 2

f3 :: Errors [Text] Int
f3 = Errors $ Left ["f3 failed"]

f4 :: Errors [Text] Int
f4 = pure 4

f5 :: Errors [Text] Int
f5 = pure 5

g :: Int -> Int -> Errors [Text] Int
g a b | a + b <= 6 = Errors $ Left ["g: a + b NOT > 6"] -- we'll let `g` fail if sum is less than 6
      | otherwise = pure $ a * b

-- | in `scenario1` `g` uses one erroneous and one non-erroneous result.
--   since `g` tries to consume one erroneous result `r3` `g` can't execute.
--   it short-circuits the computation.
--   all up till then collected errors are returned.
--
-- >>> scenario1
-- Errors (Left ["f1 failed", "f3 failed"])
scenario1 :: Errors [Text] Result
scenario1 = do
  r1 <- f1 :: Errors [Text] Int -- fails, collected
  r2 <- f2 :: Errors [Text] Int -- succeeds
  r3 <- f3 :: Errors [Text] Int -- fails, collected
  -- we haven’t short-circuited until here, instead collected errors
  -- although `f1` failed, `f2` and `f3` still have been executed
  -- but now we need to short circuit on `f4` because at least any of `r2` or `r3` has error value
  rg <- g r2 r3 :: Errors [Text] Int
  pure $ Result {..}

-- | `g` can execute (all values are non-errors) but `g` itself produces an error.
--   computation short-circuits only on construction of `Result`.
--   that is because `Result` only carries non-error values but `g` produced error value.
--   `scenario2` returns error values of `f1` and `g`.
--
-- >>> scenario2
-- Errors (Left ["f1 failed", "g: a + b NOT > 6"])
scenario2:: Errors [Text] Result
scenario2 = do
  r1 <- f1 :: Errors [Text] Int -- fails, collected
  r2 <- f2 :: Errors [Text] Int -- succeeds
  r4 <- f4 :: Errors [Text] Int -- succeeds
  -- we haven’t short-circuited until here, instead collected errors
  -- although `f1` failed, `f2` and `f4` still have been executed
  -- `g` receives non-error values `r2` and `r4` with values 2 and 4
  -- now, g itself returns an error due to its logic
  rg <- g r2 r4 :: Errors [Text] Int
  -- we still don’t short-circuit `g`'s error being produced
  -- we only short-circuit on the error value tried being used by `Result`:
  pure $ Result {..}

-- | `g` does neither is fed with erroneous values nor
--   does `g` itself return an error. Instead construction of `Result` fails
--   since it tries to load value of `r1` which is erroneous but should be `Int`.
--
-- >>> scenario3
-- Errors (Left ["f1 failed"])
scenario3 :: Errors [Text] Result
scenario3 = do
  r1 <- f1 :: Errors [Text] Int -- fails, collected
  r2 <- f2 :: Errors [Text] Int -- succeeds
  r5 <- f5 :: Errors [Text] Int -- succeeds
  -- we haven’t short-circuited until here, instead collected errors
  -- although `f1` failed, `f2` and `f4` still have been executed
  -- `g` receives non-error values `r2` and `r5` with values 2 and 5
  -- now, `g` itself succeeds, no error
  rg <- g r2 r5 :: Errors [Text] Int
  -- `Result` is constructed, since `f1` failed, `r1` is of error value now
  -- hence `Result` cannot be constructed, failure "f1 failed" should be returned
  pure $ Result {..}

-- | no error anywhere, 
--
-- >>> scenario4
-- Errors (Right 7)
scenario4 :: Errors [Text] Result
scenario4 = do
  r1 <- f4 :: Errors [Text] Int -- succeeds
  r2 <- f2 :: Errors [Text] Int -- succeeds
  r5 <- f5 :: Errors [Text] Int -- succeeds
  -- now, `g` itself succeeds, no error
  rg <- g r2 r5 :: Errors [Text] Int
  -- `Result` is constructed successfully because it only takes in non-error values
  pure $ Result {..}

And here comes the error:

rc/MyLib2.hs:42:3: error: [GHC-39999]
    • No instance for ‘Monad (Errors [Text])’
        arising from a do statement
    • In a stmt of a 'do' block: r1 <- f1 :: Errors [Text] Int
      In the expression:
        do r1 <- f1 :: Errors [Text] Int
           r2 <- f2 :: Errors [Text] Int
           r3 <- f3 :: Errors [Text] Int
           rg <- g r2 r3 :: Errors [Text] Int
           ....
      In an equation for ‘scenario1’:
          scenario1
            = do r1 <- f1 :: Errors [Text] Int
                 r2 <- f2 :: Errors [Text] Int
                 r3 <- f3 :: Errors [Text] Int
                 ....
   |
42 |   r1 <- f1 :: Errors [Text] Int -- fails, collected
   |   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Failed, no modules loaded.
Polytrophic answered 11/3 at 11:55 Comment(5)
notice in scenario1 the value r3 is an Int (from compiler's point of view). Therefore, when calling g r2 r3 we can't say: oh!, r3 is an error, hence let's not execute g, Are you confused by syntatic sugar of do notation maybe??. Between line r3 <- f3 and next line rg <- g r2 r3 there is a call to >>=. Unfortunatelly your code doesn't provide any Monad instance for ErrorsFuthark
Requests for library or tool suggestions aren't on-topic here. They're too hard to keep up-to-date, and often just a matter of opinion at any given time anyway.Einstein
@DanielWagner How is this is a request for a library or a tool?Syverson
There is no lawful Monad instance consistent with the Applicative (Validation e) instance (which you have replicated here in this code). So you are asking for an alternative to the validation package that makes other tradeoffs.Einstein
This looks like an Applicative plus an explicit join-like operation to handle calls to functions like g with the caveat that this "join" doesn't match the Applicative (cf., the bindValidation function in the validation package). As a result, I think it can probably be answered without appealing to a library or package. So, I'd vote for reopening.Teary
S
5

What I've been doing so far in situations like this isn't particularly sophisticated. It's possible that there's a more elegant way to do this with Haskell, but on the other hand, what follows here can be ported to other languages.

The way I address these sorts of problems is that I alternate between two representations of the same data; one which has validation-like behaviour, and one which is a proper monad.

For the monad it's obvious to use the built-in Either. For the validation representation, I'll here use Errors provided in the OP.

These two representations are isomorphic, but it's probably convenient to have explicit functions to go back and forth:

toEither :: Errors l r -> Either l r
toEither = coerce

fromEither :: Either l r -> Errors l r
fromEither = coerce

Furthermore, I'm also going to take advantage of this extension:

{-# LANGUAGE ApplicativeDo #-}

With these additions, we can implement each of the scenarios in the OP.

Scenario 1

Most of these scenarios have similar-looking code, so I'm going to mostly comment on the first one:

scenario1 :: Errors [Text] Result
scenario1 = do
  r1 <- f1
  rg <- fromEither $ do
        r2 <- toEither f2
        r3 <- toEither f3
        toEither $ g r2 r3
  pure $ Result r1 rg

The outer do expression uses Errors, which is why we need the ApplicativeDo extension. While Errors is not a Monad instance, that extension still enables the do syntax.

In order to compose r2 and r3 together with g, you'll need some kind of join functionality, so the inner do works in normal Either-monad mode by converting the values, and then converting the result back to Errors.

It may not be the most elegant solution, but it works:

ghci> scenario1
Errors (Left ["f1 failed","f3 failed"])

Scenario 2

This one, and the subsequent examples, follows the same template:

scenario2 :: Errors [Text] Result
scenario2 = do
  r1 <- f1
  rg <- fromEither $ do
        r2 <- toEither f2
        r4 <- toEither f4
        toEither $ g r2 r4
  pure $ Result r1 rg

Demo:

ghci> scenario2
Errors (Left ["f1 failed","g: a + b NOT > 6"])

Scenario 3

Code:

scenario3 :: Errors [Text] Result
scenario3 = do
  r1 <- f1
  rg <- fromEither $ do
        r2 <- toEither f2
        r5 <- toEither f5
        toEither $ g r2 r5
  pure $ Result r1 rg

Demo:

ghci> scenario3
Errors (Left ["f1 failed"])

Scenario 4

Code:

scenario4 :: Errors [Text] Result
scenario4 = do
  r1 <- f4
  rg <- fromEither $ do
        r2 <- toEither f2
        r5 <- toEither f5
        toEither $ g r2 r5
  pure $ Result r1 rg

Demo:

ghci> scenario4
Errors (Right (Result {r1 = 4, rg = 10}))
Syverson answered 11/3 at 20:3 Comment(0)
T
2

@MarkSeemann's approach works pretty well, but after fiddling around with a few designs, I think you can get the cleanest syntax by simply using a Writer monad with Maybe values:

type Errors e a = Writer e (Maybe a)

Note here that the Writer layer is not a transformer. This is a plain old writer monad whose values are lifted into Maybe. You write do-notation for the Writer, which takes care of collecting error messages, but the values you extract from the monad are always Maybe values that allow you to carry around multiple failed computations as Nothings where the cause of the failure has already been logged to the Writer.

So, in your do-blocks, a statement like:

r1 <- f1

is running an f1 :: Error [Text] Int computation to yield an r1 :: Maybe Int value. (Again, r1 is not an Int. All the extracted values are lifted into Maybe.) When you have a function like g that takes unlifted values but can generate its own errors:

g :: Int -> Int -> Errors [Text] Int

you use an adapter to explicitly mark the function as requiring its arguments to be present:

r1 <- f1
r2 <- f2
r3 <- f3
rg <- call2 g r2 r3  -- `call2` adapter ensures r2/r3 are present

You can also lift pure functions like (,) or the Result constructor from your example with additional adapters:

liftE2 Result r1 rg  -- `liftE2` lifts Result :: Int -> Int -> Result

The resulting scenarios look pretty clean. For example:

f1, f2, f3, f4, f5 :: Errors [String] Int
f1 = err "f1 failed"
f2 = yield 2
f3 = err "f3 failed"
f4 = yield 4
f5 = yield 5

g :: Int -> Int -> Errors [Text] Int
g a b | a + b <= 6 = err "g: a + b NOT > 6"
      | otherwise = yield $ a * b

scenario1 :: Errors [String] Result
scenario1 = do
  r1 <- f1
  r2 <- f2
  r3 <- f3
  rg <- call2 g r2 r3
  liftE2 Result r1 rg

and the adapters are straightforward, if tedious:

call2 :: (Monoid e) => (a1 -> a2 -> Errors e r) -> Maybe a1 -> Maybe a2 -> Errors e r
call2 f x1 x2    = makeCall (f <$> x1 <*> x2)
    where makeCall = fromMaybe (pure Nothing)

liftE2 :: (Monoid e) => (a1 -> a2 -> r) -> (Maybe a1 -> Maybe a2 -> Errors e r)
liftE2 f x1 x2    = pure (f <$> x1 <*> x2)

A complete example with more arities of adapters defined:

{-# LANGUAGE OverloadedStrings #-}

module Collect where

import Data.Text
import Data.Maybe (fromMaybe)
import Control.Monad.Writer

type Errors e a = Writer e (Maybe a)

runErrors :: Errors e a -> Either e a
runErrors act = case runWriter act of
  (Nothing, e) -> Left e
  (Just x, _)  -> Right x

err :: t -> Errors [t] a
err t = errs [t]

errs :: (Monoid e) => e -> Errors e a
errs e = tell e >> pure Nothing

yield :: (Monoid e) => a -> Errors e a
yield = pure . Just

call  :: (Monoid e) => (a1             -> Errors e r) -> Maybe a1                         -> Errors e r
call2 :: (Monoid e) => (a1 -> a2       -> Errors e r) -> Maybe a1 -> Maybe a2             -> Errors e r
call3 :: (Monoid e) => (a1 -> a2 -> a3 -> Errors e r) -> Maybe a1 -> Maybe a2 -> Maybe a3 -> Errors e r
call  f x1       = makeCall (f <$> x1)
call2 f x1 x2    = makeCall (f <$> x1 <*> x2)
call3 f x1 x2 x3 = makeCall (f <$> x1 <*> x2 <*> x3)

makeCall :: (Monoid e) => Maybe (Writer e (Maybe a)) -> Writer e (Maybe a)
makeCall = fromMaybe (pure Nothing)

liftE  :: (Monoid e) => (a1             -> r) -> (Maybe a1                         -> Errors e r)
liftE2 :: (Monoid e) => (a1 -> a2       -> r) -> (Maybe a1 -> Maybe a2             -> Errors e r)
liftE3 :: (Monoid e) => (a1 -> a2 -> a3 -> r) -> (Maybe a1 -> Maybe a2 -> Maybe a3 -> Errors e r)
liftE  f x1       = pure (f <$> x1)
liftE2 f x1 x2    = pure (f <$> x1 <*> x2)
liftE3 f x1 x2 x3 = pure (f <$> x1 <*> x2 <*> x3)

data Result = Result {r1 :: !Int, rg :: !Int} deriving (Show)

f1, f2, f3, f4, f5 :: Errors [Text] Int
f1 = err "f1 failed"
f2 = yield 2
f3 = err "f3 failed"
f4 = yield 4
f5 = yield 5

g :: Int -> Int -> Errors [Text] Int
g a b | a + b <= 6 = err "g: a + b NOT > 6"
      | otherwise = yield $ a * b

scenario1 :: Errors [Text] Result
scenario1 = do
  r1 <- f1
  r2 <- f2
  r3 <- f3
  rg <- call2 g r2 r3
  liftE2 Result r1 rg

scenario2 :: Errors [Text] Result
scenario2 = do
  r1 <- f1
  r2 <- f2
  r4 <- f4
  rg <- call2 g r2 r4
  liftE2 Result r1 rg

scenario3 :: Errors [Text] Result
scenario3 = do
  r1 <- f1
  r2 <- f2
  r5 <- f5
  rg <- call2 g r2 r5
  liftE2 Result r1 rg

scenario4 :: Errors [Text] Result
scenario4 = do
  r1 <- f4
  r2 <- f2
  r5 <- f5
  rg <- call2 g r2 r5
  liftE2 Result r1 rg

main = mapM_ (print . runErrors) [scenario1, scenario2, scenario3, scenario4]
Teary answered 12/3 at 23:25 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.