Cats effect - parallel composition of independent effects

I want to combine multiple IO values that should run independently in parallel.

val io1: IO[Int] = ???
val io2: IO[Int] = ???

As I see it, I have two options:

  1. Use cats-effect's fibers with a fork-join pattern
    val parallelSum1: IO[Int] = for {
      fiber1 <- io1.start
      fiber2 <- io2.start
      i1 <- fiber1.join
      i2 <- fiber2.join
    } yield i1 + i2
    
  2. Use the Parallel instance for IO with parMapN (or one of its siblings, such as parTraverse, parSequence, or parTupled)
    val parallelSum2: IO[Int] = (io1, io2).parMapN(_ + _)
    

I'm not sure about the pros and cons of each approach, or when I should choose one over the other. This becomes even trickier when abstracting over the effect type (tagless-final style):

def io1[F[_]]: F[Int] = ???
def io2[F[_]]: F[Int] = ???

def parallelSum1[F[_]: Concurrent]: F[Int] = for {
  fiber1 <- io1[F].start
  fiber2 <- io2[F].start
  i1 <- fiber1.join
  i2 <- fiber2.join
} yield i1 + i2

def parallelSum2[F[_], G[_]](implicit parallel: Parallel[F, G]): F[Int] =
  (io1[F], io2[F]).parMapN(_ + _)

The Parallel typeclass requires two type constructors, which makes it somewhat more cumbersome to use: it cannot be expressed as a context bound, and it adds a vague extra type parameter G[_].

Your guidance is appreciated :)

Amitay

Suzisuzie answered 13/1, 2019 at 13:35 Comment(1)
Regarding Parallel requiring two type parameters, you may be interested in the upcoming Parallel1. - Riker

I want to combine multiple IO values that should run independently in parallel.

The way I view it, to figure out "when do I use which?", we need to return to the old parallel vs. concurrent discussion, which basically boils down to (quoting the accepted answer):

Concurrency is when two or more tasks can start, run, and complete in overlapping time periods. It doesn't necessarily mean they'll ever both be running at the same instant. For example, multitasking on a single-core machine.

Parallelism is when tasks literally run at the same time, e.g., on a multicore processor.

The usual example of concurrency is IO-like operations, such as making an over-the-wire call or talking to disk.

The question is: which one do you want when you say you want to execute "in parallel", the former or the latter?

If we're referring to the former, then using Concurrent[F] both conveys the intention by the signature and provides the proper execution semantics. If it's the latter, and we, for example, want to process a collection of elements in parallel, then going with Parallel[F, G] would be the better solution.
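To make the two signatures concrete, here is a minimal sketch of both styles in tagless-final form. It assumes cats-effect 3.x and cats 2.x, which are newer than the versions the question was written against: there, Parallel takes a single type parameter (so it fits a context bound), and a fiber's join returns an Outcome, so the sketch uses joinWithNever to get the plain result. The names ParallelSumSketch, forkJoinSum and parSum are made up for illustration.

```scala
import cats.Parallel
import cats.effect.{Concurrent, IO, IOApp}
import cats.syntax.all._

// Assumption: cats-effect 3.x / cats 2.x on the classpath.
object ParallelSumSketch extends IOApp.Simple {

  // Fork-join: starting fibers requires Concurrent (Spawn would also do).
  // joinWithNever unwraps the Outcome: it re-raises errors and never
  // completes if the joined fiber was canceled.
  def forkJoinSum[F[_]](fa: F[Int], fb: F[Int])(implicit F: Concurrent[F]): F[Int] =
    for {
      fiber1 <- F.start(fa)
      fiber2 <- F.start(fb)
      i1     <- fiber1.joinWithNever
      i2     <- fiber2.joinWithNever
    } yield i1 + i2

  // Parallel: only asks for the ability to combine independent effects.
  def parSum[F[_]: Parallel](fa: F[Int], fb: F[Int]): F[Int] =
    (fa, fb).parMapN(_ + _)

  def run: IO[Unit] =
    for {
      a <- forkJoinSum(IO.pure(1), IO.pure(2))
      b <- parSum(IO.pure(20), IO.pure(22))
      _ <- IO.println(s"fork-join: $a, parMapN: $b")
    } yield ()
}
```

One practical difference worth keeping in mind, as far as I understand the cats-effect 3 semantics: if one side fails, parMapN cancels the other, whereas in the hand-written fork-join a failure surfacing from the first join leaves the second fiber running unless you cancel it yourself.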

It is often quite confusing to think about these semantics with regard to IO, because it has instances for both Parallel and Concurrent, and we mostly use it to opaquely define side-effecting operations.

As a side note, Parallel takes two unary type constructors because M (in Parallel[M[_], F[_]]) is always a Monad, and when we think of a Monad we always talk about sequential execution semantics. We therefore need a way to prove that the Monad also has a corresponding Applicative[F] that can be used for parallel execution.
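The Monad/Applicative pairing is perhaps easiest to see outside of IO, with Either, whose Parallel instance pairs it with Validated. The sketch below assumes cats 2.x; the object and value names are hypothetical.

```scala
import cats.implicits._ // syntax plus instances for Either and List

object ParallelEitherSketch {
  type Result[A] = Either[List[String], A]

  val a: Result[Int] = Left(List("a is missing"))
  val b: Result[Int] = Left(List("b is missing"))

  // Uses Either's own (monadic) Applicative: stops at the first Left.
  val sequential: Result[Int] = (a, b).mapN(_ + _)

  // Uses Parallel: converts to Validated, combines with its
  // error-accumulating Applicative, then converts back to Either.
  val accumulated: Result[Int] = (a, b).parMapN(_ + _)
}
```

Here sequential keeps only the first error, while accumulated collects both. Nothing runs on multiple threads; "parallel" in this case only means the two values are combined independently of each other.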

Davisdavison answered 14/1, 2019 at 12:19 Comment(3)
Hi Yuval, thanks for the answer. I know about the difference between concurrency and parallelism, but I'm still not sure about the pros and cons of each approach, which seemingly achieve similar semantics. Regarding the note on Parallel[M[_], F[_]]: I still don't fully understand the role of that F... I understand there's a plan to create a single-type-parameter Parallel typeclass in version 2, similar to the temporary solution offered here: github.com/ChristopherDavenport/cats-par - Suzisuzie
@Suzisuzie TBH, the Concurrent and Parallel instances for both Monix Task and cats IO are identical under the covers. I don't think you'll find a deep difference other than the semantics. I think the docs provide a good explanation. It seems that Parallel is the ability to pair like-structured data types where one is a Monad and the other is an Applicative (i.e. Either -> Validated). The docs don't talk about parallelism at all. - Davisdavison
@Suzisuzie Also: github.com/typelevel/cats/issues/1830 and github.com/typelevel/cats/pull/1837. It seems that Parallel is used both for running parallel computations and for going back and forth between monads and their respective applicative instances, because these applicatives are able to execute in parallel. - Davisdavison
