Scala Future for comprehension: sequential vs parallel
Asked Answered
I

2

7

Here we have SeqPar object which contains a task routine which is a simple mock Future which prints out some debugging info and returns Future[Int] type.

The question is: why experiment1 is allowed to run in parallel, while experiment2 always runs sequentially?

object SeqPar {
  def experiment1: Int = {
    val f1 = task(1)
    val f2 = task(2)
    val f3 = task(3)

    val computation = for {
      r1 <- f1
      r2 <- f2
      r3 <- f3
    } yield (r1 + r2 + r3)

    Await.result(computation, Duration.Inf)
  }

  def experiment2: Int = {
    val computation = for {
      r1 <- task(1)
      r2 <- task(2)
      r3 <- task(3)
    } yield (r1 + r2 + r3)

    Await.result(computation, Duration.Inf)
  }

  def task(i: Int): Future[Int] = {
    Future {
      println(s"task=$i thread=${Thread.currentThread().getId} time=${System.currentTimeMillis()}")
      i * i
    }
  }
}

When I run experiment1 it prints out:

task=3 thread=24 time=1541326607613
task=1 thread=22 time=1541326607613
task=2 thread=21 time=1541326607613

While experiment2:

task=1 thread=21 time=1541326610653
task=2 thread=20 time=1541326610653
task=3 thread=21 time=1541326610654

What is the reason for the observed difference? I do know that for comprehension desugared like f1.flatMap(r1 => f2.flatMap(r2 => f3.map(r3 => r1 + r2 + r3))) but I still missing a point why one is allowed to run in parallel and another isn't.

Inroad answered 4/11, 2018 at 10:46 Comment(1)
Does this answer your question? Scala's "for comprehension" with futuresUnpaidfor
G
15

This is an effect of what Future(…) and flatMap do:

  • val future = Future(task) starts running task in parallel
  • future.flatMap(result => task) arranges for running task when future completes

Note that future.flatMap(result => task) cannot start running task in parallel before future completes because to run task, we need result, which is only available when future completes.

Now lets look at your example1:

def experiment1: Int = {
  // construct three independent tasks and start running them
  val f1 = task(1)
  val f2 = task(2)
  val f3 = task(3)

  // construct one complicated task that is ...
  val computation =
    // ... waiting for f1 and then ...
    f1.flatMap(r1 =>
      // ... waiting for f2 and then ...
      f2.flatMap(r2 =>
        // ... waiting for f3 and then ...
        f3.map(r3 =>
          // ... adding some numbers.
          r1 + r2 + r3)))

  // now actually trigger all the waiting
  Await.result(computation, Duration.Inf)
}

So in example1, since all three tasks take the same time and were started at the same time, we probably only have to block when waiting for f1. When we get around to wait for the f2, its result should already been there.

Now how does example2 differ?

def experiment2: Int = {
  // construct one complicated task that is ...
  val computation =
    // ... starting task1 and then waiting for it and then ...
    task(1).flatMap(r1 =>
      // ... starting task2 and then waiting for it and then ...
      task(2).flatMap(r2 =>
        // ... starting task3 and then waiting for it and then ...
        task(3).map(r3 =>
          // ... adding some numbers.
          r1 + r2 + r3)))

  // now actually trigger all the waiting and the starting of tasks
  Await.result(computation, Duration.Inf)
}

In this example, we are not even constructing task(2) before we have waited for task(1) to finish, so the tasks cannot run in parallel.

So when programming with Scala's Future, you have to control your concurrency by choosing correctly between code like example1 and code like example2. Or you can look into libraries that provide more explicit control over concurrency.

Gyneco answered 4/11, 2018 at 12:30 Comment(1)
Thank you @Gyneco , such a quality answer, everything is clear now!Inroad
M
1

This is because Scala Futures are strict. The operation inside a Future is executed as soon as the Future is created and then it memoizes its value. So you are losing referential transparency. In your case your futures are executed in your first task call, the result is memoized. They are not executed again inside the for. In the second case futures are created in your for comprehension and the result is correct.

Monophony answered 4/11, 2018 at 11:52 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.