How to prevent actor starvation in the presence of other long-running actors?

Asked 8/11, 2010 at 20:22 Answered 8/11, 2010 at 22:47

Solved scala concurrency scala-2.8 actor

This is using Scala 2.8 Actors. I have a long running job which can be parallelized. It consists of about 650,000 units of work. I divide it into 2600 different separate subtasks, and for each of these I create a new actor:

actor {
  val range = (0L to total by limit)
  val latch = new CountDownLatch(range.length)
  range.foreach { offset =>
    actor {
      doExpensiveStuff(offset,limit)
      latch.countDown
    }
  }
  latch.await
}

This works fairly well, but overall takes 2+h to complete. The issue is that in the meanwhile, any other actors I create to do normal tasks seem to be starved out by the initial 2600 actors which are also patiently awaiting their time to be run on a thread but have been waiting longer than any new actors that come along.

How might I go about avoiding this starvation?

Initial thoughts:

Instead of 2600 actors, use one actor that sequentially plows through the large pile of work. I'm not fond of this because I'd like this job to finish sooner by splitting it up.
Instead of 2600 actors, use two actors, each processing a different half of the total work set. This might work better, but what if my machine has 8 cores? I'd likely want to utilize more than that.

UPDATE

Some folks have questioned the use of Actors at all, especially since the message passing capability was not being used within the workers. I had assumed that the Actor was a very lightweight abstraction around a ThreadPool at or near the same performance level of simply coding the ThreadPool-based execution manually. So I wrote a little benchmark:

import testing._
import java.util.concurrent._
import actors.Futures._

val count = 100000
val poolSize = 4
val numRuns = 100

val ActorTest = new Benchmark {
  def run = {
    (1 to count).map(i => future {
      i * i
    }).foreach(_())
  }
}

val ThreadPoolTest = new Benchmark {
  def run = {
    val queue = new LinkedBlockingQueue[Runnable]
    val pool = new ThreadPoolExecutor(
          poolSize, poolSize, 1, TimeUnit.SECONDS, queue)
    val latch = new CountDownLatch(count)
    (1 to count).map(i => pool.execute(new Runnable {
      override def run = {
        i * i
        latch.countDown
      }
    }))
    latch.await
  }
}

List(ActorTest,ThreadPoolTest).map { b =>
  b.runBenchmark(numRuns).sum.toDouble / numRuns
}

// List[Double] = List(545.45, 44.35)

I used the Future abstraction in the ActorTest to avoid passing a message back to another actor to signal work was done. I was surprised to find that my Actor code was over 10 times slower. Note that I also created my ThreadPoolExecutor with an initial pool size with which the default Actor pool is created.

Looking back, it seems like I've possibly overused the Actor abstraction. I'm going to look into using separate ThreadPools for these distinct, expensive, long-running tasks.

Wept answered 8/11, 2010 at 20:22 Comment(1)

Nothing about the problem as described needs actors at all. As you are just splitting work into a number of identical chunks, you can just use futures - see my answer below – Dredger 8/11, 2010 at 22:54

No matter how many actors you have, if you're not configuring your scheduling explicitly, all of them are backed with a single fork/join scheduler (running against a thread pool with capacity 4, if I'm not mistaken). That's where starvation comes from.

You should try different schedulers for your pool of actors, to find the one that shows the best performance (try ResizableThreadPoolScheduler, if you want to maximize parallelism using as much threads as possible)
You need to have a separate scheduler for the huge pool of actors (other actors in your system shouln't use it)
As it was suggested by @DaGGeRRz you may try Akka framework that offers configurable dispatchers (e.g., work stealing load balancing dispatcher moves events from mailboxes of the busy actors to the idle actors)

From the comments to default Actor implementation:

The run-time system can be configured to use a larger thread pool size (for example, by setting the actors.corePoolSize JVM property). The scheduler method of the Actor trait can be overridden to return a ResizableThreadPoolScheduler, which resizes its thread pool to avoid starvation caused by actors that invoke arbitrary blocking methods. The actors.enableForkJoin JVM property can be set to false, in which case a ResizableThreadPoolScheduler is used by default to execute actors.

In addition: an interesting thread on schedulers at scala-lang.

Kiln answered 8/11, 2010 at 22:1 Comment(2)

Vasil is right about thread usage. I incorrectly thought that actors created by the thread-block short-hand form spawned a thread per actor, but as he says, they're all running from the Scala Actor thread pool. Deleting my answer as Vasil covers it better. – Coopersmith 8/11, 2010 at 22:42

Thanks Vasil. I decided to go with a threadpool (see edit to OP) for performance in light of the fact that I didn't really need to use Actors in this case. – Wept 9/11, 2010 at 13:54

It seems from your example that you don't actually need to use actors at all, as you are not passing messages to your work units, or replying, or even looping.

Why not just create a load of Futures and then wait on them finishing? That way, the underlying Fork Join Pool is completely free to decide on the appropriate level of parallelism (i.e. # of threads) for your system:

import actors.Futures._
def mkFuture(i : Int) = future {
  doExpensiveStuff(i, limit)
}
val fs = (1 to range by limit).map(mkFuture)
awaitAll(timeout, fs) //wait on the work all finishing

Note that you are only going to benefit from parallelism by processing more tasks concurrently than your system has cores if the expensive work is not CPU-bound (maybe it's IO bound).

Dredger answered 8/11, 2010 at 22:47 Comment(4)

Futures in scala.actors.Futures are just abstractions over actors, so ultimately you get the same problem. The initial set consume all of the threads in the pool and the others get starved. If you have tasks with dramatically different behavioral characteristics (e.g. very long running vs very short running) it's a good idea to separate them. I can imagine a smarter thread pool that would automatically partition itself, but I don't know of one. – Rayfordrayle 9/11, 2010 at 1:13

Isn't an actor a rather heavyweight thing to create just to execute something on the underlying thread pool? – Dredger 9/11, 2010 at 11:13

Thanks for the reply. I had actually tried Futures as an alternative before I posted, but found the same starvation behavior due to the relationship to Actors,as Erik pointed out. – Wept 9/11, 2010 at 12:47

I updated the original question with a benchmark addendum testing the actor vs threadpool performance. You're right, and I think I'd be much better served in this case creating a separate threadpool for these tasks. thanks. – Wept 9/11, 2010 at 13:52

I have not used actors with that syntax, but by default I think all actors in scala use a thread pool.

See How to designate a thread pool for actors

Minim answered 8/11, 2010 at 20:46 Comment(1)

Yes, he wants to keep the 2600 worker actors from starving out the others he really needs to put them on separate thread pools. – Rayfordrayle 9/11, 2010 at 1:10

Recommended topics

Hot tags