How beneficial is a parallel Seq (ParSeq) for executing a sequence of statements?

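I have a small program that uses List.par:

// Scala 2.12: .par is built in. On Scala 2.13+ it needs the scala-parallel-collections module and
// import scala.collection.parallel.CollectionConverters._
val x = List(1,2,3,4,5).par.map(y => {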
    Thread.sleep(2000)
    println(y)
    y + 1
})

println(x)

Output:

3
1
4
5
2
ParVector(2, 3, 4, 5, 6)

The numbers are printed in parallel (in no particular order); however, the returned collection always keeps its original order.

My aim is to execute a sequence of insert statements against an SQL database in parallel.

Currently I am using a for comprehension. I want to switch to ParSeq because the number of statements keeps increasing.

But I am afraid it might degrade performance: if the map implementation has extra code to preserve the order, that is a performance overhead.

Kindly suggest how to do it.

Wasp asked 24/5, 2019 at 13:46. Comments:
If you want the best performance, you should probably look into SQL batch inserts rather than running each insert in its own (short-lived) thread. – Fugal
Yeah, but unfortunately I can't use batch inserts, as I am using the same code for an SQL DB in production and an in-memory DB for testing. – Wasp
@ShantiswarupTunga just sharing my opinion: if testing drives what your application can do, I would rethink how you test. Especially for DB wrappers, I came to the conclusion that the best thing is not to unit-test them, but rather to run integration tests against a real DB; to simplify testing during development, I would recommend Testcontainers. – Fregoso

The documentation (the "Semantics" section of the parallel collections overview) explains that there are only two scenarios that can lead to out-of-order behaviour:

  1. Side-effecting operations can lead to non-determinism
  2. Non-associative operations lead to non-determinism

You have observed the first one yourself with the println statements. The second one is easy to test with a non-associative binary operation such as subtraction:

val list = (1 to 100).toList
val a = list.par.reduce(_ - _)

println(a) 

Try running the above snippet a couple of times and compare the result with the sequential list.reduce(_ - _).
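
To see why the split matters, here is a minimal sketch that roughly mimics a parallel reduce by cutting the list into fixed-size chunks, reducing each chunk, and then reducing the partial results. The helper `chunkedReduce` and the chunk size of 25 are illustrative assumptions; the real parallel collections choose their own split points. It uses only the standard library, so it runs without the parallel-collections module:

```scala
object ChunkedReduceDemo {
  // Roughly mimics how a parallel reduce might split its input: reduce each
  // chunk independently, then reduce the partial results. The real splitting
  // strategy of Scala's parallel collections is different and not fixed.
  def chunkedReduce(xs: List[Int], chunkSize: Int)(op: (Int, Int) => Int): Int =
    xs.grouped(chunkSize).map(_.reduce(op)).reduce(op)

  def main(args: Array[String]): Unit = {
    val list = (1 to 100).toList

    // Subtraction is not associative: the grouping changes the result.
    println(list.reduce(_ - _))             // -5048
    println(chunkedReduce(list, 25)(_ - _)) // 4096

    // Addition is associative: any grouping gives the same result.
    println(list.reduce(_ + _))             // 5050
    println(chunkedReduce(list, 25)(_ + _)) // 5050
  }
}
```

With subtraction the chunked and sequential results disagree; with addition they always agree, which is exactly the associativity requirement.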

A list of integers can be mapped in parallel by a number of workers because the elements don't depend on each other: each worker can process its share of the elements without affecting any other element. So even if it's perhaps not intuitive at first, such processing does benefit from parallelization (although for the improvement to be noticeable you will probably need a larger number of elements, or a more expensive operation per element).
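
The following sketch shows why such a map can keep the original order without any reordering step: each worker writes only to its own slots of a pre-sized output array. It assumes plain `java.lang.Thread` workers rather than the splitter/combiner machinery Scala's parallel collections actually use, and the name `parMapByIndex` is hypothetical:

```scala
object IndexedParallelMap {
  // Maps `xs` with `f` on `workers` plain threads. Each worker owns a
  // contiguous, disjoint range of indices and writes its results into the
  // matching slots of `out`, so output order equals input order without
  // any sorting step afterwards.
  def parMapByIndex(xs: Vector[Int], workers: Int)(f: Int => Int): Vector[Int] = {
    require(workers > 0, "need at least one worker")
    val out = new Array[Int](xs.length)
    // Ceiling division, so at most `workers` chunks are created.
    val chunk = math.max(1, (xs.length + workers - 1) / workers)
    val threads = (0 until xs.length by chunk).map { start =>
      val end = math.min(start + chunk, xs.length)
      new Thread(new Runnable {
        def run(): Unit = {
          var i = start
          while (i < end) { out(i) = f(xs(i)); i += 1 }
        }
      })
    }
    threads.foreach(_.start())
    threads.foreach(_.join()) // join() also makes the workers' writes visible here
    out.toVector
  }

  def main(args: Array[String]): Unit =
    println(parMapByIndex(Vector(1, 2, 3, 4, 5), workers = 2)(_ + 1)) // Vector(2, 3, 4, 5, 6)
}
```

Because no two workers touch the same slot, no locking is needed; the only coordination is waiting for all threads with `join()`.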

However, the same list cannot be reduced in parallel with a non-associative operation, because the result depends on how the elements are grouped, and it makes a big difference whether you compute:

1 - (2 - (3 - 4))

or

((1 - 2) - 3) - 4
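
On a sequential List, these two groupings are exactly what reduceRight and reduceLeft compute. A small sketch (the object name `GroupingDemo` is only for illustration):

```scala
object GroupingDemo {
  val xs: List[Int] = List(1, 2, 3, 4)

  // reduceRight nests to the right: 1 - (2 - (3 - 4))
  val rightNested: Int = xs.reduceRight(_ - _)
  // reduceLeft nests to the left: ((1 - 2) - 3) - 4
  val leftNested: Int = xs.reduceLeft(_ - _)

  def main(args: Array[String]): Unit = {
    println(rightNested) // -2
    println(leftNested)  // -8
  }
}
```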

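This is why parallel collections run reduce and fold in parallel (which requires an associative operation), but not foldLeft and foldRight: those are still available, but they run sequentially because they fix the grouping to one direction.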
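
fold has a second requirement besides an associative operation: its zero element may be used an arbitrary number of times (for example once per chunk of work), so it must be neutral for the operation, such as 0 for `+`. A hedged simulation, using a hypothetical `chunkedFold` helper with an arbitrary chunk size rather than the real splitting logic:

```scala
object FoldZeroDemo {
  // Simulates a parallel fold: each chunk is folded from `z`, then the
  // partial results are folded together, again from `z`. Since `z` is used
  // more than once, it must be neutral for `op` to give a stable result.
  def chunkedFold(xs: List[Int], chunkSize: Int)(z: Int)(op: (Int, Int) => Int): Int =
    xs.grouped(chunkSize).map(_.fold(z)(op)).fold(z)(op)

  def main(args: Array[String]): Unit = {
    val list = (1 to 100).toList
    // 0 is neutral for +, so chunking does not change the result.
    println(list.fold(0)(_ + _))             // 5050
    println(chunkedFold(list, 25)(0)(_ + _)) // 5050
    // 1 is not neutral for +, so every extra use of z leaks into the result.
    println(list.fold(1)(_ + _))             // 5051
    println(chunkedFold(list, 25)(1)(_ + _)) // 5055
  }
}
```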

Embodiment answered 24/5, 2019 at 14:15. Comments:
Preserving the order is not necessary for me, since these are insert operations. But if, after the statements are executed, it rearranges the results into their original order, that is a performance overhead. My question is: is it a good approach to use ParSeq instead of a for comprehension? – Wasp
Feel free to also suggest any other good approach for executing a sequence of insert statements. – Wasp
Don't worry about the internal implementation; it's doing things in place. There's no reordering because there's no shuffling. Imagine a queue of 100 people waiting in line. Another 100 people approach in parallel, and each newcomer gives one person from the queue 1 dollar and leaves. Everyone in the queue now has 1 dollar more than before, it all happened in parallel, and the order within the queue is preserved. – Embodiment
I can't comment on insert statements because it's a much bigger topic and I don't have all the details (plus, to be honest, it's a completely different question). But generally speaking, using .par should do the trick. And given that talking to the DB is side-effecting, the inserts will most likely happen out of order anyway, so don't worry about the order of the mapped list. – Embodiment
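
For the insert use case itself, here is a minimal sketch of one alternative that needs only the standard library: Future.traverse on a fixed-size thread pool instead of .par (on Scala 2.13+, .par requires the separate scala-parallel-collections module). This swaps in a different technique from the one asked about. insertRow is a hypothetical stand-in for a real insert statement, and the pool size of 4 is an arbitrary assumption; in practice it would usually match the database connection-pool size.

```scala
import java.util.concurrent.Executors
import java.util.concurrent.atomic.AtomicInteger
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object ParallelInsertSketch {
  // Hypothetical stand-in for a real insert statement (e.g. a JDBC call):
  // it only sleeps to simulate a database round-trip and bumps a counter.
  def insertRow(id: Int, inserted: AtomicInteger): Int = {
    Thread.sleep(20)
    inserted.incrementAndGet()
    id
  }

  // Runs all inserts on a fixed-size pool and waits for them to finish.
  // Returns the ids in input order plus the number of inserts performed.
  def insertAll(ids: Seq[Int], parallelism: Int): (Seq[Int], Int) = {
    val pool = Executors.newFixedThreadPool(parallelism)
    implicit val ec: ExecutionContext = ExecutionContext.fromExecutorService(pool)
    val inserted = new AtomicInteger(0)
    try {
      val all: Future[Seq[Int]] = Future.traverse(ids)(id => Future(insertRow(id, inserted)))
      (Await.result(all, 1.minute), inserted.get)
    } finally {
      pool.shutdown()
    }
  }

  def main(args: Array[String]): Unit = {
    val (ids, count) = insertAll(1 to 20, parallelism = 4)
    println(ids)
    println(count) // 20
  }
}
```

Like .par.map, Future.traverse returns its results in input order, while the inserts themselves may run in any order. An explicit pool makes the number of concurrent statements a visible, tunable parameter.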
