To make sure that the threads have a "similar" workload, it is important to find an even distribution. This is particularly important when the number of threads is "high" compared to the number of elements. In this case, one should make sure that the numbers of elements that the threads are responsible for differ by at most 1.
To achieve this, you can compute the remainder of dividing the number of elements (the array length, in your case) by the number of threads, and distribute this remainder, one element at a time, among the tasks.
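A minimal sketch of this basic idea could look like the following (the method name splitEvenly is just made up for illustration):

private static int[] splitEvenly(int numElements, int numThreads)
{
    int[] sizes = new int[numThreads];
    int base = numElements / numThreads;      // Every task receives at least this many elements
    int remainder = numElements % numThreads; // The first 'remainder' tasks receive one extra element
    for (int i = 0; i < numThreads; i++)
    {
        sizes[i] = base + (i < remainder ? 1 : 0);
    }
    return sizes;
}

Calling splitEvenly(22, 4) would yield [6, 6, 5, 5] - the same total as in the output further below, just with the larger chunks first.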
I had the same problem a while ago. In fact, I tried to solve it in a slightly more general form, for a ParallelRangeExecutor class, which required the computation of the start and end indices of the intervals of an arbitrary range (which does not need to start with index 0). The following is "extracted" from this class:
import java.util.Arrays;

public class EvenTaskDistribution
{
    public static void main(String[] args)
    {
        test( 22, 4);
        test( 21, 4);
        test(100, 3);
        test(  3, 4);
    }

    private static void test(int numElements, int parallelism)
    {
        int[] taskSizes = computeTaskSizes(parallelism, 0, numElements);
        System.out.printf("Distributing %4d elements among %4d threads: %s\n",
            numElements, parallelism, Arrays.toString(taskSizes));
    }

    /**
     * Compute the sizes of the tasks for distributing the range
     * [globalMin, globalMax) among the given number of threads,
     * so that the task sizes differ by at most 1.
     */
    public static int[] computeTaskSizes(
        int parallelism, int globalMin, int globalMax)
    {
        if (parallelism <= 0)
        {
            throw new IllegalArgumentException(
                "Parallelism must be positive, but is " + parallelism);
        }
        if (globalMin > globalMax)
        {
            throw new IllegalArgumentException(
                "The global minimum may not be larger than the global " +
                "maximum. Global minimum is " + globalMin + ", " +
                "global maximum is " + globalMax);
        }
        int range = globalMax - globalMin;
        if (range == 0)
        {
            return new int[0];
        }

        // Use at most 'range' tasks, so that no task is empty
        int numTasks = Math.min(range, parallelism);

        // The number of elements per task, rounded up
        int localRange = (range - 1) / numTasks + 1;

        // The number of tasks that receive one element less
        int spare = localRange * numTasks - range;

        int currentIndex = globalMin;
        int[] taskSizes = new int[numTasks];
        for (int i = 0; i < numTasks; i++)
        {
            final int min = currentIndex;
            final int max = min + localRange - (i < spare ? 1 : 0);
            taskSizes[i] = max - min;
            currentIndex = max;
        }
        return taskSizes;
    }
}
The output is
Distributing 22 elements among 4 threads: [5, 5, 6, 6]
Distributing 21 elements among 4 threads: [5, 5, 5, 6]
Distributing 100 elements among 3 threads: [33, 33, 34]
Distributing 3 elements among 4 threads: [1, 1, 1]
(The last one shows one of the corner cases that one might have to take into account: here, one could also have expected [1,1,1,0], with one thread receiving no elements at all. This can easily be adjusted, depending on the application case.)
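To actually run the work, the task sizes have to be accumulated into start and end indices, one pair per thread. A minimal sketch of how this could look, if added to the class above (the processRange method is only a placeholder for whatever each thread should do, and java.util.List and java.util.ArrayList would have to be imported):

// Starts one thread per task, each processing the half-open range [min, max)
private static void runInParallel(int numElements, int parallelism)
    throws InterruptedException
{
    int[] taskSizes = computeTaskSizes(parallelism, 0, numElements);
    List<Thread> threads = new ArrayList<Thread>();
    int start = 0;
    for (int taskSize : taskSizes)
    {
        final int min = start;
        final int max = start + taskSize;
        Thread thread = new Thread(() -> processRange(min, max));
        threads.add(thread);
        thread.start();
        start = max;
    }
    for (Thread thread : threads)
    {
        thread.join();
    }
}

private static void processRange(int min, int max)
{
    // Placeholder: process the elements with indices in [min, max)
    System.out.println("Processing " + min + " to " + max);
}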