I understand that there is overhead in setting up the processing of a parallel Stream, and that processing in a single thread is faster if there are few items or the processing of each item is fast.
But is there a similar threshold for trySplit(), a point where decomposing a problem into smaller chunks is counterproductive? I'm thinking by analogy to a merge sort switching to insertion sort for the smallest chunks.
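For concreteness, here is the kind of cutoff I have in mind: a minimal sketch over an int array, where trySplit() refuses to split once the remaining chunk is small, much like the insertion-sort base case. (The class name and the minSplit field are mine for illustration, not from any library.)

    import java.util.Spliterator;
    import java.util.function.IntConsumer;

    // Hypothetical sketch: an int-array spliterator that refuses to split below a
    // cutoff, the way a merge sort hands its smallest runs to insertion sort.
    final class CutoffSpliterator implements Spliterator.OfInt {
        private final int[] data;
        private int origin;         // current index, inclusive
        private final int fence;    // one past the last index
        private final int minSplit; // smallest chunk still worth splitting (assumed, tuned by measurement)

        CutoffSpliterator(int[] data, int origin, int fence, int minSplit) {
            this.data = data;
            this.origin = origin;
            this.fence = fence;
            this.minSplit = minSplit;
        }

        @Override
        public Spliterator.OfInt trySplit() {
            int size = fence - origin;
            if (size < 2 || size < minSplit) {
                return null; // below the cutoff: stop decomposing, stay sequential
            }
            int mid = (origin + fence) >>> 1;
            Spliterator.OfInt prefix = new CutoffSpliterator(data, origin, mid, minSplit);
            origin = mid; // this spliterator keeps the upper half
            return prefix;
        }

        @Override
        public boolean tryAdvance(IntConsumer action) {
            if (origin >= fence) {
                return false;
            }
            action.accept(data[origin++]);
            return true;
        }

        @Override
        public long estimateSize() {
            return fence - origin;
        }

        @Override
        public int characteristics() {
            return ORDERED | SIZED | SUBSIZED | IMMUTABLE | NONNULL;
        }
    }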
If so, does the threshold depend on the relative cost of trySplit() and of consuming an item in the course of tryAdvance()? Consider a split operation that's much more complicated than advancing an array index: splitting a lexically-ordered multiset permutation, for example. Is there a convention for letting clients specify the lower limit for a split when creating a parallel stream, depending on the complexity of their consumer? Or a heuristic the Spliterator can use to estimate the lower limit itself?
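The convention I'm imagining would thread that lower limit through a stream factory at creation time, so the client can pick it to match the cost of its consumer. Again a hypothetical sketch, building on the CutoffSpliterator above:

    import java.util.stream.IntStream;
    import java.util.stream.StreamSupport;

    // Hypothetical factory (not an existing convention I know of): the caller
    // chooses the smallest chunk worth splitting when creating the stream.
    final class PermutationStreams {
        static IntStream parallel(int[] data, int minSplit) {
            return StreamSupport.intStream(
                    new CutoffSpliterator(data, 0, data.length, minSplit), true);
        }
    }

A caller with an expensive per-element consumer could pass a small limit, say parallel(data, 16), while one doing trivial per-element work might pass 1 << 12 or more.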
Alternatively, is it always safe to let the lower limit of a Spliterator be 1, and let the work-stealing algorithm take care of deciding whether to continue splitting?