I am looking for an implementation of the fork-join model for Python. As Java's ForkJoinPool, it should allow to split (fork) the work of a task into several sub tasks recursively. Once the sub tasks are completed, the results are joined and returned. Ideally, it should support threads and processes similar to the ThreadPoolExecutor and ProcessPoolExecutor in concurrent.futures, but threads are more important for now. It must allow to limit the number of threads (I want to have one thread per core). I am aware that this will only be useful if the code releases the GIL.
Example from Wikipedia to clarify fork-join model:
solve(problem):
if problem is small enough:
solve problem directly (sequential algorithm)
else:
for part in subdivide(problem)
fork subtask to solve(part)
join all subtasks spawned in previous loop
return combined results
Is there such a library in Python? I could not find one.