Parallelize these nested for loops in python

Asked 6/11, 2016 at 14:51 Answered 21/12, 2016 at 19:23

python parallel-processing nested-loops multiprocess

I have a multidimensional array (result) that should be filled by some nested loops. Function fun() is a complex and time-consuming function. I want to fill my array elements in a parallel manner, so I can use all my system's processing power. Here's the code:

import numpy as np


def fun(x, y, z):
    # time-consuming computation...
    # ...

    return output


dim1 = 10
dim2 = 20
dim3 = 30

result = np.zeros([dim1, dim2, dim3])

for i in xrange(dim1):
    for j in xrange(dim2):
        for k in xrange(dim3):
            result[i, j, k] = fun(i, j, k)

My question is: can I parallelize this code, and if so, how?

I'm using Windows 10 64-bit and python 2.7.

Please provide your solution by changing my code if you can. Thanks!

Jac answered 6/11, 2016 at 14:51 Comment(2)

The big question is whether each call of f is independent or if subsequent calls depend on the results of previous calls. If they're independent then J. Maria's answer will work. If not it will either be more complex or impossible. – Typology 6/11, 2016 at 15:40

@Typology Each call of fun() is independent of previous calls, but the dimensions are larger than 10, 20, 30 in the real implementation and I don't want to split my indices. I want a solution that is more dynamic. – Jac 6/11, 2016 at 15:54

If you want a more general solution, taking advantage of fully parallel execution, then why not use something like this:

>>> import multiprocess as mp
>>> p = mp.Pool()
>>> 
>>> # a time consuming function taking x,y,z,...
>>> def fun(*args):
...   import time
...   time.sleep(.1)
...   return sum(*args)
... 
>>> dim1, dim2, dim3 = 10, 20, 30
>>> import itertools
>>> # product() yields every (i,j,k) in the grid; combinations_with_replacement would skip any index with i > j or j > k
>>> input = itertools.product(xrange(dim1), xrange(dim2), xrange(dim3))
>>> results = p.map(fun, input)
>>> p.close()
>>> p.join()
>>>
>>> results[:2]
[0, 1]
>>> results[-2:]
[56, 57]

Note that I'm using multiprocess instead of multiprocessing, but only so that the example works when typed into the interactive interpreter.
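With the standard multiprocessing module, roughly the same idea has to live in a script rather than an interactive session, and on Windows the pool must be created under an if __name__ == '__main__' guard. Here is a minimal sketch, assuming a trivial stand-in for fun. The names fun_star, compute_flat and the map_func parameter are hypothetical helpers for illustration, not part of any library:

```python
import itertools
from multiprocessing import Pool


def fun(x, y, z):
    # Hypothetical stand-in for the expensive computation.
    return x + y + z


def fun_star(indices):
    # Pool.map passes a single argument, so unpack the (i, j, k) tuple here.
    return fun(*indices)


def compute_flat(dims, map_func=map):
    # Evaluate fun at every index of the grid, in row-major order.
    # map_func defaults to the serial built-in map; pass pool.map to parallelize.
    indices = itertools.product(*(range(d) for d in dims))
    return list(map_func(fun_star, indices))


if __name__ == '__main__':
    pool = Pool()
    try:
        flat = compute_flat((10, 20, 30), pool.map)
    finally:
        pool.close()
        pool.join()
    print("%d results, first %r, last %r" % (len(flat), flat[:2], flat[-2:]))
```

Keeping the mapping function as a parameter makes it easy to compare the serial and parallel runs on a small grid before trying the real dimensions.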

I didn't use a numpy.array, but if you had to... you could just dump the output from p.map directly into a numpy.array and then modify the shape attribute to be shape = (dim1, dim2, dim3), or you could do something like this:

>>> input = itertools.product(xrange(dim1), xrange(dim2), xrange(dim3))
>>> import numpy as np
>>> results = np.empty(dim1*dim2*dim3)
>>> res = p.imap(fun, input)
>>> for i,r in enumerate(res):
...   results[i] = r
... 
>>> results.shape = (dim1,dim2,dim3)
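Reshaping the flat results only puts each value in the right cell if the indices were generated in row-major order, which is numpy's default layout. Pool.map and Pool.imap return results in input order, so it is enough to check the order of the index generator. Here is a small check using itertools.product, with hypothetical small dimensions and a hypothetical helper flat_index:

```python
import itertools

dim1, dim2, dim3 = 2, 3, 4  # small hypothetical sizes, just for the check


def flat_index(i, j, k):
    # Position of element (i, j, k) in a row-major (C-order) flat array.
    return (i * dim2 + j) * dim3 + k


indices = itertools.product(range(dim1), range(dim2), range(dim3))
# enumerate() gives the position at which each result comes back from the pool.
order_matches = all(n == flat_index(i, j, k)
                    for n, (i, j, k) in enumerate(indices))
print(order_matches)  # True
```

An unordered variant such as imap_unordered would break this correspondence, so the index would then have to be returned along with each result.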

Equilateral answered 21/12, 2016 at 19:23 Comment(0)

Here is a version of the code that runs fun(i, j, k) in parallel for different k indices. It runs fun in separate processes using the multiprocessing module: https://docs.python.org/2/library/multiprocessing.html

import numpy as np
from multiprocessing import Pool


def fun(x, y, z):
    # time-consuming computation...
    # ...

    return output


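def fun_wrapper(indices):
    # pool.map passes one argument, so unpack the (i, j, k) tuple
    # and return the value so it reaches the parent process
    return fun(*indices)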

if __name__ == '__main__':
    dim1 = 10
    dim2 = 20
    dim3 = 30

    result = np.zeros([dim1, dim2, dim3])

    pool = Pool(processes=8)
    for i in xrange(dim1):
        for j in xrange(dim2):
            result[i, j] = pool.map(fun_wrapper, [(i, j, k) for k in xrange(dim3)])

    # release the worker processes once all results are in
    pool.close()
    pool.join()

This is not the most elegant solution, but it is a place to start. You will only get a speedup if fun is expensive enough to outweigh the cost of passing work between processes.

Ochs answered 6/11, 2016 at 15:55 Comment(0)

A simple approach could be to divide the array into sections and create some threads to operate on those sections. For example, one section from (0,0,0) up to but not including (5,20,30), and another from (5,0,0) up to but not including (10,20,30).

You can use the threading module and do something like this:

import numpy as np
import threading as t


def fun(x, y, z):
    # time-consuming computation...
    # ...

    return output


dim1 = 10
dim2 = 20
dim3 = 30

result = np.zeros([dim1, dim2, dim3])
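# b - beginning index (inclusive), e - end index (exclusive)
def work(ib, jb, kb, ie, je, ke):
    for i in xrange(ib, ie):
        for j in xrange(jb, je):
            for k in xrange(kb, ke):
                result[i, j, k] = fun(i, j, k)

# split the first axis in half so the two sections cover the whole array
threads = list()
threads.append(t.Thread(target=work, args=(0, 0, 0, dim1 // 2, dim2, dim3)))
threads.append(t.Thread(target=work, args=(dim1 // 2, 0, 0, dim1, dim2, dim3)))

for thread in threads:
    thread.start()

# wait for both sections to finish before reading result
for thread in threads:
    thread.join()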

You can define these sections algorithmically and choose the number of threads dynamically. I hope this helps, or at least gives you some ideas.
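One possible way to generate the sections, sketched under the assumption that slicing only the first axis is acceptable. The helpers split_range and make_sections are hypothetical names; each tuple they produce follows the argument order of work(ib, jb, kb, ie, je, ke), with end indices exclusive:

```python
def split_range(n, parts):
    # Return `parts` half-open (begin, end) pairs covering range(n),
    # with section sizes differing by at most one.
    bounds = [n * p // parts for p in range(parts + 1)]
    return list(zip(bounds[:-1], bounds[1:]))


def make_sections(dim1, dim2, dim3, parts):
    # Build (ib, jb, kb, ie, je, ke) tuples that slice only the first axis,
    # so together the sections visit every (i, j, k) exactly once.
    return [(ib, 0, 0, ie, dim2, dim3) for ib, ie in split_range(dim1, parts)]


sections = make_sections(10, 20, 30, 4)
print(sections)
```

Each tuple could then be passed as args to one worker. If parts is larger than dim1, some sections come out empty, which is harmless but wasteful.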

Ladder answered 6/11, 2016 at 15:35 Comment(2)

This won't really parallelize the code. Python threads can't run in parallel. Multiprocessing should be used. – Ochs 6/11, 2016 at 15:39

@SergeyKrivohatskiy Can you provide your solution using multiprocessing? I have read a lot about it (the MPI4PY and multiprocessing modules), but I can't get it to work. – Jac 6/11, 2016 at 15:56
