How to accelerate the application of the following for loop and function?

I have the following for loop:

for j in range(len(list_list_int)):
    arr_1_, arr_2_, arr_3_ = foo(bar, list_list_int[j])
    arr_1[j,:] = arr_1_.data.numpy()
    arr_2[j,:] = arr_2_.data.numpy()
    arr_3[j,:] = arr_3_.data.numpy()

I would like to apply foo with multiprocessing, mainly because it takes a long time to finish. I tried to do it in batches with funcy's chunks function:

for j in chunks(1000, list_list_int):
    arr_1_, arr_2_, arr_3_ = foo(bar, list_list_int[j])
    arr_1[j,:] = arr_1_.data.numpy()
    arr_2[j,:] = arr_2_.data.numpy()
    arr_3[j,:] = arr_3_.data.numpy()

However, I am getting "TypeError: 'list' object cannot be interpreted as an integer". What is the correct way of applying foo using multiprocessing?

Unbound asked 15/5, 2019 at 5:02 Comment(8)
According to the docs and my own tests, the way you are calling it should work. Not sure why it doesn't, but you can try explicitly specifying a step (if you want the default behaviour, the step should have the same value as the first argument). – Historicism
Is there any other alternative for applying the function? @Mark – Unbound
In for j in chunks(1000, list_list_int):, j is not an integer, it is a sublist of list_list_int, so you need to iterate over j again. i.sstatic.net/JXDmZ.png – Currey
Thanks for the help @KingStone, could you show an example? – Unbound
I updated my comment with a code screenshot. But chunking alone cannot increase speed. How about stackoverflow.com/questions/11515944? – Currey
I am getting a type error: list indices must be integers or slices, not list @KingStone – Unbound
Could you provide an example of how to use multiprocessing for this case? @KingStone – Unbound
What's the purpose of lines like arr_1[j,:] = arr_1_.data.numpy() inside the for loop? They don't do anything (the arr_1 variable is overwritten in the next iteration). – Washin

from funcy import chunks

list_list_int = [1, 2, 3, 4, 5, 6]
for j in chunks(2, list_list_int):  # j is one chunk, e.g. [1, 2]
    for i in j:  # iterate the chunk itself to get individual items
        avg_, max_, last_ = foo(bar, i)
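
Chunking by itself does not make foo any faster; it only changes how you iterate. To actually run foo in parallel, you could hand each chunk to a worker process. A minimal sketch, assuming foo and bar are picklable module-level names the workers can see (process_chunk is a hypothetical helper, not part of funcy):

from multiprocessing import Pool

from funcy import chunks

def process_chunk(sublists):
    # Hypothetical helper: apply foo sequentially within one chunk.
    # Assumes foo and bar are defined at module level so the worker
    # processes can access them.
    return [foo(bar, ints) for ints in sublists]

if __name__ == '__main__':
    with Pool() as pool:
        batched = pool.map(process_chunk, chunks(1000, list_list_int))
    results = [r for batch in batched for r in batch]  # flatten the batches
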
Currey answered 15/5, 2019 at 5:14 Comment(6)
Still the same type error: list indices must be integers or slices, not list. Thanks! – Unbound
Could you share list_list_int? – Currey
Both are the same, I just changed the names. – Unbound
list_of_ints is a nested list of ints. – Unbound
list_of_ints looks like this: [[1,2],[3,4],[5,6]] – Unbound
However, do you think doing batches will speed up the application of foo? – Unbound

I don't have chunks installed, but from the docs I suspect that, for size-2 chunks of

alist = [[1,2],[3,4],[5,6],[7,8]]

it produces the successive values

j = [[1,2],[3,4]]
j = [[5,6],[7,8]]

which would produce an error:

In [116]: alist[j]                                                              
TypeError: list indices must be integers or slices, not list

And if your foo can't work with the full list of lists, I don't see how it will work with that list split into chunks. Apparently it can only work with one sublist at a time.
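
Since foo can apparently work on one sublist at a time, that is exactly the shape multiprocessing.Pool.map wants. A minimal sketch, assuming foo and bar can be pickled (a functools.partial object pickles fine as long as its ingredients do) and that foo returns three tensor-like objects as in the question:

from functools import partial
from multiprocessing import Pool

import numpy as np

if __name__ == '__main__':
    with Pool() as pool:
        # Calls foo(bar, sublist) once per sublist, spread over worker processes.
        results = pool.map(partial(foo, bar), list_list_int)

    # Rebuild the three output arrays, one row per sublist.
    arr_1 = np.stack([a.data.numpy() for a, _, _ in results])
    arr_2 = np.stack([b.data.numpy() for _, b, _ in results])
    arr_3 = np.stack([c.data.numpy() for _, _, c in results])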

Showroom answered 16/5, 2019 at 1:43 Comment(0)

If you are looking to perform parallel operations on a numpy array, then I would use Dask.

With just a few lines of code, your operation can easily be run on multiple processes, and the mature Dask scheduler will balance the load for you. A big benefit of Dask over other parallel libraries such as joblib is that it keeps the native NumPy API.

import dask.array as da

# Set up a random array with 10,000 rows and 10 columns.
# The data is distributed across 10 chunks of 1,000 rows each,
# with all 10 columns kept together (chunks of shape (1_000, 10)).
x = da.random.random((10_000, 10), chunks=(1_000, 10))
x = x.persist()  # Allow the entire array to persist in memory to speed up calculation


def foo(x):
    return x / 10


# Dask's version of numpy's apply_along_axis: apply foo to each
# row (axis=1) of the matrix in parallel
result_foo = da.apply_along_axis(foo, 1, x)

# View original contents
x[0:10].compute()

# View sample of results
result_foo = result_foo.compute()
result_foo[0:10]
Herewith answered 21/5, 2019 at 19:57 Comment(0)
