>90% of the time is spent on method 'acquire' of 'thread.lock' objects

To identify the step that is using most of the computation time, I ran cProfile and got the following result:

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.014    0.014  216.025  216.025 func_poolasync.py:2(<module>)
    11241  196.589    0.017  196.589    0.017 {method 'acquire' of 'thread.lock' objects}
      982    0.010    0.000  196.532    0.200 threading.py:309(wait)
     1000    0.002    0.000  196.498    0.196 pool.py:565(get)
     1000    0.005    0.000  196.496    0.196 pool.py:557(wait)
515856/3987    0.350    0.000   13.434    0.003 artist.py:230(stale)

The step on which most of the time is spent clearly seems to be method 'acquire' of 'thread.lock' objects. I have not used threading; rather, I've used pool.apply_async with several worker processes, so I'm confused about why thread.lock is the issue.

I'm hoping someone can shed some light on why this is the bottleneck, and on how to bring this time down.

Here's how the code looks:

import pickle
import time
from multiprocessing import Pool

# func is defined elsewhere and takes (nomb, pdata[0], pdata[1], pdata[2])

path = '/usr/home/work'
filename = 'filename'

with open(path + filename + '/' + 'result.pickle', 'rb') as f:
    pdata = pickle.load(f)

if __name__ == '__main__':
    pool = Pool()
    data = list(range(1000))
    print('START')
    start_time = int(round(time.time()))

    result_objects = [pool.apply_async(func, args=(nomb, pdata[0], pdata[1], pdata[2]))
                      for nomb in data]
    results = [r.get() for r in result_objects]

    pool.close()
    pool.join()
    print('END', int(round(time.time())) - start_time)

Revision:

By switching from pool.apply_async to pool.map I was able to reduce the execution time by a factor of about 3.

Output:

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.113    0.113   70.824   70.824 func.py:2(<module>)
     4329   28.048    0.006   28.048    0.006 {method 'acquire' of 'thread.lock' objects}
        4    0.000    0.000   28.045    7.011 threading.py:309(wait)
        1    0.000    0.000   28.044   28.044 pool.py:248(map)
        1    0.000    0.000   28.044   28.044 pool.py:565(get)
        1    0.000    0.000   28.044   28.044 pool.py:557(wait)

Modification to code:

from functools import partial

if __name__ == '__main__':
    pool = Pool()
    data = list(range(1000))

    print('START')
    start_time = int(round(time.time()))
    funct = partial(func, pdata[0], pdata[1], pdata[2])
    results = pool.map(funct, data)
    print('END', int(round(time.time())) - start_time)

However, some of the iterations now produce nonsensical results. I'm not sure why this is happening, but method 'acquire' of 'thread.lock' objects is still the rate-determining step.
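One difference between the two snippets is worth flagging (as an observation only, not necessarily the cause of the bad results): apply_async passes nomb as the first argument to func, while partial(func, pdata[0], pdata[1], pdata[2]) makes pool.map supply each item of data as the last argument. If func expects nomb first, a small wrapper (call_func here is a hypothetical helper, dropped into the snippet above) restores the original call signature:

from functools import partial

def call_func(a, b, c, nomb):
    # reorder the arguments so func still receives nomb first,
    # exactly as in the apply_async version (func is defined elsewhere)
    return func(nomb, a, b, c)

funct = partial(call_func, pdata[0], pdata[1], pdata[2])
results = pool.map(funct, data)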

Theadora answered 22/8, 2019 at 21:56 Comment(2)
Were you able to understand why this is the bottleneck? I have a very similar problem. – Crownwork
It might be different for you. For me, 'func' plotted figures, which required the pool to create matplotlib artists each time 'func' was called. I created the matplotlib artists outside the function and only populated them when the function was called. If you have a similar situation, I refer you to this SO question: #27569806. – Theadora
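A minimal sketch of that pattern (hypothetical names, assuming each call draws a single curve and saves it to a file): the figure and artist are created once, and the function only updates the artist's data instead of creating new artists on every call.

import matplotlib
matplotlib.use('Agg')              # non-interactive backend, assumed for batch plotting
import matplotlib.pyplot as plt

# Created once, outside the worker function.
fig, ax = plt.subplots()
line, = ax.plot([], [])            # a single reusable Line2D artist

def plot_result(x, y, outfile):
    # Only populate the existing artist; no new artists are created per call.
    line.set_data(x, y)
    ax.relim()
    ax.autoscale_view()
    fig.savefig(outfile)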

While pooling is different from threading (processes vs. threads), it is still a form of parallelization that is susceptible to concurrency problems. To avoid surfacing these problems to the user, Python automatically handles locks for you using the thread module, whether or not you are actually using threading yourself, because the mechanisms are fundamentally the same.
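For illustration, a self-contained sketch (not the asker's code) of the same effect: a Pool program with no explicit threading still attributes almost all of its cumulative time to acquiring lock objects under cProfile, because the parent process blocks on a threading primitive inside get() while the workers run.

import cProfile
import time
from multiprocessing import Pool

def slow(x):
    time.sleep(0.5)          # stands in for real work done in a worker process
    return x * x

def main():
    with Pool(2) as pool:
        result_objects = [pool.apply_async(slow, (i,)) for i in range(4)]
        return [r.get() for r in result_objects]   # the parent waits here

if __name__ == '__main__':
    cProfile.run('main()', sort='cumtime')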

Your code spends a lot of time waiting on its locks because each process locks the values used by the following processes. If you unwrap all the syntactic sugar and the machinery handled by the multiprocessing module, you get pseudocode like this:

data = list(range(1000))
for num in data:
    start_process(data, num)

# this runs concurrently with many other instances of itself
def start_process(data, num):
    lock(data[num])
    # here, the first process (num=0) locks data[0], data[1] and data[2],
    # but the second process (num=1) is forced to wait until the first one
    # is finished in order to lock(data[num]), because data[1] is already locked!
    lock(data[num+1])
    lock(data[num+2])
    return dothings()

As you can see, each process needs the previous processes to release their locks on data[num+1] and data[num+2] before it can acquire its own. I suspect your faster implementation circumvented this by forgoing locks, which is why you get nonsensical results.

You may fix this (I only know that this works in Python 3) by using a step value on your iterator:

for num in data[::3]:
    # run only on data[0], data[3], etc.
    # you may lock data[num], data[num+1], data[num+2] without preventing
    # the next iteration of the loop from running
for num in data[1::3]:
    # run only on data[1], data[4], etc.
for num in data[2::3]:
    # you get the point
# of course you can forgo the last 2 loops if you only need to process the data 3 values at a time
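A concrete look at what those stepped slices produce (plain Python, no pool involved): within one pass, consecutive values are three apart, so the data[num], data[num+1], data[num+2] windows of different iterations never overlap.

data = list(range(10))
print(data[::3])     # [0, 3, 6, 9]
print(data[1::3])    # [1, 4, 7]
print(data[2::3])    # [2, 5, 8]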
Germane answered 29/4, 2024 at 15:32 Comment(0)
