>90% of the time is spent on method 'acquire' of 'thread.lock' objects

To identify the step that is using most of the computation time, I ran cProfile and got the following result:

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.014    0.014  216.025  216.025 func_poolasync.py:2(<module>)
    11241  196.589    0.017  196.589    0.017 {method 'acquire' of 'thread.lock' objects}
      982    0.010    0.000  196.532    0.200 threading.py:309(wait)
     1000    0.002    0.000  196.498    0.196 pool.py:565(get)
     1000    0.005    0.000  196.496    0.196 pool.py:557(wait)
515856/3987    0.350    0.000   13.434    0.003 artist.py:230(stale)

The step on which most of the time is spent clearly seems to be method 'acquire' of 'thread.lock' objects. I have not used threading; rather, I've used pool.apply_async with several worker processes, so I'm confused about why thread.lock is the issue.

I'm hoping someone can shed some light on why this is the bottleneck, and on how to bring this time down.

Here's how the code looks:

import pickle
import time
from multiprocessing import Pool

# func is defined elsewhere and takes (nomb, pdata[0], pdata[1], pdata[2])

path = '/usr/home/work'
filename = 'filename'

with open(path + filename + '/' + 'result.pickle', 'rb') as f:
    pdata = pickle.load(f)

if __name__ == '__main__':
    pool = Pool()
    data = list(range(1000))
    print('START')
    start_time = int(round(time.time()))

    result_objects = [pool.apply_async(func, args=(nomb, pdata[0], pdata[1], pdata[2]))
                      for nomb in data]
    results = [r.get() for r in result_objects]

    pool.close()
    pool.join()
    print('END', int(round(time.time())) - start_time)

Revision:

By switching from pool.apply_async to pool.map I was able to reduce the execution time by a factor of about 3.

Output:

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.113    0.113   70.824   70.824 func.py:2(<module>)
     4329   28.048    0.006   28.048    0.006 {method 'acquire' of 'thread.lock' objects}
        4    0.000    0.000   28.045    7.011 threading.py:309(wait)
        1    0.000    0.000   28.044   28.044 pool.py:248(map)
        1    0.000    0.000   28.044   28.044 pool.py:565(get)
        1    0.000    0.000   28.044   28.044 pool.py:557(wait)

Modification to code:

from functools import partial

if __name__ == '__main__':
    pool = Pool()
    data = list(range(1000))

    print('START')
    start_time = int(round(time.time()))
    funct = partial(func, pdata[0], pdata[1], pdata[2])
    results = pool.map(funct, data)
    print('END', int(round(time.time())) - start_time)

However, some of the iterations now produce nonsensical results. I'm not sure why this is happening, but method 'acquire' of 'thread.lock' objects is still the rate-determining step.
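One difference between the two snippets is worth flagging (as an observation only, not necessarily the cause of the bad results): apply_async passes nomb as the first argument to func, while partial(func, pdata[0], pdata[1], pdata[2]) makes pool.map supply each item of data as the last argument. If func expects nomb first, a small wrapper (call_func here is a hypothetical helper, dropped into the snippet above) restores the original call signature:

from functools import partial

def call_func(a, b, c, nomb):
    # reorder the arguments so func still receives nomb first,
    # exactly as in the apply_async version (func is defined elsewhere)
    return func(nomb, a, b, c)

funct = partial(call_func, pdata[0], pdata[1], pdata[2])
results = pool.map(funct, data)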

Theadora answered 22/8, 2019 at 21:56 Comment(2)
Were you able to understand why this is the bottleneck? I have a very similar problem. – Crownwork
It might be different for you. For me, 'func' plotted figures, which required the pool to create matplotlib artists each time 'func' was called. I created the matplotlib artists outside the function and only populated them when the function was called. If you have a similar situation, I refer you to this SO question: #27569806. – Theadora
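A minimal sketch of that pattern (hypothetical names, assuming each call draws a single curve and saves it to a file): the figure and artist are created once, and the function only updates the artist's data instead of creating new artists on every call.

import matplotlib
matplotlib.use('Agg')              # non-interactive backend, assumed for batch plotting
import matplotlib.pyplot as plt

# Created once, outside the worker function.
fig, ax = plt.subplots()
line, = ax.plot([], [])            # a single reusable Line2D artist

def plot_result(x, y, outfile):
    # Only populate the existing artist; no new artists are created per call.
    line.set_data(x, y)
    ax.relim()
    ax.autoscale_view()
    fig.savefig(outfile)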

While pooling is different from threading (processes vs. threads), it is still a form of parallelization that is susceptible to concurrency problems. To avoid surfacing these problems to the user, Python automatically handles locks for you using the thread module, whether or not you are actually using threading yourself, because the mechanisms are fundamentally the same.
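For illustration, a self-contained sketch (not the asker's code) of the same effect: a Pool program with no explicit threading still attributes almost all of its cumulative time to acquiring lock objects under cProfile, because the parent process blocks on a threading primitive inside get() while the workers run.

import cProfile
import time
from multiprocessing import Pool

def slow(x):
    time.sleep(0.5)          # stands in for real work done in a worker process
    return x * x

def main():
    with Pool(2) as pool:
        result_objects = [pool.apply_async(slow, (i,)) for i in range(4)]
        return [r.get() for r in result_objects]   # the parent waits here

if __name__ == '__main__':
    cProfile.run('main()', sort='cumtime')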

Your code spends a lot of time waiting on its locks because each process locks the values used by the following processes. If you unwrap all the syntactic sugar and the machinery handled by the multiprocessing module, you get pseudocode like this:

data = list(range(1000))
for num in data:
    start_process(data, num)

# this runs concurrently with many other instances of itself
def start_process(data, num):
    lock(data[num])
    # here, the first process (num=0) locks data[0], data[1] and data[2],
    # but the second process (num=1) is forced to wait until the first one
    # is finished in order to lock(data[num]), because data[1] is already locked!
    lock(data[num+1])
    lock(data[num+2])
    return dothings()

As you can see, each process needs the previous processes to release their locks on data[num+1] and data[num+2] before it can acquire its own. I suspect your faster implementation circumvented this by forgoing locks, which is why you get nonsensical results.

You may fix this (I only know that this works in Python 3) by using a step value on your iterator:

for num in data[::3]:
    # run only on data[0], data[3], etc.
    # you may lock data[num], data[num+1], data[num+2] without preventing
    # the next iteration of the loop from running
for num in data[1::3]:
    # run only on data[1], data[4], etc.
for num in data[2::3]:
    # you get the point
# of course you can forgo the last 2 loops if you only need to process the data 3 values at a time
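A concrete look at what those stepped slices produce (plain Python, no pool involved): within one pass, consecutive values are three apart, so the data[num], data[num+1], data[num+2] windows of different iterations never overlap.

data = list(range(10))
print(data[::3])     # [0, 3, 6, 9]
print(data[1::3])    # [1, 4, 7]
print(data[2::3])    # [2, 5, 8]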
Germane answered 29/4, 2024 at 15:32 Comment(0)
