To identify the step that is using most of the computation time, I ran cProfile and got the following result:
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.014 0.014 216.025 216.025 func_poolasync.py:2(<module>)
11241 196.589 0.017 196.589 0.017 {method 'acquire' of 'thread.lock' objects}
982 0.010 0.000 196.532 0.200 threading.py:309(wait)
1000 0.002 0.000 196.498 0.196 pool.py:565(get)
1000 0.005 0.000 196.496 0.196 pool.py:557(wait)
515856/3987 0.350 0.000 13.434 0.003 artist.py:230(stale)
The step on which most of the time is spent clearly seems to be method 'acquire' of 'thread.lock' objects
. I have not used threading; rather I've used pool.apply_async
with several processors, so I'm confused why thread.lock
is the issue?
I am hoping to have some light shed on why this is the bottleneck? And how to bring down this time?
Here's how the code looks:
path='/usr/home/work'
filename='filename'
with open(path+filename+'/'+'result.pickle', 'rb') as f:
pdata = pickle.load(f)
if __name__ == '__main__':
pool = Pool()
results=[]
data=list(range(1000))
print('START')
start_time = int(round(time.time()))
result_objects = [pool.apply_async(func, args=(nomb,pdata[0],pdata[1],pdata[2])) for nomb in data]
results = [r.get() for r in result_objects]
pool.close()
pool.join()
print('END', int(round(time.time()))-start_time)
Revision:
By switching from pool.apply_async
to pool.map
I am able to reduce the execution time by ~3 times.
Output:
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.113 0.113 70.824 70.824 func.py:2(<module>)
4329 28.048 0.006 28.048 0.006 {method 'acquire' of 'thread.lock' objects}
4 0.000 0.000 28.045 7.011 threading.py:309(wait)
1 0.000 0.000 28.044 28.044 pool.py:248(map)
1 0.000 0.000 28.044 28.044 pool.py:565(get)
1 0.000 0.000 28.044 28.044 pool.py:557(wait)
Modification to code:
if __name__ == '__main__':
pool = Pool()
data=list(range(1000))
print('START')
start_time = int(round(time.time()))
funct = partial(func,pdata[0],pdata[1],pdata[2])
results = pool.map(funct,data)
print('END', int(round(time.time()))-start_time)
But, it has been found that some of the iterations result in nonsensical results. I'm not sure why this is happening, however, it is seen that 'method 'acquire' of 'thread.lock' objects' is still the rate determining step.