I have a huge list to process, which takes a while, so I split it into 4 pieces and run the same function on each piece in its own process. It still takes some time on 4 cores, so I wanted to add a progress bar to the function that shows how far each process has gotten through its piece of the list.
My dream was to have something like this:
erasing close atoms, cpu0 [######..............................] 13%
erasing close atoms, cpu1 [#######.............................] 15%
erasing close atoms, cpu2 [######..............................] 13%
erasing close atoms, cpu3 [######..............................] 14%
with each bar moving as the loop in the function progresses. But instead, I get a continuous flow:
etc., filling my terminal window.
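To make the goal concrete, here is a minimal sketch of the display I am after. It is not a fix for my script. It assumes a terminal that understands ANSI escape codes, and `render_bar` and `redraw` are hypothetical helper names of my own, not part of click or multiprocessing. The idea is that a single process owns the terminal and rewrites the same block of lines in place.

```python
import sys


def render_bar(label, done, total, width=36):
    """Return one progress line such as 'label [###...]  50%'."""
    total = max(total, 1)      # avoid dividing by zero on an empty chunk
    done = min(done, total)    # never draw past 100%
    filled = done * width // total
    percent = done * 100 // total
    return '%s [%s%s] %3d%%' % (label, '#' * filled, '.' * (width - filled), percent)


def redraw(lines, first=False, stream=sys.stdout):
    """Overwrite the previously drawn block instead of appending below it."""
    if not first:
        # ANSI escape: move the cursor up by len(lines) rows
        stream.write('\x1b[%dA' % len(lines))
    for line in lines:
        # '\x1b[2K' clears the row, '\r' returns to column 0
        stream.write('\x1b[2K\r' + line + '\n')
    stream.flush()


if __name__ == '__main__':
    # Simulated progress for 4 workers, all advancing in lockstep.
    for step in range(0, 101, 10):
        lines = [render_bar('erasing close atoms, cpu%d' % c, step, 100)
                 for c in range(4)]
        redraw(lines, first=(step == 0))
```

In the real script the workers would have to report their progress to the parent, for example as (cpu index, items done) pairs over a second `mp.Queue`, and only the parent would call `redraw`. I have not wired that part up here.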
Here is the main Python script that calls the function:
import multiprocessing as mp
import numpy as np
from eraseCloseAtoms import *
from readPDB import *
from vectorCalc import *
prot, cell = readPDB('file')
atoms = vectorCalc(cell)
output = mp.Queue()
# setup mp to erase grid atoms that are too close to the protein (dmin = 2.5A)
cpuNum = 4
tasks = len(atoms)
# chunk sizes: integer division, with the remainder spread over the first chunks
rangeSet = [tasks // cpuNum for i in range(cpuNum)]
for i in range(tasks % cpuNum):
    rangeSet[i] += 1
rangeSet = np.array(rangeSet)
processes = []
for c in range(cpuNum):
    # slice bounds are half-open, so chunk c starts exactly where chunk c - 1 ended
    na, nb = (int(np.sum(rangeSet[:c])), int(np.sum(rangeSet[:c + 1])))
    processes.append(mp.Process(target=eraseCloseAtoms, args=(prot, atoms[na:nb], cell, 2.7, 2.5, output)))
for p in processes:
    p.start()
results = [output.get() for p in processes]
for p in processes:
    p.join()
atomsNew = results[0] + results[1] + results[2] + results[3]
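The split is meant to give each process one contiguous slice of `atoms`, with the first `tasks % cpuNum` slices one item longer than the rest. A minimal sketch of that arithmetic on its own, where `chunk_bounds` is a hypothetical helper name that does not appear in my script:

```python
def chunk_bounds(n_items, n_chunks):
    """Return (start, stop) pairs splitting range(n_items) into contiguous slices."""
    base, extra = divmod(n_items, n_chunks)
    bounds = []
    start = 0
    for c in range(n_chunks):
        # the first `extra` chunks take one leftover item each
        size = base + (1 if c < extra else 0)
        bounds.append((start, start + size))
        start += size
    return bounds


print(chunk_bounds(10, 4))  # [(0, 3), (3, 6), (6, 8), (8, 10)]
```

Each pair can be used directly as `atoms[start:stop]`, since Python slices exclude the stop index.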
Below is the function eraseCloseAtoms():
import numpy as np
import click

# getMinDist and vectors are defined elsewhere and are not shown here

def eraseCloseAtoms(protein, atoms, cell, spacing=2, dmin=1.4, output=None):
    print('just need to erase close atoms')
    if dmin > spacing:
        print('the spacing needs to be larger than dmin')
        return
    grid = [int(cell[0] / spacing), int(cell[1] / spacing), int(cell[2] / spacing)]
    selected = list(atoms)
    with click.progressbar(length=len(atoms), label='erasing close atoms') as bar:
        for i, atom in enumerate(atoms):
            # update() takes the number of steps to advance, not an absolute position
            bar.update(1)
            erased = False
            coord = np.array(atom[6])
            for ix in [-1, 0, 1]:
                if erased:
                    break
                for iy in [-1, 0, 1]:
                    if erased:
                        break
                    for iz in [-1, 0, 1]:
                        if erased:
                            break
                        for j in protein:
                            protCoord = np.array(protein[int(j)][6])
                            trueDist = getMinDist(protCoord, coord, cell, vectors)
                            if trueDist <= dmin:
                                selected.remove(atom)
                                erased = True
                                break
    if output is None:
        return selected
    else:
        output.put(selected)