Yesterday i asked a question: Reading data in parallel with multiprocess
I got very good answers, and i implemented the solution mentioned in the answer i marked as correct.
def read_energies(motif):
os.chdir("blabla/working_directory")
complx_ener = pd.DataFrame()
# complex function to fill that dataframe
lig_ener = pd.DataFrame()
# complex function to fill that dataframe
return motif, complx_ener, lig_ener
COMPLEX_ENERGIS = {}
LIGAND_ENERGIES = {}
p = multiprocessing.Pool(processes=CPU)
for x in p.imap_unordered(read_energies, peptide_kd.keys()):
COMPLEX_ENERGIS[x[0]] = x[1]
LIGAND_ENERGIES[x[0]] = x[2]
However, this solution takes the same amount of time as if i would just iterate over peptide_kd.keys()
and fill up the DataFrames
one by one. Why is that so? Is there a way to fill up the desired dicts in parallel and actually get a speed increase? i am running it on a 48 core HPC.
read_energies()
process a variable number dataframes each time would allow you to tune things to the point were there it became advantageous. – Cameliacamella