I am attempting to write a Python program that runs multiple instances (15) of a function simultaneously across different processor cores. I have been researching this and have set up the program below using the Process class from multiprocessing.
Unfortunately, the program executes each instance of the function sequentially (it seems to wait for one to finish before moving on to the next part of the loop).
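For reference, this is the basic pattern I was trying to follow, as a minimal sketch with a made-up worker function (not my real code), where I would expect all fifteen processes to run at the same time:

from multiprocessing import Process
import os

def worker(label):
    # Dummy stand-in for my real function; just reports which process ran it
    print("Worker %s running in process %d" % (label, os.getpid()))

if __name__ == '__main__':
    procs = [Process(target=worker, args=(i,)) for i in range(1, 16)]
    for p in procs:
        p.start()   # start all 15 processes first...
    for p in procs:
        p.join()    # ...then wait for them, so they can overlap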
from __future__ import print_function
from multiprocessing import Process
import sys
import os
import re
for i in range(1,16):
    exec("path%d = 0" % (i))
    exec("file%d = open('%d-path','a', 1)" % (i, i))

def stat(first, last):
    for j in range(1,40000):
        input_string = "water" + str(j) + ".xyz.geocard"
        if os.path.exists('./%s' % input_string) == True:
            exec("out%d = open('output%d', 'a', 1)" % (first, first))
            exec('print("Processing file %s...", file=out%d)' % (input_string, first))
            with open('./%s' % input_string,'r') as file:
                for line in file:
                    for i in range(first,last):
                        search_string = " " + str(i) + " path:"
                        for result in re.finditer(r'%s' % search_string, line):
                            exec("path%d += 1" % i)
    for i in range(first,last):
        exec("print(path%d, file=file%d)" % (i, i))

processes = []
for m in range(1,16):
    n = m + 1
    p = Process(target=stat, args=(m, n))
    p.start()
    processes.append(p)

for p in processes:
    p.join()
I am reasonably new to programming and have no experience with parallelization, so any help would be greatly appreciated.
I have included the entire program above, replacing "Some Function" with the actual function, to demonstrate that this is not a timing issue. The program can take days to cycle through all 40,000 files (each of which is quite large).
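To make the question concrete, this is the kind of stripped-down test I would expect to finish in roughly one second if the fifteen processes really overlapped, and roughly fifteen seconds if they ran one after another (the worker and the sleep time are just placeholders, not my real code):

from multiprocessing import Process
import time

def fake_stat(first, last):
    # Placeholder for the real stat(); just sleeps to simulate work
    time.sleep(1)

if __name__ == '__main__':
    start = time.time()
    procs = []
    for m in range(1, 16):
        p = Process(target=fake_stat, args=(m, m + 1))
        p.start()
        procs.append(p)
    for p in procs:
        p.join()
    # ~1 s if the processes overlap, ~15 s if they run sequentially
    print("Elapsed: %.1f s" % (time.time() - start))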