I want to use a limited number of threads (at most 2) to run a function of a class that removes some files on disk in the background. The rest of my code within the same class is independent of this background function and might get executed tens of times more often than it. However, I still need to enforce the core/thread limit, so it is possible that more background jobs are requested than the 2 allowed, and in that case I need to queue them. Note that my background function does not take any arguments.
I am pretty new to multi-threading and multi-processing, but I think I have done my homework: I have looked at many posts here on Stack Overflow and tried a couple of approaches. However, none of those approaches seems to work for me. Here's the structure of my code:
class myClass(object):
    def __init__(self):
        # some stuff
        pass

    def backgroundFunc(self):
        # delete some files on disk
        pass

    def mainFunc(self, elem):
        # Do some other things
        self.backgroundFunc()  # I want to run this in the background
Here's how I run the code:

from myClass import myClass

myClassInstance = myClass()
for element in someList:
    myClassInstance.mainFunc(elem=element)
Note that I cannot start the background job before the stuff in mainFunc
has started running.
And here is my first try with threading
in my class file:
from threading import Thread

class myClass(object):
    def __init__(self):
        # some stuff
        pass

    def backgroundFunc(self):
        # delete some files on disk
        pass

    def mainFunc(self, elem):
        # Do some other things
        thr = Thread(target=self.backgroundFunc)
        thr.start()
However, the problem with this approach is that the program crashes at random times: sometimes right at the beginning of program execution and sometimes later, and the error messages are different every time. I guess it's possibly because the threads do not lock the memory they work on, so different threads might be reading and writing the same memory at the same time. Or, less likely, it is because I am running my code on a server and the server enforces some limits on the allocated resources. In addition, I cannot set a limit on the number of threads and cannot do any queuing in case mainFunc gets executed more than twice while I already have two background jobs running.
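For reference, the limit-and-queue behaviour I am after would, I think, look something like this with plain threads. This is a rough sketch I have not tested; the worker loop and the jobs queue are my own guess at the usual pattern:

import queue
import threading

class myClass(object):
    def __init__(self):
        # A queue of pending cleanup jobs and exactly two worker threads
        # draining it, so at most two deletions ever run at the same time;
        # anything submitted beyond that simply waits in the queue.
        self.jobs = queue.Queue()
        for _ in range(2):
            threading.Thread(target=self._worker, daemon=True).start()

    def _worker(self):
        while True:
            job = self.jobs.get()  # blocks until a job is queued
            try:
                job()
            finally:
                self.jobs.task_done()

    def backgroundFunc(self):
        print('deleting some files on disk')

    def mainFunc(self, elem):
        # Do the other work for elem here, then hand the cleanup off and
        # return immediately.
        self.jobs.put(self.backgroundFunc)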
Here's another try with multiprocessing.Process:
from multiprocessing import Process

class myClass(object):
    def __init__(self):
        # some stuff
        pass

    def backgroundFunc(self):
        # delete some files on disk
        pass

    def mainFunc(self, elem):
        # Do some other things
        p = Process(target=self.backgroundFunc)
        p.start()
The problem with this approach is that Process will use as many threads/cores as my machine has at its disposal, and since the rest of my code is then automatically run in parallel, everything becomes super slow very quickly.
I eventually arrived at multiprocessing.Pool, but I am still pretty confused about how to use it effectively. Anyway, here's my try with Pool:
from multiprocessing import Pool

class myClass(object):
    def __init__(self):
        # some stuff
        self.pool = Pool(processes=2)

    def backgroundFunc(self):
        # delete some files on disk
        print('some stuff')

    def mainFunc(self, elem):
        # Do some other things
        self.pool.apply_async(self.backgroundFunc)
However, apply_async does not seem to work: none of the print statements in backgroundFunc ever show up on the screen. I added self.pool.close() after the apply_async call, but then I get errors soon after the second process starts. I tried things like self.pool.apply and some others, but it seems they require a function that takes arguments, and my backgroundFunc does not take any. Finally, I do not know how I can do the queuing that I explained earlier using Pool.
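The closest I got to seeing what was happening was to attach an error_callback so that any exception raised for the background job is at least printed, and to close and join the pool before the program exits. This is only a sketch; I am assuming error_callback is the right way to surface failures from apply_async:

from multiprocessing import Pool

class myClass(object):
    def __init__(self):
        self.pool = Pool(processes=2)

    def backgroundFunc(self):
        print('some stuff')

    def mainFunc(self, elem):
        # Same as before, but report any failure instead of losing it silently.
        self.pool.apply_async(
            self.backgroundFunc,
            error_callback=lambda exc: print('background job failed:', exc))

if __name__ == '__main__':
    myClassInstance = myClass()
    myClassInstance.mainFunc(elem=1)
    # Without waiting here the program may exit before anything is printed.
    myClassInstance.pool.close()
    myClassInstance.pool.join()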
Also, I want to have control over how many times and when backgroundFunc is run. And mainFunc should not wait for all background jobs to finish before it returns; if it did, I would not benefit from multithreading, because the background function might take too long to finish. Maybe I should have been clearer about this in the question; sorry about that.
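To make the desired behaviour concrete, this is roughly what I am after, written with concurrent.futures.ThreadPoolExecutor purely as an illustration. It is an untested sketch, and I do not know whether an executor is actually the right tool for my case:

from concurrent.futures import ThreadPoolExecutor

class myClass(object):
    def __init__(self):
        # At most two background deletions run at a time; extra submissions
        # wait in the executor's internal queue.
        self.executor = ThreadPoolExecutor(max_workers=2)

    def backgroundFunc(self):
        print('deleting some files on disk')

    def mainFunc(self, elem):
        # Do the other work for elem here, then hand the cleanup off and
        # return without waiting for it to finish.
        self.executor.submit(self.backgroundFunc)

someList = range(10)  # stand-in for the real list
myClassInstance = myClass()
for element in someList:
    myClassInstance.mainFunc(elem=element)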
So I would really appreciate it if someone could help me with this. I am pretty confused. Thanks in advance!
Comments:

[…] multiprocessing.Process, but for some reason things become drastically slow very quickly for the other operations in the main function. I don't know any better solution. – Kali

[…] watchdog to react to filesystem events. Then, instead of calling backgroundFunc multiple times, you could have one process reacting to filesystem events continually until your program terminates. The filesystem-monitoring process won't consume much CPU when there is no filesystem activity. – Swallow

[…] backgroundFunc: I meant I do not want the behavior of Process, where you do not have any control over the things (e.g. self variables) inside backgroundFunc while it is running. Also, you cannot make sure only two background jobs are running. I'll look into the things you mentioned here; will get back to you sometime soon. – Kali

[…] (time.sleep(1) may suffice), but we do need to see the structure of your parallel operations and overall program. A small runnable example may enable us to help you better. – Swallow