multiprocess or threading in python?

8

32

I have a Python application that grabs a collection of data and, for each piece of data in that collection, performs a task. The task takes some time to complete because there is a delay involved. Because of this delay, I don't want each piece of data to perform the task sequentially; I want them all to happen in parallel. Should I be using multiprocessing or threading for this operation?

I attempted to use threading but had some trouble: often some of the tasks would never actually fire.

Angloindian answered 4/8, 2009 at 9:47 Comment(4)
How big is your "collection of data"? If it's huge, you may not want to start threads or processes for each one.Bumpy
usually 1, 2, or 3 pieces of data.Angloindian
@Bumpy - how would you limit the number of threads/processes to a number much smaller than the size of the data?Bundestag
@Adam Greenhall: That's an unrelated question; that's what multiprocess pools are for. I'm still trying to understand this question. If there are 10,000 pieces of data, 10,000 concurrent processes (or threads) seems a really poor idea. If there are just 3, then it hardly seems worth asking, since the simplest solution is the most effective.Bumpy
30

If you are truly compute bound, using the multiprocessing module is probably the lightest weight solution (in terms of both memory consumption and implementation difficulty.)

If you are I/O bound, using the threading module will usually give you good results. Make sure that you use thread safe storage (like the Queue) to hand data to your threads. Or else hand them a single piece of data that is unique to them when they are spawned.
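
A minimal sketch of that Queue-based hand-off (do_io_bound_task and collection_of_data are placeholders for your own task and data; in Python 2 the module is Queue rather than queue):

import threading
import queue

def worker(q):
    while True:
        item = q.get()
        if item is None:            # sentinel: no more work for this thread
            break
        do_io_bound_task(item)      # placeholder for your I/O-bound task
        q.task_done()

q = queue.Queue()
threads = [threading.Thread(target=worker, args=(q,)) for _ in range(3)]
for t in threads:
    t.start()

for item in collection_of_data:     # your data
    q.put(item)

for _ in threads:
    q.put(None)                     # one sentinel per thread so each one exits
for t in threads:
    t.join()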

PyPy is focused on performance. It has a number of features that can help with compute-bound processing. They also have support for Software Transactional Memory, although that is not yet production quality. The promise is that you can use simpler parallel or concurrent mechanisms than multiprocessing (which has some awkward requirements.)

Stackless Python is also a nice idea. Stackless has portability issues as indicated above. Unladen Swallow was promising, but is now defunct. Pyston is another (unfinished) Python implementation focusing on speed. It is taking an approach different to PyPy, which may yield better (or just different) speedups.

Van answered 4/8, 2009 at 12:26 Comment(0)
9

Threads run essentially sequentially (only one runs Python code at a time), but you get the illusion that they run in parallel. Threads are a good fit for file or connection I/O because they are lightweight.

Multiprocessing with a Pool may be the right solution for you, because processes run truly in parallel and so are very good for intensive computing; each process runs on its own CPU (or core).

Setting up multiprocessing can be very easy:

from multiprocessing import Pool

def worker(input_item):
    output = do_some_work(input_item)  # do_some_work stands in for your actual task
    return output

pool = Pool()  # creates one process per CPU (or core) of your PC; use Pool(4) to force 4 processes, for example
list_of_results = pool.map(worker, input_list)  # distributes input_list across the pool and collects the results
Inflated answered 26/1, 2010 at 3:13 Comment(1)
Does it mean that all cores work on the same data? Is it possible to split input_list and pass each chunk to a different core?Rigid
7

For small collections of data, simply create subprocesses with subprocess.Popen.

Each subprocess can simply get its piece of data from stdin or from command-line arguments, do its processing, and write the result to an output file.

When the subprocesses have all finished (or timed out), you simply merge the output files.

Very simple.
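
A minimal sketch of this approach, assuming a hypothetical process_item.py script that takes one item and an output file name on its command line:

import subprocess

items = ["a", "b", "c"]
procs = [
    subprocess.Popen(["python", "process_item.py", item, "out_%d.txt" % i])
    for i, item in enumerate(items)
]

for p in procs:
    p.wait()                        # wait for every child to finish

with open("merged.txt", "w") as merged:
    for i in range(len(items)):
        with open("out_%d.txt" % i) as f:
            merged.write(f.read())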

Bumpy answered 4/8, 2009 at 10:31 Comment(4)
This is a really heavy solution. Not only do you have to arrange to feed the data to an external process, you have massive overhead.Van
@Christopher. The point is simplicity. The Unix world has been using this technique for 40 years. It works well because it's simple. Also, the overhead isn't really "massive" since you're running multiple instances of the same binary image. This is well optimized by GNU/Linux.Bumpy
@S.Lott: Just because it has been used for a long time, doesn't mean it is a good solution. It is especially not a good solution for compute-bound problems. The overhead is "massive" because you have the memory overhead of all the per-process structures, as well as the latency of multiple kernel transitions. The python multiprocessing module does not really create a new "process" like subprocess does. It creates a new interpreter context, which is far lighter than creating a new OS-level process.Van
@Christopher: all true. Using subprocess is simpler. Not "better" in some undefined way. It is unlikely to be faster. Sometimes it actually is faster, because the overheads have more impact during process startup. The point is simply that multiple subprocesses are often simpler.Bumpy
7

You might consider looking into Stackless Python. If you have control over the function that takes a long time, you can just throw some stackless.schedule() calls in there (each one yields to the next coroutine), or else you can set Stackless to preemptive multitasking.

In Stackless, you don't have threads, but tasklets or greenlets which are essentially very lightweight threads. It works great in the sense that there's a pretty good framework with very little setup to get multitasking going.

However, Stackless hinders portability because you have to replace a few of the standard Python libraries -- Stackless removes reliance on the C stack. It's very portable if the next user also has Stackless installed, but that will rarely be the case.
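
A rough sketch, assuming Stackless Python is installed and long_task stands in for your slow function:

import stackless

def long_task(name):
    for step in range(3):
        print(name, "doing step", step)
        stackless.schedule()        # yield so the other tasklets get a turn

for name in ("a", "b", "c"):
    stackless.tasklet(long_task)(name)

stackless.run()                     # run all tasklets until they finish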

Humboldt answered 4/8, 2009 at 11:6 Comment(0)
0

Using CPython's threading model will not give you any performance improvement, because the threads are not actually executed in parallel: the Global Interpreter Lock (which exists to protect CPython's memory management) lets only one thread run Python bytecode at a time. Multiprocessing would allow parallel execution. Obviously, in this case you have to have multiple cores available to farm out your parallel jobs to.

There is much more information available in this related question.

Barbabra answered 4/8, 2009 at 10:10 Comment(2)
This is not true. It will not give you AS MUCH performance improvement as it would in, say, C or C++, but some concurrency does occur. Threads help especially if you are I/O bound.Van
I hadn't realised that - thanks for the info. Here's an external reference: mail.python.org/pipermail/python-dev/2008-May/079461.html. In this benchmark, you can see the improvement for I/O-bound problems that you describe. However, it is worth pointing out that the CPU-bound problem actually ran more slowly with 2 Python threads than with 1! It seems profiling your application is essential.Barbabra
0

If you can easily partition and separate the data you have, it sounds like you should just do that partitioning externally, and feed them to several processes of your program. (i.e. several processes instead of threads)

Almondeyed answered 4/8, 2009 at 13:45 Comment(0)
0

IronPython has real multithreading, unlike CPython with its GIL. So depending on what you're doing, it may be worth looking at. But it sounds like your use case is better suited to the multiprocessing module.

To the guy who recommends Stackless Python: I'm not an expert on it, but it seems to me that he's talking about software "multithreading", which is actually not parallel at all (it still runs in one physical thread, so it cannot scale to multiple cores). It's merely an alternative way to structure an asynchronous (but still single-threaded, non-parallel) application.

Treharne answered 10/8, 2009 at 1:31 Comment(0)
0

You may want to look at Twisted. It is designed for asynchronous network tasks.

Malone answered 26/1, 2010 at 3:17 Comment(0)
