Multiprocessing - Pipe vs Queue

Asked 11/12, 2011 at 9:28 Answered 18/1 at 20:34

Solved python performance queue multiprocessing pipe

202

What are the fundamental differences between queues and pipes in Python's multiprocessing package?

In what scenarios should one choose one over the other? When is it advantageous to use Pipe()? When is it advantageous to use Queue()?

Breakaway answered 11/12, 2011 at 9:28 Comment(0)

377

What are the fundamental differences between queues and pipes in Python's multiprocessing package?

Major Edit of this answer (CY2024): concurrency

As of modern python versions if you don't need your producers and consumers to communicate, that's the only real use-case for python multiprocessing.

If you only need python concurrency, use concurrent.futures.

This example uses concurrent.futures to make four calls to do_something_slow(), which has a one-second delay. If your machine has at least four cores, running this four-second-aggregate series of function calls only takes one-second.

By default, concurrent.futures spawns workers corresponding to the number of CPU cores you have.

import concurrent.futures
import time

def do_slow_thing(input_str: str) -> str:
    """Return modified input string after a 1-second delay"""
    if isinstance(input_str, str):
        time.sleep(1)
        return "1-SECOND-DELAY " + input_str
    else:
        return "INPUT ERROR"

if __name__=="__main__":

    # Define some inputs for process pool
    all_inputs = [
        "do",
        "foo",
        "moo",
        "chew",
    ]

    # Spawn a process pool with the default number of workers...
    with concurrent.futures.ProcessPoolExecutor(max_workers=None) as executor:
        # For each string in all_inputs, call do_slow_thing() 
        #    in parallel across the process worker pool
        these_futures = [executor.submit(do_slow_thing, ii) for ii in all_inputs]
        # Wait for all processes to finish
        concurrent.futures.wait(these_futures)

    # Get the results from the process pool execution... each
    # future.result() call is the return value from do_slow_thing()
    string_outputs = [future.result() for future in these_futures]
    for tmp in string_outputs:
        print(tmp)

With at least four CPU cores, you'll see this printed after roughly one-second...

$ time python stackoverflow.py
1-SECOND-DELAY do
1-SECOND-DELAY foo
1-SECOND-DELAY moo
1-SECOND-DELAY chew

real    0m1.058s
user    0m0.060s
sys     0m0.017s
$

Original Answer

At this point, the only major use-case for multiprocessing is to facilitate your producers and consumers talking to each other during execution. Most people don't need that. However, if you want communication via queue / pipes, you can find my original answer to the OP's question below (which profiles how fast they are).

The existing comments on this answer refer to the aforementioned answer below

Pooh answered 11/12, 2011 at 9:36 Comment(19)

@Jonathan "In summary Pipe() is about three times faster than a Queue()" – Odds 12/12, 2011 at 14:1

But Pipe() cannot safely be used with multiple producers/consumers. – Maroc 2/3, 2012 at 12:39

Excellent! Good answer and nice that you provided benchmarks! I only have two tiny quibbles: (1) "orders of magnitude faster" is a bit of an overstatement. The difference is x3, which is about a third of one order of magnitude. Just saying. ;-); and (2) a more fair comparison would be running N workers, each communicating with main thread via point-to-point pipe compared to performance of running N workers all pulling from a single point-to-multipoint queue. – Istanbul 31/3, 2012 at 9:28

To your "Bonus Material"... Yeah. If you're subclassing Process, put the bulk of the 'run' method in a try block. That is also a useful way to do logging of exceptions. To replicate the normal exception output: sys.stderr.write(''.join(traceback.format_exception(*(sys.exc_info())))) – Amoebocyte 28/2, 2013 at 7:33

Wouldn't it be better to send error messages through the pipe to the other process and handle errors in the other process? – Dowland 22/6, 2013 at 11:9

@Dowland - but you're going to need some out-of-band data, and associated signalling mode, to indicate that what you're sending is not regular data but error data. seeing as the originating process is already in an unpredictable state this may be too much to ask. – Balladist 27/6, 2013 at 23:52

@Mike, Just wanted to say you're awesome. This answer helped me a lot. – Unbodied 22/7, 2013 at 1:32

@Istanbul To quibble with your quibble, 3x is about half an order of magnitude, not a third -- sqrt(10) =~ 3. – Vorticella 3/8, 2013 at 2:59

In the multi-pipe.py, how can you know the pipe will be put all the items before the inp_p.close is called. – Evangelistic 23/7, 2018 at 0:7

@ideoutrea, agreed explicit is better than implicit – Pooh 24/7, 2018 at 17:1

In my tests, sending small packages (up to ~500 integer values per "send/put" "recv/get") is faster when using Queue than when using Pipe (even unidirectional with duplex=False). So if you want to go for absolute performance, check with a representative data size before using one or the other. As an example, sending messages with just 3 int is about ~40% faster with Queue. – Dowski 3/2, 2020 at 14:41

@MikePennington regarding this line in the multi-pipe code: p_output.close() # We no longer need this part of the Pipe() why is it ok to close the ouput end of the pipe? Do we have to do this? – Gabon 23/2, 2021 at 19:39

Don't see this mentioned--Queue uses pickle to convert data to binary before sending it over the pipe. Depending on what you're sending, it may be better to avoid pickle and handle the conversion yourself (for instance, if you're sending a bunch of text back and forth very quickly) – Natka 18/5, 2021 at 2:10

FYI, in my benchmarks using the same code, using Queue is only 10% slower than using Pipe on a Ubuntu server. – Sudden 3/9, 2022 at 18:31

@YouweiLiang CPU speed and python version will make the biggest difference in execution time. Doubtless your rig is much faster than my T61 – Pooh 9/9, 2022 at 16:26

@MikePennington can you also test the performance of multiprocessing.queues.SimpleQueue? – Lasting 4/11, 2022 at 2:44

Sorry, that was supposed to be multiprocessing.SimpleQueue – Lasting 4/11, 2022 at 3:4

@pepoluan, please see my latest edits for SimpleQueue() results – Pooh 14/11, 2022 at 23:24

@MikePennington Thank you! Wow, not much difference it seems. – Lasting 15/11, 2022 at 6:3

One additional feature of Queue() that is worth noting is the feeder thread. This section notes "When a process first puts an item on the queue a feeder thread is started which transfers objects from a buffer into the pipe." An infinite number of (or maxsize) items can be inserted into Queue() without any calls to queue.put() blocking. This allows you to store multiple items in a Queue(), until your program is ready to process them.

Pipe(), on the other hand, has a finite amount of storage for items that have been sent to one connection, but have not been received from the other connection. After this storage is used up, calls to connection.send() will block until there is space to write the entire item. This will stall the thread doing the writing until some other thread reads from the pipe. Connection objects give you access to the underlying file descriptor. On *nix systems, you can prevent connection.send() calls from blocking using the os.set_blocking() function. However, this will cause problems if you try to send a single item that does not fit in the pipe's file. Recent versions of Linux allow you to increase the size of a file, but the maximum size allowed varies based on system configurations. You should therefore never rely on Pipe() to buffer data. Calls to connection.send could block until data gets read from the pipe somehwere else.

In conclusion, Queue is a better choice than pipe when you need to buffer data. Even when you only need to communicate between two points.

Ugaritic answered 1/6, 2020 at 22:43 Comment(3)

The section you link makes a note about a feeder thread, but the documentation of the put method still declares it a blocking or failing method: "If the optional argument block is True (the default) and timeout is None (the default), block if necessary until a free slot is available. If timeout is a positive number, it blocks at most timeout seconds and raises the queue.Full exception if no free slot was available within that time." Are you sure about your answer? – Massenet 5/9, 2020 at 20:52

I am sure about my answer. The put method will block if the maxsize parameter to the constructor of Queue is specified. But this will be because of the number of items in the queue, not the size of individual items. – Ugaritic 6/9, 2020 at 20:59

Thanks for the clarification, I had missed that part. – Massenet 6/9, 2020 at 21:3

If - like me - you are wondering whether to use a multiprocessing construct (Pipe or Queue) in your threading programs for performance, I have adapted Mike Pennington's script to compare against queue.Queue and queue.SimpleQueue:

Sending 10000 numbers to mp.Pipe():                     57.769 ms
Sending 10000 numbers to mp.Queue():                    74.844 ms
Sending 10000 numbers to mp.SimpleQueue():              66.662 ms
Sending 10000 numbers to queue.Queue():                  8.253 ms
Sending 10000 numbers to queue.SimpleQueue():            0.831 ms
Sending 100000 numbers to mp.Pipe():                   421.775 ms
Sending 100000 numbers to mp.Queue():                  812.989 ms
Sending 100000 numbers to mp.SimpleQueue():            682.442 ms
Sending 100000 numbers to queue.Queue():                82.091 ms
Sending 100000 numbers to queue.SimpleQueue():           7.831 ms
Sending 1000000 numbers to mp.Pipe():                 4198.766 ms
Sending 1000000 numbers to mp.Queue():                8302.404 ms
Sending 1000000 numbers to mp.SimpleQueue():          6845.322 ms
Sending 1000000 numbers to queue.Queue():              840.551 ms
Sending 1000000 numbers to queue.SimpleQueue():         77.338 ms
Sending 10000000 numbers to mp.Pipe():               43341.405 ms
Sending 10000000 numbers to mp.Queue():              85868.946 ms
Sending 10000000 numbers to mp.SimpleQueue():        71669.009 ms
Sending 10000000 numbers to queue.Queue():            8463.520 ms
Sending 10000000 numbers to queue.SimpleQueue():       773.727 ms

This is on an M1 MacBook Pro running Python 3.11.7.

Unsurprisingly, using the queue package yields much better results if all you have are threads. That said, I was surprised how performant queue.SimpleQueue is.

"""
pipe_performance.py
"""
import threading as td
import queue
import multiprocessing as mp
import multiprocessing.connection as mp_connection
import time
import typing


def reader_pipe(p_out: mp_connection.Connection) -> None:
    while True:
        msg = p_out.recv()
        if msg == "DONE":
            break


def reader_queue(p_queue: "queue.Queue[typing.Union[str, int]]") -> None:
    while True:
        msg = p_queue.get()
        if msg == "DONE":
            break


def pretty_print(count: int, name: str, t: float) -> None:
    text = f"Sending {count} numbers to {name}:"
    t_text = f"{t*1e3:.3f} ms"
    print(f"{text:<50}{t_text:>15}")


if __name__ == "__main__":
    for count in [10**4, 10**5, 10**6, 10**7]:
        # first: mp.pipe
        p_mppipe_out, p_mppipe_in = mp.Pipe()
        reader_p = td.Thread(target=reader_pipe, args=((p_mppipe_out),))
        reader_p.start()
        _start = time.time()
        for ii in range(0, count):
            p_mppipe_in.send(ii)
        p_mppipe_in.send("DONE")
        reader_p.join()
        pretty_print(count, "mp.Pipe()", time.time() - _start)

        # second: mp.Queue
        p_mpqueue = mp.Queue()
        reader_p = td.Thread(target=reader_queue, args=((p_mpqueue),))
        reader_p.start()
        _start = time.time()
        for ii in range(0, count):
            p_mpqueue.put(ii)
        p_mpqueue.put("DONE")
        reader_p.join()
        pretty_print(count, "mp.Queue()", time.time() - _start)

        # third: mp.SimpleQueue
        p_mpsqueue = mp.SimpleQueue()
        reader_p = td.Thread(target=reader_queue, args=((p_mpsqueue),))
        reader_p.start()
        _start = time.time()
        for ii in range(0, count):
            p_mpsqueue.put(ii)
        p_mpsqueue.put("DONE")
        reader_p.join()
        pretty_print(count, "mp.SimpleQueue()", time.time() - _start)

        # fourth: queue.Queue
        p_queue = queue.Queue()
        reader_p = td.Thread(target=reader_queue, args=((p_queue),))
        reader_p.start()
        _start = time.time()
        for ii in range(0, count):
            p_queue.put(ii)
        p_queue.put("DONE")
        reader_p.join()
        pretty_print(count, "queue.Queue()", time.time() - _start)

        # fifth: queue.SimpleQueue
        p_squeue = queue.SimpleQueue()
        reader_p = td.Thread(target=reader_queue, args=((p_squeue),))
        reader_p.start()
        _start = time.time()
        for ii in range(0, count):
            p_squeue.put(ii)
        p_squeue.put("DONE")
        reader_p.join()
        pretty_print(count, "queue.SimpleQueue()", time.time() - _start)

Kneecap answered 3/3, 2022 at 10:52 Comment(2)

Can you also test mp.queues.SimpleQueue? Will that be faster than mp.Queue as well? Because I'm trying to optimize the performance of a multiprocessing program I'm creating, and I can only use things inside mp ... – Lasting 4/11, 2022 at 2:38

Sorry, that was supposed to be multiprocessing.SimpleQueue – Lasting 4/11, 2022 at 3:4

When using a concurrent.futures.ProcessPoolExecutor to execute your child processes in python, you cannot pass a multiprocessing.Queue as an argument. If you do, you will get an error like:

RuntimeError: Queue objects should only be shared between processes through inheritance

In this situation, one workaround is to use a multiprocessing.Manager to create a queue that you can pass to the process as an argument. However, I have found that this kind of queue is much slower than the standard multiprocessing.Queue. I have not found any benchmarks for this kind of queue so I ran them myself. I have modified Mike Pennington's test code to benchmark this Manager kind of queue.

Here are the results. I start by re-running the standard Queue test as a reference:

Sending 10000 numbers to Queue() took 0.12702512741088867 seconds
Sending 100000 numbers to Queue() took 0.9972114562988281 seconds
Sending 1000000 numbers to Queue() took 9.9016695022583 seconds

Sending 10000 numbers to manager.Queue() took 1.0181043148040771 seconds
Sending 100000 numbers to manager.Queue() took 10.438829898834229 seconds
Sending 1000000 numbers to manager.Queue() took 102.3624701499939 seconds

The results show that the queue created by the multiprocessing.Manager is approximately 10x slower than the standard multiprocessing.Queue. Pretty huge difference. Don't use this kind of queue if you care about performance.

Source code:

"""
manager_multi_queue.py
"""

from multiprocessing import Process, Queue, Manager
import time
import sys

def reader_proc(queue):
    ## Read from the queue; this will be spawned as a separate Process
    while True:
        msg = queue.get()         # Read from the queue and do nothing
        if (msg == 'DONE'):
            break

def writer(count, queue):
    ## Write to the queue
    for ii in range(0, count):
        queue.put(ii)             # Write 'count' numbers into the queue
    queue.put('DONE')

if __name__=='__main__':
    manager = Manager()
    pqueue = manager.Queue() # writer() writes to pqueue from _this_ process
    for count in [10**4, 10**5, 10**6]:
        ### reader_proc() reads from pqueue as a separate process
        reader_p = Process(target=reader_proc, args=((pqueue),))
        reader_p.daemon = True
        reader_p.start()        # Launch reader_proc() as a separate python process

        _start = time.time()
        writer(count, pqueue)    # Send a lot of stuff to reader()
        reader_p.join()         # Wait for the reader to finish
        print("Sending {0} numbers to manager.Queue() took {1} seconds".format(count, (time.time() - _start)))

NEW UPDATE:

In my application I have multiple processes writing to the queue at once, and one process consuming the results. It turns out that these queues perform VERY differently in this case. The standard multiprocessing.Queue seems to become overwhelmed very easily when multiple processes are writing to it at once, and the read performance drops down by many orders of magnitude. In this situation, there are much faster alternatives to use.

Here I compare the read performance as a function of message size in bytes of three kinds of queues while the queues are being continuously written to by 5 processes. The three kinds of queues are:

multiprocessing.Queue
multiprocessing.Manager.Queue
A custom queue that uses a multiprocessing.Pipe with locks so that it can be used by many processes safely.

Click here to see a plot of the results

The results show that here is a huge difference in performance between the three kinds of queues. The fastest is the one that uses Pipes, followed by a Queue created using a Manager, followed by a standard multiprocessing.Queue. If you care about read performance while the queues are being written to, the best bet is to use a pipe or the managed queue.

Here is the sourcecode for this new test with plots:

Source code:

from __future__ import annotations

"""
queue_comparison_plots.py
"""

import asyncio
import random
from dataclasses import dataclass
from itertools import groupby
from multiprocessing import Process, Queue, Manager
import time
from matplotlib import pyplot as plt
import multiprocessing as mp

class PipeQueue():
    pipe_in: mp.connection.Connection
    pipe_out: mp.connection.Connection

    def __init__(self):
        self.pipe_out, self.pipe_in = mp.Pipe(duplex=False)
        self.write_lock = mp.Lock()
        self.read_lock = mp.Lock()

    def get(self):
        with self.read_lock:
            return self.pipe_out.recv()

    def put(self, val):
        with self.write_lock:
            self.pipe_in.send(val)

@dataclass
class Result():
    queue_type: str
    data_size_bytes: int
    num_writer_processes: int
    num_reader_processes: int
    msg_read_rate: float

class PerfTracker():
    def __init__(self):
        self.running = mp.Event()
        self.count = mp.Value("i")
        self.start_time: float | None = None
        self.end_time: float | None = None

    @property
    def rate(self) -> float:
        return (self.count.value)/(self.end_time-self.start_time)

    def update(self):
        if self.running.is_set():
            with self.count.get_lock():
                self.count.value += 1

    def start(self):
        with self.count.get_lock():
            self.count.value = 0
        self.running.set()
        self.start_time = time.time()

    def end(self):
        self.running.clear()
        self.end_time = time.time()


def reader_proc(queue, perf_tracker, num_threads = 1):
    loop = asyncio.new_event_loop()
    asyncio.set_event_loop(loop)
    loop.run_until_complete(reader_proc_async(queue, perf_tracker, num_threads))


async def reader_proc_async(queue, perf_tracker, num_threads = 1):
    async def thread(queue, perf_tracker):
        while True:
            msg = queue.get()
            perf_tracker.update()

    futures = []
    for i in range(num_threads):
        futures.append(thread(queue, perf_tracker))

    await asyncio.gather(*futures)


def writer_proc(queue, data_size_bytes: int):
    loop = asyncio.new_event_loop()
    asyncio.set_event_loop(loop)
    loop.run_until_complete(writer_proc_async(queue, data_size_bytes))


async def writer_proc_async(queue, data_size_bytes: int):
    val = random.randbytes(data_size_bytes)

    while True:
        queue.put(val)


async def main():
    num_reader_procs = 1
    num_reader_threads = 1
    num_writer_procs = 5
    test_time = 5

    results = []
    for queue_type in ["Pipe + locks", "Queue using Manager", "Queue"]:
        for data_size_bytes_order_of_magnitude in range(8):
            data_size_bytes = 10 ** data_size_bytes_order_of_magnitude
            perf_tracker = PerfTracker()

            if queue_type == "Queue using Manager":
                manager = Manager()
                pqueue = manager.Queue()
            elif queue_type == "Pipe + locks":
                pqueue = PipeQueue()
            elif queue_type == "Queue":
                pqueue = Queue()
            else:
                raise NotImplementedError()

            reader_ps = []
            for i in range(num_reader_procs):
                reader_p = Process(target=reader_proc, args=(pqueue, perf_tracker, num_reader_threads))
                reader_ps.append(reader_p)


            writer_ps = []
            for i in range(num_writer_procs):
                writer_p = Process(target=writer_proc, args=(pqueue, data_size_bytes))
                writer_ps.append(writer_p)

            for writer_p in writer_ps:
                writer_p.start()

            for reader_p in reader_ps:
                reader_p.start()

            await asyncio.sleep(1)
            print("start")
            perf_tracker.start()
            await asyncio.sleep(test_time)
            perf_tracker.end()
            print(f"Finished. {queue_type} | {data_size_bytes} |  {perf_tracker.rate} msg/sec")

            results.append(
                Result(
                    queue_type = queue_type,
                    data_size_bytes = data_size_bytes,
                    num_writer_processes = num_writer_procs,
                    num_reader_processes = num_reader_procs,
                    msg_read_rate = perf_tracker.rate,
                )
            )
            for writer_p in writer_ps:
                writer_p.kill()

            for reader_p in reader_ps:
                reader_p.kill()

    print(results)

    fig, ax = plt.subplots()

    count = 0
    for queue_type, result_iterator in groupby(results, key=lambda result: result.queue_type):
        grouped_results = list(result_iterator)
        x_coords = [x.data_size_bytes for x in grouped_results]
        y_coords = [x.msg_read_rate for x in grouped_results]
        ax.plot(x_coords, y_coords, label=f"{queue_type}")
        count += 1

    ax.set_title(f"Queue read performance comparison while writing continuously", fontsize=11)
    ax.legend(loc='upper right', fontsize=10)
    ax.set_yscale("log")
    ax.set_xscale("log")
    ax.set_xlabel("Message size (bytes)")
    ax.set_ylabel("Message throughput (messages/second)")
    plt.show()

if __name__=='__main__':
    loop = asyncio.new_event_loop()
    asyncio.set_event_loop(loop)
    loop.run_until_complete(main())

Feline answered 31/1, 2023 at 21:1 Comment(0)

Short Summary

As of CY2023, the technique described in this answer is quite out of date. These days, you should use concurrent.futures.ProcessPoolExecutor()...

Regardless of the python concurrency tool you use, an answer for the OP's question is still valid, below.

Original Answer

A Pipe() can only have two endpoints.
A Queue() can have multiple producers and consumers.

When to use them

If you need more than two points to communicate, use a Queue().

If you need absolute performance, a Pipe() is much faster because Queue() is built on top of Pipe().

Performance Benchmarking

Let's assume you want to spawn two processes and send messages between them as quickly as possible. These are the timing results of a drag race between similar tests using Pipe() and Queue()...

FYI, I threw in results for SimpleQueue() and JoinableQueue() as a bonus.

JoinableQueue() accounts for tasks when queue.task_done() is called (it doesn't even know about the specific task, it just counts unfinished tasks in the queue), so that queue.join() knows the work is finished.

The code for each at bottom of this answer...

# This is on a Thinkpad T430, VMWare running Debian 11 VM, and Python 3.9.2

$ python multi_pipe.py
Sending 10000 numbers to Pipe() took 0.14316844940185547 seconds
Sending 100000 numbers to Pipe() took 1.3749017715454102 seconds
Sending 1000000 numbers to Pipe() took 14.252539157867432 seconds
$  python multi_queue.py
Sending 10000 numbers to Queue() took 0.17014789581298828 seconds
Sending 100000 numbers to Queue() took 1.7723784446716309 seconds
Sending 1000000 numbers to Queue() took 17.758610725402832 seconds
$ python multi_simplequeue.py
Sending 10000 numbers to SimpleQueue() took 0.14937686920166016 seconds
Sending 100000 numbers to SimpleQueue() took 1.5389132499694824 seconds
Sending 1000000 numbers to SimpleQueue() took 16.871352910995483 seconds
$ python multi_joinablequeue.py
Sending 10000 numbers to JoinableQueue() took 0.15144729614257812 seconds
Sending 100000 numbers to JoinableQueue() took 1.567549228668213 seconds
Sending 1000000 numbers to JoinableQueue() took 16.237736225128174 seconds



# This is on a Thinkpad T430, VMWare running Debian 11 VM, and Python 3.7.0

(py37_test) [mpenning@mudslide ~]$ python multi_pipe.py
Sending 10000 numbers to Pipe() took 0.13469791412353516 seconds
Sending 100000 numbers to Pipe() took 1.5587594509124756 seconds
Sending 1000000 numbers to Pipe() took 14.467186689376831 seconds
(py37_test) [mpenning@mudslide ~]$ python multi_queue.py
Sending 10000 numbers to Queue() took 0.1897726058959961 seconds
Sending 100000 numbers to Queue() took 1.7622203826904297 seconds
Sending 1000000 numbers to Queue() took 16.89015531539917 seconds
(py37_test) [mpenning@mudslide ~]$ python multi_joinablequeue.py
Sending 10000 numbers to JoinableQueue() took 0.2238149642944336 seconds
Sending 100000 numbers to JoinableQueue() took 1.4744081497192383 seconds
Sending 1000000 numbers to JoinableQueue() took 15.264554023742676 seconds


# This is on a ThinkpadT61 running Ubuntu 11.10, and Python 2.7.2

mpenning@mpenning-T61:~$ python multi_pipe.py 
Sending 10000 numbers to Pipe() took 0.0369849205017 seconds
Sending 100000 numbers to Pipe() took 0.328398942947 seconds
Sending 1000000 numbers to Pipe() took 3.17266988754 seconds
mpenning@mpenning-T61:~$ python multi_queue.py 
Sending 10000 numbers to Queue() took 0.105256080627 seconds
Sending 100000 numbers to Queue() took 0.980564117432 seconds
Sending 1000000 numbers to Queue() took 10.1611330509 seconds
mpnening@mpenning-T61:~$ python multi_joinablequeue.py 
Sending 10000 numbers to JoinableQueue() took 0.172781944275 seconds
Sending 100000 numbers to JoinableQueue() took 1.5714070797 seconds
Sending 1000000 numbers to JoinableQueue() took 15.8527247906 seconds
mpenning@mpenning-T61:~$

In summary:

Under python 2.7, Pipe() is about 300% faster than a Queue(). Don't even think about the JoinableQueue() unless you really must have the benefits.
Under python 3.x, Pipe() still has a (roughly 20%) edge over the Queue()s, but the performance gaps between Pipe() and Queue() are not as dramatic as they were in python 2.7. The various Queue() implementations are within roughly 15% of each other. Also my tests use integer data. Some people commented that they found performance differences in the data-types used with multiprocessing.

Bottom line for python 3.x: YMMV... consider running your own tests with your own data-types (i.e. integer / string / objects) to form conclusions about your own platforms of interest and use-cases.

I should also mention that my python3.x performance tests are inconsistent and vary somewhat. I ran multiple tests over several minutes to get the best results for each case. I suspect these differences have something to do with running my python3 tests under VMWare / virtualization; however, the virtualization diagnosis is speculation.

*** RESPONSE TO A COMMENT ABOUT TEST TECHNIQUES ***

In the comments, @JJC said:

a more fair comparison would be running N workers, each communicating with main thread via point-to-point pipe compared to performance of running N workers all pulling from a single point-to-multipoint queue.

Originally, this answer only considered the performance of one worker and one producer; that's the baseline use-case for Pipe(). Your comment requires adding different tests for multiple worker processes. While this is a valid observation for common Queue() use-cases, it could easily explode the test matrix along a completely new axis (i.e. adding tests with various numbers of worker processes).

BONUS MATERIAL 2

Multiprocessing introduces subtle changes in information flow that make debugging hard unless you know some shortcuts. For instance, you might have a script that works fine when indexing through a dictionary in under many conditions, but infrequently fails with certain inputs.

Normally we get clues to the failure when the entire python process crashes; however, you don't get unsolicited crash tracebacks printed to the console if the multiprocessing function crashes. Tracking down unknown multiprocessing crashes is hard without a clue to what crashed the process.

The simplest way I have found to track down multiprocessing crash informaiton is to wrap the entire multiprocessing function in a try / except and use traceback.print_exc():

import traceback
def run(self, args):
    try:
        # Insert stuff to be multiprocessed here
        return args[0]['that']
    except:
        print "FATAL: reader({0}) exited while multiprocessing".format(args) 
        traceback.print_exc()

Now, when you find a crash you see something like:

FATAL: reader([{'crash': 'this'}]) exited while multiprocessing
Traceback (most recent call last):
  File "foo.py", line 19, in __init__
    self.run(args)
  File "foo.py", line 46, in run
    KeyError: 'that'

Source Code:

"""
multi_pipe.py
"""
from multiprocessing import Process, Pipe
import time

def reader_proc(pipe):
    ## Read from the pipe; this will be spawned as a separate Process
    p_output, p_input = pipe
    p_input.close()    # We are only reading
    while True:
        msg = p_output.recv()    # Read from the output pipe and do nothing
        if msg=='DONE':
            break

def writer(count, p_input):
    for ii in range(0, count):
        p_input.send(ii)             # Write 'count' numbers into the input pipe
    p_input.send('DONE')

if __name__=='__main__':
    for count in [10**4, 10**5, 10**6]:
        # Pipes are unidirectional with two endpoints:  p_input ------> p_output
        p_output, p_input = Pipe()  # writer() writes to p_input from _this_ process
        reader_p = Process(target=reader_proc, args=((p_output, p_input),))
        reader_p.daemon = True
        reader_p.start()     # Launch the reader process

        p_output.close()       # We no longer need this part of the Pipe()
        _start = time.time()
        writer(count, p_input) # Send a lot of stuff to reader_proc()
        p_input.close()
        reader_p.join()
        print("Sending {0} numbers to Pipe() took {1} seconds".format(count,
            (time.time() - _start)))

"""
multi_queue.py
"""

from multiprocessing import Process, Queue
import time
import sys

def reader_proc(queue):
    ## Read from the queue; this will be spawned as a separate Process
    while True:
        msg = queue.get()         # Read from the queue and do nothing
        if (msg == 'DONE'):
            break

def writer(count, queue):
    ## Write to the queue
    for ii in range(0, count):
        queue.put(ii)             # Write 'count' numbers into the queue
    queue.put('DONE')

if __name__=='__main__':
    pqueue = Queue() # writer() writes to pqueue from _this_ process
    for count in [10**4, 10**5, 10**6]:             
        ### reader_proc() reads from pqueue as a separate process
        reader_p = Process(target=reader_proc, args=((pqueue),))
        reader_p.daemon = True
        reader_p.start()        # Launch reader_proc() as a separate python process

        _start = time.time()
        writer(count, pqueue)    # Send a lot of stuff to reader()
        reader_p.join()         # Wait for the reader to finish
        print("Sending {0} numbers to Queue() took {1} seconds".format(count, 
            (time.time() - _start)))

"""
multi_simplequeue.py
"""

from multiprocessing import Process, SimpleQueue
import time
import sys

def reader_proc(queue):
    ## Read from the queue; this will be spawned as a separate Process
    while True:
        msg = queue.get()         # Read from the queue and do nothing
        if (msg == 'DONE'):
            break

def writer(count, queue):
    ## Write to the queue
    for ii in range(0, count):
        queue.put(ii)             # Write 'count' numbers into the queue
    queue.put('DONE')

if __name__=='__main__':
    pqueue = SimpleQueue() # writer() writes to pqueue from _this_ process
    for count in [10**4, 10**5, 10**6]:
        ### reader_proc() reads from pqueue as a separate process
        reader_p = Process(target=reader_proc, args=((pqueue),))
        reader_p.daemon = True
        reader_p.start()        # Launch reader_proc() as a separate python process

        _start = time.time()
        writer(count, pqueue)    # Send a lot of stuff to reader()
        reader_p.join()         # Wait for the reader to finish
        print("Sending {0} numbers to SimpleQueue() took {1} seconds".format(count,
            (time.time() - _start)))

"""
multi_joinablequeue.py
"""
from multiprocessing import Process, JoinableQueue
import time

def reader_proc(queue):
    ## Read from the queue; this will be spawned as a separate Process
    while True:
        msg = queue.get()         # Read from the queue and do nothing
        queue.task_done()

def writer(count, queue):
    for ii in range(0, count):
        queue.put(ii)             # Write 'count' numbers into the queue

if __name__=='__main__':
    for count in [10**4, 10**5, 10**6]:
        jqueue = JoinableQueue() # writer() writes to jqueue from _this_ process
        # reader_proc() reads from jqueue as a different process...
        reader_p = Process(target=reader_proc, args=((jqueue),))
        reader_p.daemon = True
        reader_p.start()     # Launch the reader process
        _start = time.time()
        writer(count, jqueue) # Send a lot of stuff to reader_proc() (in different process)
        jqueue.join()         # Wait for the reader to finish
        print("Sending {0} numbers to JoinableQueue() took {1} seconds".format(count, 
            (time.time() - _start)))

Pooh answered 18/1 at 20:34 Comment(0)

Hot tags

Godot Unity Godot Help Programming Godot 4.X GUI GDScript 3D 2D Physics CSharp Godot 3.X VR XR Projects C++

Major Edit of this answer (CY2024): concurrency

Original Answer

Short Summary

Original Answer

Recommended topics

Hot tags