Share Python objects between processes in Python 3

I have written a producer-consumer program: the parent process (the producer) creates many child processes (the consumers), then reads a file and passes the data to the children.

However, there is a performance problem: passing messages between processes takes too much time (I think).

For example, with 200 MB of original data, reading and preprocessing in the parent process takes less than 8 seconds, passing the data to the child processes through multiprocessing.Pipe takes another 8 seconds, and the child processes finish the remaining work in another 3 to 4 seconds.

So a complete workflow takes less than 18 seconds, and more than 40% of that time is spent on communication between processes. That is much more than I expected. I also tried multiprocessing.Queue and Manager, and they are worse.

I am working on Windows 7 with Python 3.4. I have been googling for several days, and POSH may be a good solution, but it can't be built with Python 3.4.

I see three options:

1. Is there any way to share Python objects directly between processes in Python 3.4, as POSH does?

or

2. Is it possible to pass a "pointer" to an object to a child process, so that the child process can recover the Python object from that "pointer"?

or

3. multiprocessing.Array may be a valid solution, but if I want to share a complex data structure such as a list, how would that work? Should I write a new class on top of it that provides the same interface as a list?

Edit 1: I tried the third option, but it performs worse.
I defined these values:

import multiprocessing

p_pos = multiprocessing.Value('i')  # producer write position
c_pos = multiprocessing.Value('i')  # consumer read position
databuff = multiprocessing.Array('c', buff_len)  # shared buffer

and two functions:

send_data(msg)  
get_data()

The send_data function (parent process) copies msg into databuff and sends the start and end positions (two integers) to the child process through a pipe.
The get_data function (child process) then receives the two positions and copies the message out of databuff.

In the end, it took twice as long as just using a pipe @_@
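
For readers who want to try it, the scheme described in Edit 1 can be sketched roughly as follows. This is a minimal illustration, not the original code: the names BUFF_LEN and consumer, the explicit start argument, and the use of an unlocked multiprocessing.RawArray in place of multiprocessing.Array are all assumptions made to keep the example short. It also has no wrap-around or flow control, so the producer must not overwrite data the consumer has not read yet.

```python
import multiprocessing as mp

BUFF_LEN = 1 << 20  # hypothetical buffer size (1 MiB), not from the question


def send_data(databuff, conn, start, msg):
    """Producer side: copy msg into the shared buffer, then send only its bounds."""
    end = start + len(msg)
    if end > len(databuff):
        raise ValueError("message does not fit into the shared buffer")
    databuff[start:end] = msg   # the bytes are copied into shared memory
    conn.send((start, end))     # only two integers travel through the pipe
    return end                  # next free write position


def get_data(databuff, conn):
    """Consumer side: receive the bounds and copy the message out of the buffer."""
    start, end = conn.recv()
    return databuff[start:end]  # slicing a 'c' array returns a bytes object


def consumer(databuff, conn, results):
    """Hypothetical child process body: read one message and report it back."""
    results.put(get_data(databuff, conn))


if __name__ == "__main__":
    # RawArray has no lock; the pipe message tells the child when data is ready.
    databuff = mp.RawArray("c", BUFF_LEN)
    parent_conn, child_conn = mp.Pipe()
    results = mp.Queue()
    child = mp.Process(target=consumer, args=(databuff, child_conn, results))
    child.start()
    send_data(databuff, parent_conn, 0, b"hello from the producer")
    print(results.get())
    child.join()
```

Note that the message is still copied twice here (once into the buffer and once out of it), so on its own this layout does not remove copying; it only keeps the payload out of the pipe.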

Edit 2:
Yes, I tried Cython, and the result looks good.
I just changed the suffix of my Python script to .pyx and compiled it, and the program ran 15% faster.
Of course, I ran into the "Unable to find vcvarsall.bat" and "The system cannot find the file specified" errors. I spent a whole day solving the first one and was then blocked by the second.
Finally, I found Cyther, and all my troubles were gone ^_^.

Lions answered 25/9, 2016 at 13:15 Comment(1)
All I want is a queue that the parent puts messages into and the child processes get messages out of, without as many copy operations as Pipe or Queue in the multiprocessing module involve. – Lions

I was in your position five months ago. I looked around several times, and my conclusion is that multiprocessing in Python has exactly the problem you describe.

I solved this kind of problem by learning C++, but that is probably not what you want to read...

Eddington answered 26/9, 2016 at 6:3 Comment(4)
Thanks. So this bottleneck is caused by Python itself. You said you "solved this kind of problem by learning C++": does that mean you rewrote the whole program in C++, or just the communication between processes? – Lions
Usually I only rewrite what needs to be rewritten, so, as you said, the communication between processes, and I get my functions back using boost.python or Cython. It is hard, but only the first time ;) – Eddington
@Jean-Baptiste F. Thanks a lot, I will try Cython. – Lions
You are welcome. Please mark the question as answered if you think it is! – Eddington

To pass data (especially big numpy arrays) to a child process, I think mpi4py can be very efficient, since it can work directly on buffer-like objects.

An example of using mpi4py to spawn processes and communicate (it also uses trio, but that is another story) can be found here.

Afterdeck answered 5/7, 2018 at 8:12 Comment(1)
Thanks for your answer. It's a very old question, and the office where I asked it has since been closed... but thanks all the same. I looked at the official mpi4py page, and it looks a bit like "fork". I will try it and measure the performance; I hope it works as well as shared memory. – Lions
