Python: multiprocessing and requests
Asked Answered
B

3

7

Below is the snippet of code I am running; it uses multiprocessing to fire HTTP requests in parallel. When run from the console it hangs at "requests.get(url)", neither proceeding nor throwing an error.

import multiprocessing
import requests

def echo_100(q):
    ...
    print "before"
    r = requests.get(url)
    print "after"
    ...
    q.put(r)

q = multiprocessing.Queue()
p = multiprocessing.Process(target=echo_100, args=(q,))  # args must be a tuple
p.start()
p.join()
resp = q.get()
Babbler answered 26/5, 2015 at 8:19 Comment(8)
Does requests.get return for the URI if you do it in sequence?Weathercock
Are you cleaning up the queue (i.e. issue q.get() from somewhere)?Penitence
requests.get takes a second argument auth=('user', 'pass'), are you sure you don't need it? Also, does the function work by itself, i.e. is a requests.get problem or a multiprocessing problem?Bev
@lufz: yes, it is working in the main process.Babbler
@nipun: it is not reaching up to q.put(). I don't think that is the issue. Anyway, later in my code, when all processes are finished, I dequeue the queue.Babbler
That is a problem, as you are trying to dequeue only after all the processes are finished. Can you check dequeuing data from another thread in parallel? I think that will resolve the hang. Anyway, check my answer below; a child process hang will result in the parent process hanging on p.join()Penitence
@nipun: I have edited the code. As the flow is getting stuck at requests.get(), nothing is even being enqueued on the queue.Babbler
Can you put a runnable example?Penitence
A
8

On Mac OS, there seem to be some bugs reading proxy settings from the operating system. I don't know the exact details, but it sometimes causes requests to hang when using multiprocessing. You could try to circumvent the problem by disabling OS proxies entirely, like this:

import requests

session = requests.Session()
session.trust_env = False  # Don't read proxy settings from the OS
r = session.get(url)

That fixed it for me.
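
For completeness, here is how the same workaround could look inside the asker's worker function. This is only a minimal sketch: the example URL and the choice to put the status code (rather than the whole Response object) on the queue are assumptions, not part of the original code.

import multiprocessing
import requests

def echo_100(q, url):
    # Hypothetical variant of the question's worker: disable OS proxy lookup
    # before making the request, as described above.
    session = requests.Session()
    session.trust_env = False  # don't read proxy settings from the OS
    r = session.get(url)
    q.put(r.status_code)       # put something small and picklable on the queue

if __name__ == "__main__":
    q = multiprocessing.Queue()
    p = multiprocessing.Process(target=echo_100, args=(q, "http://example.com"))
    p.start()
    resp = q.get()   # read from the queue before joining (see the other answer)
    p.join()
    print(resp)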

Alonsoalonzo answered 2/10, 2016 at 22:8 Comment(0)
P
0

If you don't clean up the queue, that is, if you never take items out of it, the process will hang after some time. On Linux, multiprocessing queues are backed by unnamed FIFOs (pipes), which have a maximum size. If one process writes to the pipe and no other process reads from it, the writing process will eventually block while trying to put more data into the pipe (internally, it may be blocking on the write system call).

I suspect you are not getting items from the queue, so the queue fills up after some time and the child processes stall.

Now, if the child process hangs, the parent process may also hang if it is trying to join the child (p.join()), since internally it calls waitpid to wait for the child process.
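
To make the failure mode concrete, here is a minimal sketch (not the asker's code) of a child that puts more data on a queue than a pipe buffer holds; the payload size is only an assumption chosen to exceed the typical 64 KiB pipe buffer on Linux.

import multiprocessing

def producer(q):
    # Roughly 1 MiB, far more than a typical pipe buffer, so the child's
    # queue feeder thread cannot flush it all until someone reads it.
    q.put("x" * (1024 * 1024))

if __name__ == "__main__":
    q = multiprocessing.Queue()
    p = multiprocessing.Process(target=producer, args=(q,))
    p.start()

    # Deadlock-prone order: p.join() waits for the child, but the child
    # cannot exit until its queued data has been flushed to the pipe,
    # and the pipe stays full because nothing reads from it.
    # p.join()
    # data = q.get()

    # Safe order: drain the queue first, then join.
    data = q.get()
    p.join()
    print(len(data))

Reading from the queue before joining is the safe ordering in general, regardless of how much data the child enqueues.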

Penitence answered 26/5, 2015 at 9:17 Comment(0)
B
0

I had the exact same problem in a project. I found that removing the import ipdb statements from all my modules resolved the issue. I'm not sure why that import was causing the problem, but eliminating those imports completely fixed it. Merely having the import was enough to trigger it; I wasn't even using anything from the ipdb package.

UPDATE: This happens on both Python 2.7.10 and 3.5.0, and only when I import ipdb; everything is fine if I import pdb instead. I've posted a related question asking why this happens here

Hope this resolves your issue too.

Bogus answered 23/11, 2015 at 5:14 Comment(0)
