Why does my program hang after urllib3 logs Starting new HTTPS connection?
Asked Answered
D

1

11

I am trying to diagnose an issue where some of my celery worker processes appear to hang for several minutes. I have many tasks that make several IO calls (usually to third party APIs). In any given job, I might be making several thousand requests to various APIs. I've looked at the logs and they all have on thing in common: they hang after urllib3 makes a connection to a remote url.

At the end of my jobs (takes ~30 minutes), there are usually a few tasks that are hung.

Here is an example of the logs that I use to conclude urllib3 is the culprit:

Jul 08 04:46:26 app/worker.1:  [INFO/MainProcess] [???(???)] celery.worker.strategy: Received task: my_celery_task[734a49f6-bf6b-4423-9146-1c48366ba897] 
Jul 08 04:46:28 app/worker.1:  [DEBUG/Worker-11] [my_celery_task(734a49f6-bf6b-4423-9146-1c48366ba897)] src.aggregates.prospect.services.prospect_service: Beginning: Get social account data. provider_name: twitter, account_uid: some_user 
Jul 08 04:46:28 app/worker.1:  [INFO/Worker-11] [my_celery_task(734a49f6-bf6b-4423-9146-1c48366ba897)] requests.packages.urllib3.connectionpool: Starting new HTTPS connection (1): api.some_api.com 

And then that's it. There is nothing logged after the Starting new HTTPS connection statement.

This is where I restarted the worker:

Jul 08 05:09:18 app/worker.1:  [INFO/MainProcess] [???(???)] celery.worker.strategy: Received task: my_celery_task[734a49f6-bf6b-4423-9146-1c48366ba897] 
Jul 08 05:09:19 app/worker.1:  [DEBUG/Worker-4] [my_celery_task(734a49f6-bf6b-4423-9146-1c48366ba897)] src.aggregates.prospect.services.prospect_service: Beginning: Get social account data. provider_name: twitter, account_uid: some_user 
Jul 08 05:09:19 app/worker.1:  [DEBUG/Worker-4] [my_celery_task(734a49f6-bf6b-4423-9146-1c48366ba897)] requests.packages.urllib3.connectionpool: "GET /v2/api_call.json?username=some_user HTTP/1.1" 403 170
Jul 08 05:09:19 app/worker.1:  [INFO/Worker-4] [my_celery_task(734a49f6-bf6b-4423-9146-1c48366ba897)] requests.packages.urllib3.connectionpool: Starting new HTTPS connection (1): api.some_api.com 
Jul 08 05:09:19 app/worker.1:  [INFO/MainProcess] [???(???)] celery.worker.job: Task my_celery_task[734a49f6-bf6b-4423-9146-1c48366ba897] succeeded in 2.265356543008238s: 32345 

In this case, I received a 403 status code. So that could be a potential culprit. However, I've seen in the logs that this has happened with status codes of 200 as well.

FWIW, I also see Resetting dropped connection: api.twitter.com quite often in the logs.

I've made sure to provide a timeout of 10 seconds everywhere I make a request using the requests library.

So it appears that the request is made but then just hangs. It's possible the remote servers are responding at a very slow pace and therefore the timeout never actually occurs, but I find this unlikely as my issue does not occur with just one specific domain.

I am using Rabbit 3.1.3 celery:3.1.11 (Cipater) kombu:3.0.16 py:3.4.0 billiard:3.3.0.17 py-amqp:1.4.5. I am using prefork.

I'm using requests==2.3.0.

So why does it appear that my tasks hang after urllib3 logs the Starting new HTTPS connection statement?

EDIT: I've added a lot of logging and provide further context below

2014-07-16T02:43:53.140381+00:00 app[worker.1]: [INFO/MainProcess/2] [???(???)] celery.worker.strategy: Received task: [973c1361-43c3-41a2-9bcd-6ef850c41fcc]
2014-07-16T02:43:56.951211+00:00 app[worker.1]: [1;34m[DEBUG/Worker-28/99] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] : twitter, account_uid: some_user[0m
2014-07-16T02:43:56.951876+00:00 app[worker.1]: [INFO/Worker-28/99] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.sessions: complete: request: about to get request
2014-07-16T02:43:56.952005+00:00 app[worker.1]: [INFO/Worker-28/99] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.sessions: begin: request: about to prep request
2014-07-16T02:43:56.952897+00:00 app[worker.1]: [INFO/Worker-28/99] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.sessions: complete: request: about to prep request
2014-07-16T02:43:56.954133+00:00 app[worker.1]: [INFO/Worker-28/99] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.adapters: begin: send: about to get response not chunked
2014-07-16T02:43:56.954249+00:00 app[worker.1]: [INFO/Worker-28/99] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.packages.urllib3.connectionpool: begin: urlopen: about to get a conn from the pool
2014-07-16T02:43:56.954360+00:00 app[worker.1]: [INFO/Worker-28/99] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.packages.urllib3.connectionpool: begin: _get_conn: about to get a conn from the pool
2014-07-16T02:43:56.954482+00:00 app[worker.1]: [INFO/Worker-28/99] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.packages.urllib3.connectionpool: complete: _get_conn: about to get a conn from the pool
2014-07-16T02:43:56.954587+00:00 app[worker.1]: [INFO/Worker-28/99] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.packages.urllib3.connectionpool: begin: returning conn or self new conn
2014-07-16T02:43:56.951725+00:00 app[worker.1]: [INFO/Worker-28/99] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.sessions: begin: request: about to get request
2014-07-16T02:43:56.954692+00:00 app[worker.1]: [INFO/Worker-28/99] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.packages.urllib3.connectionpool: Starting new HTTPS connection (1): api.fullcontact.com
2014-07-16T02:43:56.954817+00:00 app[worker.1]: [INFO/Worker-28/99] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.packages.urllib3.connectionpool: begin: create connection class
2014-07-16T02:43:56.954948+00:00 app[worker.1]: [INFO/Worker-28/99] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.packages.urllib3.connectionpool: complete: just about to create connection class
2014-07-16T02:43:56.955062+00:00 app[worker.1]: [INFO/Worker-28/99] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.packages.urllib3.connectionpool: complete: returning conn or self new conn
2014-07-16T02:43:56.955166+00:00 app[worker.1]: [INFO/Worker-28/99] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.packages.urllib3.connectionpool: complete: urlopen: about to get a conn from the pool
2014-07-16T02:43:56.955290+00:00 app[worker.1]: [INFO/Worker-28/99] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.packages.urllib3.connectionpool: begin: _make_request: about to call request
2014-07-16T02:43:56.955396+00:00 app[worker.1]: [INFO/Worker-28/99] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.packages.urllib3.connection: begin: httpconection: send req
2014-07-16T02:43:56.953410+00:00 app[worker.1]: [INFO/Worker-28/99] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.sessions: begin: request: about to send
2014-07-16T02:43:56.953554+00:00 app[worker.1]: [INFO/Worker-28/99] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.adapters: begin: send: about to get conn
2014-07-16T02:43:56.953986+00:00 app[worker.1]: [INFO/Worker-28/99] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.adapters: complete: send: about to get conn
2014-07-16T02:43:56.956014+00:00 app[worker.1]: [INFO/Worker-28/99] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.packages.urllib3.connection: begin: httpconection: _send_output
2014-07-16T02:43:56.955517+00:00 app[worker.1]: [INFO/Worker-28/99] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.packages.urllib3.connection: begin: httpconection: put req
2014-07-16T02:43:56.955715+00:00 app[worker.1]: [INFO/Worker-28/99] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.packages.urllib3.connection: completed: httpconection: put req
2014-07-16T02:43:56.956146+00:00 app[worker.1]: [INFO/Worker-28/99] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.packages.urllib3.connection: begin: httpconection: send
2014-07-16T02:43:56.956309+00:00 app[worker.1]: [INFO/Worker-28/99] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.packages.urllib3.connection: begin: https verified conection: connect
2014-07-16T02:43:56.959157+00:00 app[worker.1]: [INFO/Worker-28/99] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.packages.urllib3.connection: completed: https verified : connect: set socket
2014-07-16T02:43:56.959045+00:00 app[worker.1]: [INFO/Worker-28/99] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.packages.urllib3.connection: begin: https verified : connect: set socket
2014-07-16T02:43:56.959295+00:00 app[worker.1]: [INFO/Worker-28/99] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.packages.urllib3.connection: begin: https verified : connect: resolve cert reqs
2014-07-16T02:43:56.959402+00:00 app[worker.1]: [INFO/Worker-28/99] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.packages.urllib3.connection: completed: https verified : connect: resolve cert reqs
2014-07-16T02:43:56.959508+00:00 app[worker.1]: [INFO/Worker-28/99] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.packages.urllib3.connection: begin: https verified : connect: resolve ssl ver
2014-07-16T02:43:56.959613+00:00 app[worker.1]: [INFO/Worker-28/99] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.packages.urllib3.connection: complete: https verified : connect: resolve ssl ver
2014-07-16T02:43:56.959715+00:00 app[worker.1]: [INFO/Worker-28/99] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.packages.urllib3.connection: begin: https verified : connect: ssl_wrap_socket
2014-07-16T02:43:56.959827+00:00 app[worker.1]: [INFO/Worker-28/99] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.packages.urllib3.util.ssl_: begin: ssl_wrap_socket: about to get ssl context
2014-07-16T02:43:56.960012+00:00 app[worker.1]: [INFO/Worker-28/99] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.packages.urllib3.util.ssl_: complete: ssl_wrap_socket: about to get ssl context
2014-07-16T02:43:56.960119+00:00 app[worker.1]: [INFO/Worker-28/99] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.packages.urllib3.util.ssl_: begin: ssl_wrap_socket: load verify locations


# *******************************************
# *******************************************
# This is when I restarted the job. It processed the same exact task, same parameters, same https url, etc in under 1 second.
# *******************************************
# *******************************************



2014-07-16T03:00:21.885531+00:00 app[worker.1]: [INFO/MainProcess/2] [???(???)] celery.worker.strategy: Received task: [973c1361-43c3-41a2-9bcd-6ef850c41fcc]
2014-07-16T03:00:22.009616+00:00 app[worker.1]: [INFO/Worker-9/23] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.sessions: begin: request: about to get request
2014-07-16T03:00:22.010313+00:00 app[worker.1]: [INFO/Worker-9/23] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.sessions: begin: request: about to prep request
2014-07-16T03:00:22.016669+00:00 app[worker.1]: [INFO/Worker-9/23] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.sessions: begin: request: about to send
2014-07-16T03:00:22.018419+00:00 app[worker.1]: [INFO/Worker-9/23] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.adapters: complete: send: about to get conn
2014-07-16T03:00:22.051267+00:00 app[worker.1]: [INFO/Worker-9/23] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.packages.urllib3.connection: completed: httpconection: put req
2014-07-16T03:00:22.051744+00:00 app[worker.1]: [INFO/Worker-9/23] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.packages.urllib3.connection: begin: httpconection: send
2014-07-16T03:00:22.051879+00:00 app[worker.1]: [INFO/Worker-9/23] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.packages.urllib3.connection: begin: https verified conection: connect
2014-07-16T03:00:22.072346+00:00 app[worker.1]: [INFO/Worker-9/23] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.packages.urllib3.connection: begin: https verified : connect: set socket
2014-07-16T03:00:22.103234+00:00 app[worker.1]: [INFO/Worker-9/23] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.packages.urllib3.util.ssl_: begin: ssl_wrap_socket: wrap socket with host name
2014-07-16T03:00:22.007638+00:00 app[worker.1]: [1;34m[DEBUG/Worker-9/23] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] : : twitter, account_uid: some_user[0m
2014-07-16T03:00:22.009969+00:00 app[worker.1]: [INFO/Worker-9/23] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.sessions: complete: request: about to get request
2014-07-16T03:00:22.012954+00:00 app[worker.1]: [INFO/Worker-9/23] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.sessions: complete: request: about to prep request
2014-07-16T03:00:22.017254+00:00 app[worker.1]: [INFO/Worker-9/23] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.adapters: begin: send: about to get conn
2014-07-16T03:00:22.049101+00:00 app[worker.1]: [INFO/Worker-9/23] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.adapters: begin: send: about to get response not chunked
2014-07-16T03:00:22.049241+00:00 app[worker.1]: [INFO/Worker-9/23] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.packages.urllib3.connectionpool: begin: urlopen: about to get a conn from the pool
2014-07-16T03:00:22.049361+00:00 app[worker.1]: [INFO/Worker-9/23] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.packages.urllib3.connectionpool: begin: _get_conn: about to get a conn from the pool
2014-07-16T03:00:22.049528+00:00 app[worker.1]: [INFO/Worker-9/23] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.packages.urllib3.connectionpool: complete: _get_conn: about to get a conn from the pool
2014-07-16T03:00:22.049639+00:00 app[worker.1]: [INFO/Worker-9/23] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.packages.urllib3.connectionpool: begin: returning conn or self new conn
2014-07-16T03:00:22.049756+00:00 app[worker.1]: [INFO/Worker-9/23] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.packages.urllib3.connectionpool: Starting new HTTPS connection (1): api.fullcontact.com
2014-07-16T03:00:22.049918+00:00 app[worker.1]: [INFO/Worker-9/23] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.packages.urllib3.connectionpool: begin: create connection class
2014-07-16T03:00:22.050197+00:00 app[worker.1]: [INFO/Worker-9/23] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.packages.urllib3.connectionpool: complete: just about to create connection class
2014-07-16T03:00:22.050318+00:00 app[worker.1]: [INFO/Worker-9/23] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.packages.urllib3.connectionpool: complete: returning conn or self new conn
2014-07-16T03:00:22.050425+00:00 app[worker.1]: [INFO/Worker-9/23] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.packages.urllib3.connectionpool: complete: urlopen: about to get a conn from the pool
2014-07-16T03:00:22.050587+00:00 app[worker.1]: [INFO/Worker-9/23] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.packages.urllib3.connectionpool: begin: _make_request: about to call request
2014-07-16T03:00:22.050777+00:00 app[worker.1]: [INFO/Worker-9/23] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.packages.urllib3.connection: begin: httpconection: send req
2014-07-16T03:00:22.050946+00:00 app[worker.1]: [INFO/Worker-9/23] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.packages.urllib3.connection: begin: httpconection: put req
2014-07-16T03:00:22.051546+00:00 app[worker.1]: [INFO/Worker-9/23] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.packages.urllib3.connection: begin: httpconection: _send_output
2014-07-16T03:00:22.072581+00:00 app[worker.1]: [INFO/Worker-9/23] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.packages.urllib3.connection: completed: https verified : connect: set socket
2014-07-16T03:00:22.072721+00:00 app[worker.1]: [INFO/Worker-9/23] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.packages.urllib3.connection: begin: https verified : connect: resolve cert reqs
2014-07-16T03:00:22.072911+00:00 app[worker.1]: [INFO/Worker-9/23] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.packages.urllib3.connection: completed: https verified : connect: resolve cert reqs
2014-07-16T03:00:22.073044+00:00 app[worker.1]: [INFO/Worker-9/23] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.packages.urllib3.connection: begin: https verified : connect: resolve ssl ver
2014-07-16T03:00:22.073179+00:00 app[worker.1]: [INFO/Worker-9/23] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.packages.urllib3.connection: complete: https verified : connect: resolve ssl ver
2014-07-16T03:00:22.073311+00:00 app[worker.1]: [INFO/Worker-9/23] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.packages.urllib3.connection: begin: https verified : connect: ssl_wrap_socket
2014-07-16T03:00:22.073445+00:00 app[worker.1]: [INFO/Worker-9/23] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.packages.urllib3.util.ssl_: begin: ssl_wrap_socket: about to get ssl context
2014-07-16T03:00:22.073814+00:00 app[worker.1]: [INFO/Worker-9/23] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.packages.urllib3.util.ssl_: complete: ssl_wrap_socket: about to get ssl context
2014-07-16T03:00:22.073979+00:00 app[worker.1]: [INFO/Worker-9/23] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.packages.urllib3.util.ssl_: begin: ssl_wrap_socket: load verify locations
2014-07-16T03:00:22.092134+00:00 app[worker.1]: [INFO/Worker-9/23] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.packages.urllib3.util.ssl_: complete: ssl_wrap_socket: load verify locations
2014-07-16T03:00:22.138197+00:00 app[worker.1]: [INFO/Worker-9/23] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.packages.urllib3.util.ssl_: complete: ssl_wrap_socket: wrap socket with host name
2014-07-16T03:00:22.138207+00:00 app[worker.1]: [INFO/Worker-9/23] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.packages.urllib3.connection: complete: https verified : connect: ssl_wrap_socket
2014-07-16T03:00:22.138209+00:00 app[worker.1]: [INFO/Worker-9/23] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.packages.urllib3.connection: begin: https verified : connect: match hostname
2014-07-16T03:00:22.138211+00:00 app[worker.1]: [INFO/Worker-9/23] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.packages.urllib3.connection: complete: https verified : connect: match host name
2014-07-16T03:00:22.138212+00:00 app[worker.1]: [INFO/Worker-9/23] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.packages.urllib3.connection: completed: https verified conection: connect
2014-07-16T03:00:22.138214+00:00 app[worker.1]: [INFO/Worker-9/23] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.packages.urllib3.connection: begin: httpconection: send: before sock.sendall
2014-07-16T03:00:22.138743+00:00 app[worker.1]: [INFO/Worker-9/23] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.packages.urllib3.connection: completed: httpconection: send: before sock.sendall
2014-07-16T03:00:22.138748+00:00 app[worker.1]: [INFO/Worker-9/23] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.packages.urllib3.connection: completed: httpconection: send
2014-07-16T03:00:22.138796+00:00 app[worker.1]: [INFO/Worker-9/23] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.packages.urllib3.connection: completed: httpconection: _send_output
2014-07-16T03:00:22.139094+00:00 app[worker.1]: [INFO/Worker-9/23] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.packages.urllib3.connection: completed: httpconection: send req
2014-07-16T03:00:22.139207+00:00 app[worker.1]: [INFO/Worker-9/23] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.packages.urllib3.connectionpool: complete: _make_request: about to call request
2014-07-16T03:00:22.139321+00:00 app[worker.1]: [INFO/Worker-9/23] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.packages.urllib3.connectionpool: begin: _make_request: conn.get reponse with buffer
2014-07-16T03:00:22.139925+00:00 app[worker.1]: [INFO/Worker-9/23] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.packages.urllib3.connectionpool: begin: _make_request: conn.get reponse with no buffer
2014-07-16T03:00:22.288112+00:00 app[worker.1]: [INFO/Worker-9/23] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.sessions: completest: about to send
2014-07-16T03:00:22.286453+00:00 app[worker.1]: [INFO/Worker-9/23] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.packages.urllib3.connectionpool: complete: _make_request: conn.get reponse with no buffer
2014-07-16T03:00:22.286596+00:00 app[worker.1]: [1;34m[DEBUG/Worker-9/23] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.packages.urllib3.connectionpool: "GET /v2/person.json?twitter=some_user&apiKey=5eeaece3982efd1d&style=dictionary HTTP/1.1" 403 170[0m
2014-07-16T03:00:22.286720+00:00 app[worker.1]: [INFO/Worker-9/23] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.packages.urllib3.connectionpool: begin: urlopen: about to get response from httplib
2014-07-16T03:00:22.287050+00:00 app[worker.1]: [INFO/Worker-9/23] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.packages.urllib3.connectionpool: complete: urlopen: about to get response from httplib
2014-07-16T03:00:22.287170+00:00 app[worker.1]: [INFO/Worker-9/23] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.adapters: complete: send: about to get response not chunked
2014-07-16T03:00:22.287279+00:00 app[worker.1]: [INFO/Worker-9/23] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.adapters: begin: send: about to build response
2014-07-16T03:00:22.287738+00:00 app[worker.1]: [INFO/Worker-9/23] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.adapters: complete: send: about to build response
2014-07-16T03:00:24.342182+00:00 app[worker.1]: [INFO/MainProcess/2] [???(???)] celery.worker.job: Task [973c1361-43c3-41a2-9bcd-6ef850c41fcc] succeeded in 2.4521394340554252s: 44508

The last logged entry is

2014-07-16T02:43:56.960119+00:00 app[worker.1]: [INFO/Worker-28/99] [(973c1361-43c3-41a2-9bcd-6ef850c41fcc)] requests.packages.urllib3.util.ssl_: begin: ssl_wrap_socket: load verify locations

According to my log statement, the culprit is load_verify_locations.

This is the corresponding code in the requests library:

if ca_certs:
    try:
        context.load_verify_locations(ca_certs)
    # Py32 raises IOError
    # Py33 raises FileNotFoundError
    except Exception as e:  # Reraise as SSLError
        raise SSLError(e)

So it appears that context.load_verify_locations(ca_certs) is causing this to hang. Why?

Deluna answered 8/7, 2014 at 9:56 Comment(10)
Can you reproduce this hanging outside of your job queue, perhaps in a Python shell (ideally just using urllib3)? Does it hang on a specific URL?Oleoresin
No, I cannot reproduce in an isolated manner. In fact, if I simply restart my worker, it'll pick up where it left off and send a request to the offending url and will work. It does not appear to be a specific url or domain. My logs show that it hangs on various domains. However, maps.google.com occurs much more frequently.Deluna
:( Not clear if it's HTTP-related, then. Could just be something you're doing after establishing an HTTPS connection? Only thing that could hang urllib3 that I can think of is DNS resolution. If there is something quirky with your network configuration which makes resolving a domain slow, it won't timeout properly (due to limitations of the Python stdlib).Oleoresin
Do you have a stack trace from the hang? Like, an idea of where the code gets stuck?Caloyer
@KevinBurke I've updated the post to contain a more complete representation of my logs.Deluna
I have also started to see a correlation between some of the tasks that hang and the message: requests.packages.urllib3.connectionpool: Resetting dropped connection: api.twitter.com being the last thing logged before it hangs.Deluna
I'm facing something similar. urllib3 successfully establishes the connection. I can verify that because i used wireshark to inspect the TCP response and i got a SYN ACK from the web server. However, urllib3 doesnt do anything with the established connection. Netstat shows that the connection is ESTABLISHEDBarefoot
I have it, too. Another module uses urllib3 to cal a public API from my service and it causes the service to get stuck several times per day. Any workarounds?Schorl
@Schorl any success?Gregoire
@Gregoire in my case, I solved it be modifying the library and adding timeouts everywhere when it starts HTTP connections.Schorl
B
0

Are you using the multiprocessing.dummy module? i.e the thread pool?

There seems to be a bug in either the Pool module from multiprocessing.dummy or urllib3, which seemingly causes random HTTP connections to hang after they are ESTABLISHED i.e after completion of the TCP SYN-ACK cycle.

A sign of this bug happening would be when you trace the TCP request/response with Wireshark and you see a successful TCP connection being established and then a TCP RST after a few minutes, and NO application protocol traffic in between.

You may want to try switching your multiprocessing Pool code to process based, i.e use multiprocessing instead of multiprocessing.dummy.

i.e instead of

from multiprocessing.dummy import Pool as ThreadPool

use

from multiprocessing import Pool as ProcessPool

The above worked for me and the bug went away.

Barefoot answered 16/2, 2016 at 14:47 Comment(1)
I believe it's a bug in urllib3 since I am having it without any multiprocessing/threading involved.Schorl

© 2022 - 2024 — McMap. All rights reserved.