Using urllib3 or requests and Celery
We have a script that downloads documents from various sources periodically. I'm going to move this over to Celery, and while doing so I'd like to take advantage of connection pooling, but I'm not sure how to go about it.

My current thought is to do something like this using Requests:

import celery
import requests

# One module-level session, shared by every task that runs in this worker process
s = requests.Session()

@celery.task(max_retries=2)
def get_doc(url):
    doc = s.get(url)
    # do stuff with doc

But I'm concerned that the connections will stay open indefinitely.

I really only need the connections to stay open so long as I'm processing new documents.
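Within a single task or batch loop, one way to bound the lifetime of the pooled connections is to use the session as a context manager: requests closes the session's adapters, and the sockets they pool, when the `with` block exits. A minimal sketch, where `process_batch` is a hypothetical helper and not part of the original question:

```python
import requests

def process_batch(urls):
    # Hypothetical helper: pooled connections are reused across all the
    # requests in this loop, then closed when the `with` block exits.
    with requests.Session() as s:
        return [s.get(url).status_code for url in urls]
```

This only bounds connection lifetime within one call, though; it doesn't share a pool across separate Celery tasks.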

So is something like this possible:

import celery
import requests


def get_all_docs():
    docs = Doc.objects.filter(some_filter=True)
    s = requests.Session()
    for doc in docs:
        t = get_doc.delay(doc.url, s)

@celery.task(max_retries=2)
def get_doc(url, s):
    doc = s.get(url)
    # do stuff with doc

However, in this case, I'm not certain that the connection sessions will persist across instances, or if Requests will create new connections once the pickling / unpickling is complete.
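The pickling concern can be checked directly: a requests `Session` is picklable, but unpickling it rebuilds its connection adapters, so the worker gets a brand-new object with empty pools rather than the sender's live sockets. A small sketch illustrating this:

```python
import pickle

import requests

s = requests.Session()
restored = pickle.loads(pickle.dumps(s))

# The unpickled object is a distinct Session; requests reconstructs the
# HTTPAdapters (and their pools) on unpickling, so any open connections
# the original session held do not travel with the task argument.
assert restored is not s
assert isinstance(restored, requests.Session)
```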

Lastly, I could try the experimental support for task decorators on a class method, so something like this:

import celery
import requests


class GetDoc(object):
    def __init__(self):
        self.s = requests.Session()

    @celery.task(max_retries=2)
    def get_doc(self, url):
        doc = self.s.get(url)
        # do stuff with doc

The last one seems like the best approach, and I'm going to test it; however, I was wondering if anyone here has already done something similar, or if one of you reading this has a better approach than the methods above.
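A related pattern (not from the question, just a sketch of one common idiom) is to create the session lazily in a module-level cache, so each worker process builds exactly one pooled session the first time any task needs it, and every later task in that process reuses it:

```python
import requests

_session = None  # one Session per worker process


def get_session():
    # Lazily create the session on first use in this process; later
    # calls (i.e. later tasks run by the same worker) reuse the same
    # pooled session instead of opening fresh connections.
    global _session
    if _session is None:
        _session = requests.Session()
    return _session
```

Inside the task body you would then call `get_session().get(url)` instead of holding the session on the class instance.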

Newbold answered 7/9, 2012 at 17:0 Comments (3)
I suspect you're right. I'm not an expert in the inner workings of Celery, but from what I understand, each job is run by a separate worker, and you have no guarantee that worker A, having made one request to google.com, will also make the next request to google.com. I imagine that sharing resources across tasks is inherently against how Celery works, unless there is a specific Celery design feature to support it. – Three
I'm thinking about this exact same thing. Did you ever come up with a solution? – Cyme
I would love to know the answer to this as well. – Stink
