We have a script that periodically downloads documents from various sources. I'm moving it over to Celery, and while doing so I'd like to take advantage of connection pooling, but I'm not sure how to go about it.
My current thought is to do something like this using Requests:
    import celery
    import requests

    # one module-level session, shared by every task in this worker process
    s = requests.session()

    @celery.task(max_retries=2)
    def get_doc(url):
        doc = s.get(url)
        # do stuff with doc
But I'm concerned that the connections will stay open indefinitely; I really only need them open while I'm processing new documents.
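One thing I think I could do with this first approach is at least tie the session's lifetime to the worker itself, using Celery's worker_shutdown signal to close the pool on exit. A minimal sketch (it still leaves connections open while the worker is idle, so it only solves part of my problem):

    import requests
    from celery.signals import worker_shutdown

    # the same module-level session as in the snippet above
    s = requests.session()

    @worker_shutdown.connect
    def close_session(**kwargs):
        # closes the underlying adapters and their pooled connections
        s.close()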
So is something like this possible:
    import celery
    import requests

    def get_all_docs():
        docs = Doc.objects.filter(some_filter=True)
        s = requests.session()
        for doc in docs:
            t = get_doc.delay(doc.url, s)

    @celery.task(max_retries=2)
    def get_doc(url, s):
        doc = s.get(url)
        # do stuff with doc
However, in this case I'm not certain that the session's connections will persist across processes, or whether Requests will open new connections once the pickling / unpickling is complete.
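From what I can tell from poking at how Requests pickles a Session (so this is my assumption, not something I've verified), the configuration round-trips but the live connections do not:

    import pickle
    import requests

    s = requests.session()
    s.get('https://example.com')  # opens a pooled connection

    # roughly what Celery does to task arguments
    s2 = pickle.loads(pickle.dumps(s))
    # s2 keeps headers, cookies, and the rest of the configuration, but
    # its adapters appear to rebuild their pool managers on unpickle,
    # so the worker would start with an empty connection pool anyway.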
Lastly, I could try the experimental support for task decorators on a class method, so something like this:
    import celery
    import requests

    class GetDoc(object):
        def __init__(self):
            self.s = requests.session()

        @celery.task(max_retries=2)
        def get_doc(self, url):
            doc = self.s.get(url)
            # do stuff with doc
The last one seems like the best approach, and I'm going to test it; however, I was wondering whether anyone here has already done something similar, or might have a better approach than the methods above.
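For what it's worth, another pattern I'm considering (sketched from my reading of the Celery docs, so treat the details as assumptions) is a custom Task base class that lazily creates one session per worker process, so the pool is shared across calls without ever being pickled:

    import celery
    import requests

    class SessionTask(celery.Task):
        abstract = True
        _session = None

        @property
        def session(self):
            # created lazily, once per worker process, and reused
            # across every invocation of tasks using this base class
            if self._session is None:
                self._session = requests.session()
            return self._session

    @celery.task(base=SessionTask, max_retries=2)
    def get_doc(url):
        doc = get_doc.session.get(url)
        # do stuff with doc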