Retrieve list of tasks in a queue in Celery
A

19

221

How can I retrieve a list of tasks in a queue that are yet to be processed?

Add answered 4/4, 2011 at 21:35 Comment(1)
RabbitMQ, but I want to retrieve this list inside Python.Add
D
242

EDIT: See other answers for getting a list of tasks in the queue.

You should look here: Celery Guide - Inspecting Workers

Basically this:

my_app = Celery(...)

# Inspect all nodes.
i = my_app.control.inspect()

# Show the items that have an ETA or are scheduled for later processing
i.scheduled()

# Show tasks that are currently active.
i.active()

# Show tasks that have been claimed by workers
i.reserved()

Depending on what you want
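Several comments below report that these calls can return None when no worker replies in time. Here is a minimal sketch of flattening a reply defensively. It assumes the reply is either None or a dict mapping worker names to lists of task dicts that carry an "id" key, which is the shape usually seen from active() and reserved(). The helper name flatten_inspect_reply and the sample data are made up for illustration.

```python
def flatten_inspect_reply(reply):
    """Flatten an inspect() reply into a list of (worker, task_id) pairs.

    Assumes `reply` is None (no worker answered) or a dict that maps
    worker names to lists of task dicts carrying an "id" key.
    """
    if not reply:
        return []
    pairs = []
    for worker, tasks in reply.items():
        # A worker entry may itself be empty or None.
        for task in tasks or []:
            pairs.append((worker, task.get("id")))
    return pairs


# Hypothetical sample shaped like the result of i.reserved()
sample = {"celery@host1": [{"id": "a1", "name": "tasks.add"}], "celery@host2": []}
print(flatten_inspect_reply(sample))
print(flatten_inspect_reply(None))
```

Keep in mind that this only covers tasks already handed to workers, not messages still waiting in the broker.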

Deprived answered 20/2, 2012 at 22:35 Comment(14)
I tried that, but it's really slow (like 1 sec). I'm using it synchronously in a Tornado app to monitor progress, so it has to be fast.Boogeyman
This will not return a list of tasks in the queue that have yet to be processed.Discrimination
Use i.reserved() to get a list of queued tasks.Misdemeanor
Has anybody experienced that i.reserved() won't have an accurate list of active tasks? I have tasks running that don't show up in the list. I'm on django-celery==3.1.10Cinderellacindi
@JulienFr if you use the name of the worker when inspecting, it will take a sec instead of a minute. i.e i = inspect('celery@mysite')Cinderellacindi
@Banana - Doesn't reserved() only show tasks that have been prefetched by the workers? This wont show the entire queue, right? What if I've disabled prefetching? See: docs.celeryproject.org/en/latest/userguide/…Ammonate
@Ammonate - yes, reserved() only shows prefetched tasks, it seems (even if prefetch multiplier is 1). To get stats on messages still in your broker queues and not yet retrieved by Celery, you need to use the amqplib or rabbitmqctl techniques mentioned in other answers.Lazarus
When specifying the worker I had to use a list as argument: inspect(['celery@Flatty']). Huge speed improvement over inspect().Demobilize
@Seperman: From my limited, current understanding, this seems related to the worker_prefetch_multiplier setting of celery. When I increased the concurrency of a queue, more pending tasks appeared than when using a lower concurrency. This seems to be in-line with docs.celeryproject.org/en/latest/userguide/…Sycosis
This does not answer the question. I have no idea why is this answer accepted... :)Guilt
This used to work for me with Celery 3.*, but no longer works with Celery 4.*. Even with a long task actively running, this returns empty lists.Gallaway
It returns None for me tooSchwaben
from celery.task.control import inspect with small i worked for meSamp
This is an incorrect answer. Celery only reports on tasks which have been dispatched to a worker. The question is about tasks in the backlog before a worker has picked them up. To answer this, you need to inspect the message broker (Redis or RabbitMQ)Jeremiad
H
67

If you are using Celery with Django, the simplest way to inspect tasks is to run commands directly from your terminal, either inside your virtual environment or using the full path to celery:

Doc: http://docs.celeryproject.org/en/latest/userguide/workers.html?highlight=revoke#inspecting-workers

$ celery inspect reserved
$ celery inspect active
$ celery inspect registered
$ celery inspect scheduled

Also if you are using Celery+RabbitMQ you can inspect the list of queues using the following command:

More info: https://linux.die.net/man/1/rabbitmqctl

$ sudo rabbitmqctl list_queues
Hotel answered 11/4, 2019 at 11:48 Comment(3)
If you have a defined project, you can use celery -A my_proj inspect reservedGennagennaro
This, again, does not answer the question.Guilt
I'm here because my Celery server is currently overloaded with tasks. This approach doesn't help, because it just hangs. So do things like app.control.inspect().active(), from another answer. I just want to kill some jobs so my server's functional again...Prolusion
C
56

If you are using RabbitMQ, run this in a terminal:

sudo rabbitmqctl list_queues

It will print a list of queues with the number of pending tasks in each. For example:

Listing queues ...
0b27d8c59fba4974893ec22d478a7093    0
0e0a2da9828a48bc86fe993b210d984f    0
[email protected] 0
11926b79e30a4f0a9d95df61b6f402f7    0
15c036ad25884b82839495fb29bd6395    1
[email protected]    0
celery  166
celeryev.795ec5bb-a919-46a8-80c6-5d91d2fcf2aa   0
celeryev.faa4da32-a225-4f6c-be3b-d8814856d1b6   0

The number in the right column is the number of tasks in the queue. In the output above, the celery queue has 166 pending tasks.
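If you want that count inside Python, the text output can be parsed. This is a minimal sketch, assuming each data line ends with an integer count separated from the queue name by whitespace. The function name parse_list_queues and the sample string are hypothetical, and the text would in practice come from running rabbitmqctl yourself (for example via subprocess).

```python
def parse_list_queues(output):
    """Parse `rabbitmqctl list_queues` text into {queue_name: message_count}.

    Assumes each data line ends with an integer count separated from the
    queue name by whitespace; other lines (banners, headers) are skipped.
    """
    counts = {}
    for line in output.splitlines():
        parts = line.rsplit(None, 1)
        if len(parts) == 2 and parts[1].isdigit():
            counts[parts[0].strip()] = int(parts[1])
    return counts


# Hypothetical sample modeled on the listing above
sample = "Listing queues ...\ncelery\t166\nceleryev.faa4da32\t0\n"
print(parse_list_queues(sample)["celery"])
```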

Countable answered 9/4, 2015 at 11:55 Comment(2)
I am familiar with this when I have sudo privileges, but I want an unprivileged, system user to be able to check - any suggestions?Dm
In addition you can pipe this through grep -e "^celery\s" | cut -f2 to extract that 166 if you want to process that number later, say for stats.Keaton
W
39

If you don't use prioritized tasks, this is actually pretty simple if you're using Redis. To get the task counts:

redis-cli -h HOST -p PORT -n DATABASE_NUMBER llen QUEUE_NAME

But prioritized tasks use a different key in Redis, so to get the full picture you need to query Redis for every priority of task. In Python (and from the Flower project), this looks like:

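import redis
from django.conf import settings  # assumption: a Django project; otherwise load your own config

PRIORITY_SEP = '\x06\x16'
DEFAULT_PRIORITY_STEPS = [0, 3, 6, 9]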


def make_queue_name_for_pri(queue, pri):
    """Make a queue name for redis
    
    Celery uses PRIORITY_SEP to separate different priorities of tasks into
    different queues in Redis. Each queue-priority combination becomes a key in
    redis with names like:
    
     - batch1\x06\x163 <-- P3 queue named batch1
     
    There's more information about this in Github, but it doesn't look like it 
    will change any time soon:
     
      - https://github.com/celery/kombu/issues/422
      
    In that ticket the code below, from the Flower project, is referenced:
    
      - https://github.com/mher/flower/blob/master/flower/utils/broker.py#L135
        
    :param queue: The name of the queue to make a name for.
    :param pri: The priority to make a name with.
    :return: A name for the queue-priority pair.
    """
    if pri not in DEFAULT_PRIORITY_STEPS:
        raise ValueError('Priority not in priority steps')
    return '{0}{1}{2}'.format(*((queue, PRIORITY_SEP, pri) if pri else
                                (queue, '', '')))


def get_queue_length(queue_name='celery'):
    """Get the number of tasks in a celery queue.
    
    :param queue_name: The name of the queue you want to inspect.
    :return: the number of items in the queue.
    """
    priority_names = [make_queue_name_for_pri(queue_name, pri) for pri in
                      DEFAULT_PRIORITY_STEPS]
    r = redis.StrictRedis(
        host=settings.REDIS_HOST,
        port=settings.REDIS_PORT,
        db=settings.REDIS_DATABASES['CELERY'],
    )
    return sum([r.llen(x) for x in priority_names])

If you want to get an actual task, you can use something like:

redis-cli -h HOST -p PORT -n DATABASE_NUMBER lrange QUEUE_NAME 0 -1

From there you'll have to deserialize the returned list. In my case I was able to accomplish this with something like:

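import base64
import json
import pickle

r = redis.StrictRedis(
    host=settings.REDIS_HOST,
    port=settings.REDIS_PORT,
    db=settings.REDIS_DATABASES['CELERY'],
)
l = r.lrange('celery', 0, -1)
# Assumes tasks were serialized with pickle; only unpickle data you trust.
pickle.loads(base64.b64decode(json.loads(l[0])['body']))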

Just be warned that deserialization can take a moment, and you'll need to adjust the commands above to work with various priorities.
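One way to adjust the lrange call for priorities is to loop over the same per-priority keys used for the counts. The sketch below assumes the key scheme described above (bare queue name for priority 0, queue name plus separator plus priority otherwise). The names priority_keys and fetch_all_messages are made up for illustration, and client stands for any object with an lrange(key, start, stop) method, such as a redis.StrictRedis instance.

```python
PRIORITY_SEP = "\x06\x16"
PRIORITY_STEPS = [0, 3, 6, 9]


def priority_keys(queue, steps=PRIORITY_STEPS):
    """Return the Redis keys for one queue: the bare name for priority 0,
    and queue + separator + priority for the other steps."""
    return [queue if pri == 0 else f"{queue}{PRIORITY_SEP}{pri}" for pri in steps]


def fetch_all_messages(client, queue="celery"):
    """Collect the raw messages stored under every priority key of `queue`."""
    messages = []
    for key in priority_keys(queue):
        messages.extend(client.lrange(key, 0, -1))
    return messages
```

Each returned item is still a serialized envelope and needs to be decoded as shown above.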

Woodrum answered 15/4, 2017 at 0:2 Comment(7)
After using this in production, I've learned that it fails if you use prioritized tasks, due to the design of Celery.Woodrum
I've updated the above to handle prioritized tasks. Progress!Woodrum
Just to spell things out, the DATABASE_NUMBER used by default is 0, and the QUEUE_NAME is celery, so redis-cli -n 0 llen celery will return the number of queued messages.Swab
For my celery, the name of the queue is '{{{0}}}{1}{2}' instead of '{0}{1}{2}'. Other than that, this works perfectly!Unveil
It always returns 0.Longsighted
Yep, this answers the question only if the broker is Redis.Guilt
The problem I experience with this solution : if you revoke a celery task that is waiting in the queue, it stays in the redis queue. And number of tasks returned by lrange is not correct.Nieshanieto
D
16

To inspect the queue on the broker, use this:

from amqplib import client_0_8 as amqp
conn = amqp.Connection(host="localhost:5672", userid="guest",
                       password="guest", virtual_host="/", insist=False)
chan = conn.channel()
name, jobs, consumers = chan.queue_declare(queue="queue_name", passive=True)
Downbeat answered 19/10, 2013 at 11:43 Comment(2)
but 'jobs' gives only number of tasks in queueMyocarditis
See https://mcmap.net/q/118389/-retrieve-list-of-tasks-in-a-queue-in-celery for related answer that gives you the names of the tasks.Martinmartina
A
15

A copy-paste solution for Redis with json serialization:

def get_celery_queue_items(queue_name):
    import base64
    import json  

    # Get a configured instance of a celery app:
    from yourproject.celery import app as celery_app

    with celery_app.pool.acquire(block=True) as conn:
        tasks = conn.default_channel.client.lrange(queue_name, 0, -1)
        decoded_tasks = []

    for task in tasks:
        j = json.loads(task)
        body = json.loads(base64.b64decode(j['body']))
        decoded_tasks.append(body)

    return decoded_tasks

It works with Django. Just don't forget to change yourproject.celery.
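To see what the decoding step does without a running broker, here is a small sketch that builds an envelope by hand and decodes it. The envelope layout (a JSON object whose "body" field holds base64-encoded JSON) follows the code above. The payload, the "headers" contents, and the name decode_task_body are hypothetical and only serve to exercise the decoder.

```python
import base64
import json


def decode_task_body(raw_message):
    """Decode one raw broker message whose body is base64-encoded JSON."""
    envelope = json.loads(raw_message)
    return json.loads(base64.b64decode(envelope["body"]))


# Hypothetical envelope built only to exercise the decoder.
payload = [[4, 4], {}, {"callbacks": None}]
raw = json.dumps({
    "body": base64.b64encode(json.dumps(payload).encode()).decode(),
    "headers": {"task": "tasks.add", "id": "abc123"},
})
print(decode_task_body(raw))
```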

Atelier answered 4/5, 2018 at 22:1 Comment(2)
If you're using the pickle serializer, then you can change the body = line to body = pickle.loads(base64.b64decode(j['body'])).Corporator
i have this error ! module 'celery.app' has no attribute 'pool'Episodic
M
13

This worked for me in my application:

def get_queued_jobs(queue_name):
    connection = <CELERY_APP_INSTANCE>.connection()

    try:
        channel = connection.channel()
        name, jobs, consumers = channel.queue_declare(queue=queue_name, passive=True)
        active_jobs = []

        def dump_message(message):
            active_jobs.append(message.properties['application_headers']['task'])

        channel.basic_consume(queue=queue_name, callback=dump_message)

        for job in range(jobs):
            connection.drain_events()

        return active_jobs
    finally:
        connection.close()

active_jobs will be a list of strings that correspond to tasks in the queue.

Don't forget to swap out CELERY_APP_INSTANCE with your own.

Thanks to @ashish for pointing me in the right direction with his answer here: https://mcmap.net/q/118389/-retrieve-list-of-tasks-in-a-queue-in-celery

Martinmartina answered 5/9, 2019 at 14:41 Comment(11)
in my case jobs is always zero... any idea?Lomax
@Lomax I don't think that's enough information for me to respond helpfully. You could open your own question. I don't think it would be a duplicate of this one if you specify that you want to retrieve the information in python. I'd go back to https://mcmap.net/q/118389/-retrieve-list-of-tasks-in-a-queue-in-celery, which is what I based my answer off of, and make sure that works first.Martinmartina
@CalebSyring This is the first approach that really shows me the queued tasks. Very nice. The only problem for me is that the list append does not seem to work. Any ideas how i can make the callback function write to the list?Ashby
@Ashby I'm sorry, someone made an improper edit to my answer. You can look in the edit history for the original answer, which will most likely work for you. I'm working on getting this fixed. (EDIT: I just went in and rejected the edit, which had an obvious python error. Let me know if this fixed your problem or not.)Martinmartina
@CalebSyring I now used your code in a class, having the list as a class attribute works!Ashby
@CalebSyring Is there a difference in open the connection with "with connection" to the version where i open and close it manually? Because I use this code in a script where this command is executed a lot of times, and it seems like I got OSErrors with too many opened filesAshby
@Ashby it doesn't look like there is a "with connection" in the code above. Not sure how I can help. I would say the example above opens and closes the connection manually.Martinmartina
@CalebSyring Oh, was the code edited? I thought there was a witch connection. Maybe i will try this one, thanks:)Ashby
@CalebSyring this is brilliant answer! Thanks! Any idea how we can get only a limited number of results? I have a queue with millions of tasks and I want to check a few of them to see what's in there. I tried to change the range but that didn't help.Dippold
You could check the length of active_jobs in the dump_message function and only append a task, if the active_jobs list has less elements than you desire to have.Haupt
Thank you for this answer. Is there a way to do this non-destructively? I mean, this works for me in the sense that it returns a list of the jobs, but the "drain_events" method seems to convert the jobs in the queue to being "done", so that they are not available to workers or later inspection.Zins
H
7

The celery inspect module appears to be aware of tasks only from the workers' perspective. If you want to view the messages that are in the queue (yet to be pulled by the workers), I suggest using pyrabbit, which can interface with the RabbitMQ HTTP API to retrieve all kinds of information from the queue.

An example can be found here: Retrieve queue length with Celery (RabbitMQ, Django)

Huntsman answered 30/8, 2016 at 14:26 Comment(0)
S
5

I think the only way to get the tasks that are waiting is to keep a list of tasks you started and let the task remove itself from the list when it's started.

With rabbitmqctl and list_queues you can get an overview of how many tasks are waiting, but not the tasks themselves: http://www.rabbitmq.com/man/rabbitmqctl.1.man.html

If what you want includes tasks that are being processed but are not finished yet, you can keep a list of your tasks and check their states:

from tasks import add
result = add.delay(4, 4)

result.ready() # True if finished

Or you let Celery store the results with CELERY_RESULT_BACKEND and check which of your tasks are not in there.
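The bookkeeping idea from the first paragraph can be sketched as follows. This is an in-memory illustration only: with real workers running in separate processes, the set would have to live in shared storage such as a database or cache. The class name PendingTaskTracker and its methods are made up for this example and are not part of Celery.

```python
class PendingTaskTracker:
    """In-memory record of task ids that were submitted but not yet started.

    Illustration only: with workers in separate processes the set would
    need to live in shared storage, not in a Python object.
    """

    def __init__(self):
        self._pending = set()

    def submitted(self, task_id):
        # Call right after enqueueing, e.g. with the id of the result from delay().
        self._pending.add(task_id)

    def started(self, task_id):
        # Call at the top of the task body, so the task removes itself.
        self._pending.discard(task_id)

    def waiting(self):
        return sorted(self._pending)


tracker = PendingTaskTracker()
tracker.submitted("task-1")
tracker.submitted("task-2")
tracker.started("task-1")
print(tracker.waiting())
```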

Station answered 13/4, 2011 at 10:36 Comment(0)
G
5

As far as I know, Celery does not provide an API for examining tasks that are waiting in the queue; this is broker-specific. If you use Redis as a broker, for example, then examining tasks that are waiting in the celery (default) queue is as simple as:

  1. Connect to the broker.
  2. List the items in the celery list (with the LRANGE command, for example).

Keep in mind that these are tasks WAITING to be picked by available workers. Your cluster may have some tasks running - those will not be in this list as they have already been picked.

The process of retrieving tasks in a particular queue is broker-specific.

Guilt answered 4/5, 2018 at 8:48 Comment(0)
R
2

I've come to the conclusion the best way to get the number of jobs on a queue is to use rabbitmqctl as has been suggested several times here. To allow any chosen user to run the command with sudo I followed the instructions here (I did skip editing the profile part as I don't mind typing in sudo before the command.)

I also grabbed jamesc's grep and cut snippet and wrapped it up in subprocess calls.

from subprocess import Popen, PIPE
p1 = Popen(["sudo", "rabbitmqctl", "list_queues", "-p", "[name of your virtual host]"], stdout=PIPE)
p2 = Popen(["grep", "-e", r"^celery\s"], stdin=p1.stdout, stdout=PIPE)
p3 = Popen(["cut", "-f2"], stdin=p2.stdout, stdout=PIPE)
p1.stdout.close()
p2.stdout.close()
print("number of jobs on queue: %i" % int(p3.communicate()[0]))
Rayerayfield answered 16/11, 2017 at 4:23 Comment(0)
H
2

If you control the code of the tasks then you can work around the problem by letting a task trigger a trivial retry the first time it executes, then checking inspect().reserved(). The retry registers the task with the result backend, and celery can see that. The task must accept self or context as first parameter so we can access the retry count.

@task(bind=True)
def mytask(self):
    if self.request.retries == 0:
        raise self.retry(exc=MyTrivialError(), countdown=1)
    ...

This solution is broker-agnostic, i.e., you don't have to worry about whether you are using RabbitMQ or Redis to store the tasks.

EDIT: after testing I've found this to be only a partial solution. The size of reserved is limited to the prefetch setting for the worker.

Higbee answered 14/10, 2018 at 19:21 Comment(0)
I
2
inspector = current_celery_app.control.inspect()
scheduled = list(inspector.scheduled().values())[0]
active = list(inspector.active().values())[0]
reserved = list(inspector.reserved().values())[0]
registered = list(inspector.registered().values())[0]
lst = [*scheduled, *active, *reserved]
for i in lst:
    if job_id == i['id']:
        print("Job found")
Iatry answered 29/6, 2023 at 13:47 Comment(0)
T
1
from celery.task.control import inspect
def key_in_list(k, l):
    return bool([True for i in l if k in i.values()])

def check_task(task_id):
    task_value_dict = inspect().active().values()
    for task_list in task_value_dict:
        if key_in_list(task_id, task_list):
            return True
    return False
Tendance answered 21/7, 2019 at 6:31 Comment(1)
For Celery > 5, you can try: from your_app.celery import app and then for example: app.control.inspect().active()Ballottement
G
0

With subprocess.run:

import re
import subprocess

def count_active_tasks():
    active_process_txt = subprocess.run(['celery', '-A', 'my_proj', 'inspect', 'active'],
                                        stdout=subprocess.PIPE).stdout.decode('utf-8')
    # Each active task in the output carries a worker_pid field.
    return len(re.findall(r'worker_pid', active_process_txt))

Remember to replace my_proj with the name of your own project.

Gennagennaro answered 27/8, 2019 at 19:38 Comment(1)
This is not an answer to the question. This gives list of active tasks (tasks that are currently running). The question is about how to list tasks that are waiting in the queue.Guilt
M
0

To get the number of tasks on a queue you can use the flower library, here is a simplified example:

import asyncio
from flower.utils.broker import Broker
from django.conf import settings

def get_queue_length(queue):
    broker = Broker(settings.CELERY_BROKER_URL)
    queues_result = broker.queues([queue])
    res = asyncio.run(queues_result) or [{ "messages": 0 }]
    length = res[0].get('messages', 0)
    return length
Misrepresent answered 18/8, 2022 at 13:33 Comment(0)
B
0

This works for me without removing messages from the queue:

def get_broker_tasks(queue_name) -> list:
    conn = <CELERY_APP_INSTANCE>.connection()

    try:
        simple_queue = conn.SimpleQueue(queue_name)
        queue_size = simple_queue.qsize()
        messages = []

        for i in range(queue_size):
            message = simple_queue.get(block=False)
            messages.append(message)

        return messages
    except:
        messages = []
        return messages
    finally:
        print("Close connection")
        conn.close()

Don't forget to swap out CELERY_APP_INSTANCE with your own.

@Owen: I hope my solution meets your expectations.

Before answered 8/8, 2023 at 3:51 Comment(0)
P
0
def get_queue_length(total_tasks: int, queue_name: str, node_name: str):
    queue_size = 0
    inspector = app.control.inspect()
    stats = inspector.stats()
    if stats is not None:
        if f"celery@{node_name}" in stats.keys():
            total = stats[f"celery@{node_name}"]["total"]
            if queue_name in total.keys():
                active_tasks = total[queue_name] 
                if int(total_tasks) > int(active_tasks):
                    queue_size = total_tasks - active_tasks
    return queue_size

This leverages celery's control and inspect commands but also keeps an eye on the tasks that have been submitted.

This alone doesn't really work unless you have some sort of loop that is enqueueing items, like the following:

import time

total_tasks = 0
max_queue_length = 100 # choose your number
queue = "celery_queue"
full_queue_name = "YourCeleryApp.your_celery_queue_name"
# node_name must be set to your worker's node name (the part after "celery@")
for item in list_of_tasks:
    total_tasks += 1
    queue_length = get_queue_length(total_tasks=total_tasks, queue_name=full_queue_name, node_name=node_name)
    while int(queue_length) >= max_queue_length:
        time.sleep(10)
        queue_length = get_queue_length(total_tasks=total_tasks, queue_name=full_queue_name, node_name=node_name)
    your_celery_task.apply_async(kwargs={}, queue=queue)

With this what's happening is the following:

  1. Keep track of how many items have been submitted
  2. The above code will get the total which is the number of tasks that have been processed by a specific worker in a particular queue.
  3. We check whether the number of total tasks submitted is greater than our active_tasks or the tasks that have been processed by celery.

What this means is that if there are 50 tasks submitted and 30 have been processed, then there are 50-30 = 20 tasks in the queue
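The arithmetic in that example can be written as a tiny helper. This is only a sketch of the subtraction step, with a hypothetical name estimate_backlog; it clamps at zero in the same way the function above leaves queue_size at 0 unless more tasks were submitted than processed.

```python
def estimate_backlog(submitted, processed):
    """Estimate queued tasks as submitted minus processed, never below zero."""
    return max(0, int(submitted) - int(processed))


print(estimate_backlog(50, 30))
```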

Pyriform answered 6/9, 2023 at 4:53 Comment(0)
S
0

I found a use case in the Flower codebase for getting the broker queue length. It is as fast as accessing the broker directly.

from celery import Celery
from flower.utils.broker import Broker

app = Celery("tasks")
broker = Broker(
    app.connection(connect_timeout=1.0).as_uri(include_password=True),
    broker_options=app.conf.broker_transport_options,
    broker_use_ssl=app.conf.broker_use_ssl,
)

async def queue_length():
    queues = await broker.queues(["celery"])
    return queues[0].get("messages")
Salmon answered 25/9, 2023 at 21:29 Comment(0)
