Measuring Celery task execution time

I have converted a standalone batch job to use celery for dispatching the work to be done. I'm using RabbitMQ. Everything is running on a single machine and no other processes are using the RabbitMQ instance. My script just creates a bunch of tasks which are processed by workers.

Is there a simple way to measure the time from the start of my script until all tasks are finished? I know that this is a bit complicated by design when using message queues. But I don't want to do it in production, just for testing and getting a performance estimate.

Foist answered 20/10, 2013 at 18:56 Comment(0)

You could use a chord: add a dummy callback task at the end that is passed the time at which the tasks were sent and, when it executes, reports the difference between the current time and that start time.

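import time
from celery import chord, shared_task

@shared_task
def dummy_task(res=None, start_time=None):
    # Chord callback: receives the header results once every task has finished.
    print(time.time() - start_time)

def send_my_task():
    # my_task is the task being profiled, defined elsewhere.
    # A float timestamp is passed so the argument survives JSON serialization.
    # Chords require a result backend to be configured.
    chord([my_task.s()])(dummy_task.s(start_time=time.time()))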

send_my_task sends the task that you want to profile along with a dummy_task that would print how long it took (more or less). If you want more accurate numbers, I suggest passing the start_time directly to your tasks, and using the signals.
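
A minimal sketch of the "pass start_time directly to your tasks" idea, written as a plain function so it runs without a broker. The name timed_call and the keys of the returned dict are hypothetical, not part of Celery; in a real setup the body would presumably live inside a task, and sent_at would be a time.time() value supplied by the dispatching script.

```python
import time


def timed_call(sent_at, func, *args, **kwargs):
    """Run func and report how long it waited and how long it ran.

    sent_at is a time.time() value captured by the sender. Wall-clock
    time is used because sender and worker are different processes.
    """
    started = time.time()
    result = func(*args, **kwargs)
    finished = time.time()
    return {
        "result": result,
        "queue_wait": started - sent_at,  # delay before execution began
        "run_time": finished - started,   # time spent executing func
        "total": finished - sent_at,      # what the sender experiences
    }


if __name__ == "__main__":
    # Stand-in for a task body: sum a short list.
    print(timed_call(time.time(), sum, [1, 2, 3]))
```

Splitting queue_wait from run_time shows whether a slow batch is caused by the work itself or by tasks sitting in the queue waiting for a free worker.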

Lambrequin answered 20/10, 2013 at 19:6 Comment(4)
But dummy_task is a separate task, so it can be executed on a different worker, or significantly later than the original task. - Introvert
@homm, yes, but the OP explicitly stated that there is a single worker node and that no other processes are using the RabbitMQ node, so only the tasks we are measuring are executed. The only delay comes from receiving the time-measuring task at the end, but the chord is on a 1-second periodic timer. - Lambrequin
No other processes, but not "no other tasks", right? If there are no free worker processes, dummy_task will wait. - Introvert
@homm, yes, but the OP said that no process other than his script uses the queue, and the OP wants to measure the time from the start of the script until all tasks have finished. - Lambrequin

You could use Celery signals. Handlers registered for task_prerun and task_postrun are called before and after each task is executed, which makes it straightforward to measure elapsed time:

from time import time
from celery.signals import task_prerun, task_postrun


# Maps task_id -> start timestamp; lives in the worker process running the task.
d = {}

@task_prerun.connect
def task_prerun_handler(signal, sender, task_id, task, args, kwargs, **extras):
    d[task_id] = time()


@task_postrun.connect
def task_postrun_handler(signal, sender, task_id, task, args, kwargs, retval, state, **extras):
    try:
        cost = time() - d.pop(task_id)
    except KeyError:
        # No matching prerun was recorded for this task.
        cost = -1
    print(task.name, cost)
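
If per-task numbers are collected this way, a small helper can also total them for the whole batch. The sketch below is plain Python with hypothetical names (TaskTimer, start, stop, summary), none of which are Celery API; the two signal handlers above could call start(task_id) and stop(task_id) instead of touching the dict directly. With a prefork pool, each worker child process would presumably keep its own tallies.

```python
import time


class TaskTimer:
    """Track start times by task id and collect elapsed durations."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock   # injectable so the class is easy to test
        self._started = {}    # task_id -> start timestamp
        self.durations = {}   # task_id -> elapsed seconds

    def start(self, task_id):
        self._started[task_id] = self._clock()

    def stop(self, task_id):
        """Return elapsed seconds, or None if start() was never seen."""
        begun = self._started.pop(task_id, None)
        if begun is None:
            return None
        elapsed = self._clock() - begun
        self.durations[task_id] = elapsed
        return elapsed

    def summary(self):
        """Return count, total and slowest duration recorded so far."""
        values = list(self.durations.values())
        return {
            "count": len(values),
            "total": sum(values),
            "max": max(values, default=0.0),
        }
```

A monotonic clock is a reasonable default here because start and stop for a given task happen in the same process; it is unaffected by system clock adjustments.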
Biota answered 30/7, 2015 at 18:40 Comment(1)
@vikas-prasad kwargs is for receiving the task's keyword arguments; I added **extras for Celery 4 compatibility. - Biota
