Schedule reminder for recurring event

I'm working on a web application that lets users create events (one-off or recurring) on a calendar. Shortly before an event starts, the system notifies its participants. I'm having trouble designing the flow for these notifications, particularly for recurring events.

Things to consider:

  1. The web application's architecture uses many databases with the same structure, each holding its own set of users and events. Any query against one database therefore has to be repeated against several thousand others.
  2. A recurring event may have excluded dates (similar to RRULE and EXDATE combination).

  3. Users can update an event's time or recurrence rule.

  4. The application is written in Python and already uses Celery 3.1 with a Redis broker. A solution that works with this setup would be nice, though anything will do. From what I have found, adding periodic tasks dynamically is currently hard with Celery.

A solution I'm attempting:

  • A periodic task runs once a day, scans every database, and adds a task to send the notification at the appropriate time for each event that has an occurrence that day.

  • Each task generated this way has its id saved temporarily in Redis. If a user changes the event's time for that day after its notification task is scheduled, the task is revoked and replaced with a new one.

Sample code for above solution:

  • In tasks.py, all the tasks to run:

    from celery.task import task as celery_task
    from celery.result import AsyncResult
    from datetime import datetime
    
    # ...
    
    @celery_task
    def create_notify_task():
        # runs once a day: walk every account database and schedule
        # today's notifications
        for account in system.query(Account):
            db_session = account.get_session()    # get SQLAlchemy session
            for event in db_session.query(Event):
                schedule_notify_event(account, event)
    
    
    @celery_task(name='notify_event_users')
    def notify_event_users(account_id, event_id):
        # do notification for every event participant
        pass
    
    def schedule_notify_event(account, event):
        partial_event = event.get_partial_on(datetime.today())
        if partial_event:
            result = notify_event_users.apply_async(
                    args = (account.id, event.id),
                    eta = partial_event.start)
            replace_task_id(account.id, event.id, result.id)
        else:
            replace_task_id(account.id, event.id, None)
    
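    def replace_task_id(account_id, event_id, result_id):
        key = '{}:event'.format(account_id)
        client = redis.get_client()
        old_result_id = client.hget(key, event_id)
        if old_result_id:
            # cancel the previously scheduled notification
            AsyncResult(old_result_id).revoke()
        if result_id is None:
            # nothing scheduled today: drop the stale entry instead of storing None
            client.hdel(key, event_id)
        else:
            client.hset(key, event_id, result_id)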
    
  • In event.py:

    # when a user changes an event's time
    def update_event(event, data):
        # ...
        # update event
        # ...
        schedule_notify_event(account, event)
    
  • Celery setup file:

    from celery.schedules import crontab
    
    CELERYBEAT_SCHEDULE = {
        'create-notify-every-day': {
            'task': 'tasks.create_notify_task',
            'schedule': crontab(minute=0, hour=0),
            'args': ()
        },
    }
    

Some downsides of the above are:

  • The daily task can take a long time to run. Events in the databases processed last have to wait, and their notifications might be missed. Scheduling the task earlier (e.g. 2 hours before the next day) may alleviate this, but the setup for the first run (or after a server restart) is a little awkward.

  • Care must be taken that a notify task doesn't get scheduled twice for the same event (e.g. because create_notify_task runs more than once a day...).

Is there a more sensible approach to this?

Affrica answered 29/3, 2016 at 2:38 Comment(4)
You need to post whatever code you have already tried writing. This is not a contractor website where we do the work for you. You have to make the first attempt and we will help you with problems as you go along. - Whipperin
I am only asking for an approach / workflow, not any code, and I have already presented a crude solution I thought of. Anyway, I have added some code snippets to give you a clearer picture. - Affrica
I am not a big Python person, so hopefully someone who is an expert at Python can jump in here and respond if they see a problem. But from looking at what you have presented, I really don't see anything wrong with your logic on how to approach this. - Whipperin
Well, at least it works, I think. But there are some downsides, as I have outlined, which make it feel brittle. The approach need not be Python-specific, though I would shy away from anything akin to building my own task scheduler. - Affrica

It's been a long time without any answer, and I forgot about this question. Anyway, at the time I went with the following solution. I outline it here in case someone is interested.

  • When an event is created, a task is scheduled to run shortly before its next occurrence (i.e. next notification time). The scheduled time is calculated with all recurring and exception rules applied, so it's just a simple scheduled one-time task for celery.
  • When the task runs, it does the notification job and schedules a new task at the next notification time (again, with all recurring and exception rules considered). If there is no next event occurrence, no new task is scheduled.
  • The task's id is saved together with the event in the database. If the event's time is changed, the task is cancelled and a new task is scheduled at the new next notification time. When the task runs and schedules a new task, the new task's id is saved in the database.
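
To illustrate the "next notification time" step, here is a minimal sketch using only the standard library. It is not the code from the project: the function name `next_notification`, its parameters, and the fixed `interval` (standing in for a full recurrence rule) are assumptions made for the example, and `excluded` plays the role of the exception dates.

```python
from datetime import datetime, timedelta


def next_notification(start, interval, excluded, lead, now, until=None):
    """Return the next notification time strictly after `now`, or None.

    start    -- datetime of the first occurrence
    interval -- positive timedelta between occurrences (simplified rule)
    excluded -- set of occurrence datetimes to skip (EXDATE-like)
    lead     -- how long before an occurrence the notification is sent
    until    -- optional datetime of the last allowed occurrence
    """
    occurrence = start
    if now + lead > start:
        # Jump close to `now` instead of stepping through every past occurrence.
        skipped = (now + lead - start) // interval
        occurrence = start + skipped * interval
    while until is None or occurrence <= until:
        if occurrence not in excluded and occurrence - lead > now:
            return occurrence - lead
        occurrence += interval
    return None  # no occurrence left, so no new task would be scheduled


# Hypothetical daily event at 09:00 with one excluded day.
start = datetime(2016, 3, 29, 9, 0)
eta = next_notification(
    start=start,
    interval=timedelta(days=1),
    excluded={datetime(2016, 3, 31, 9, 0)},
    lead=timedelta(minutes=15),
    now=datetime(2016, 3, 30, 10, 0),
)
print(eta)
```

The returned value is what would be passed as `eta` when scheduling the one-time task. When it is `None`, the chain simply ends.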

Some pros and cons that I could think of:

  • Pros:
    • No need for complicated recurring rules in celery, since tasks are only scheduled for a single run.
    • Each task is fairly small and quick, as it only has to care about a single event notification.
  • Cons:
    • At any time, there are a lot of celery timed tasks waiting for execution, probably on the order of hundreds of thousands. I'm not sure how this affects celery's performance, so it may or may not be an actual con. So far the system appears to run just fine.
Affrica answered 9/10, 2017 at 8:46 Comment(4)
Still using Redis as the backend? We have a system using an equivalent setup, but it doesn't work well if the jobs are scheduled a long time into the future. Jobs may be scheduled multiple times if they are scheduled beyond the VISIBILITY_TIMEOUT setting. Have you encountered this problem? - Sebbie
Yes, still the Redis backend. I had moved to another project before this part went to production, so I didn't hear more about it. It seems to be a known and long-standing issue with the Redis transport (e.g. here...), and it is even documented. - Affrica
Exactly, so I wondered how you solved it if it was an issue for you. - Sebbie
Well, I only came to know of this issue after reading your comment. During my time on the project no such issue was ever noticed. As for a solution, people have talked about some workarounds, like using another broker (e.g. AMQP) or increasing visibility_timeout. Do they work for your case? Another crude workaround I can think of just now is to schedule the task only a short time ahead (less than visibility_timeout); the task then keeps rescheduling itself a short time ahead until it reaches the intended ETA, at which point it performs the work and finishes (no more rescheduling). - Affrica
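
The hop-by-hop workaround from the last comment could look roughly like the sketch below. The names `next_hop` and `hop_etas` and the 45-minute `max_hop` are assumptions for illustration; `max_hop` only needs to stay below the broker's visibility timeout. The Celery side is left out: each returned ETA would be handed to a fresh `apply_async(eta=...)` call.

```python
from datetime import datetime, timedelta


def next_hop(now, target_eta, max_hop):
    """Return (eta, is_final) for the next scheduling step.

    If the real ETA is within `max_hop`, schedule it directly and mark the
    hop as final. Otherwise schedule an intermediate wake-up `max_hop` from
    now, at which point the task only reschedules itself.
    """
    if target_eta - now <= max_hop:
        return target_eta, True
    return now + max_hop, False


def hop_etas(now, target_eta, max_hop):
    """List every ETA the task would be scheduled at, ending with the real one."""
    etas = []
    is_final = False
    while not is_final:
        now, is_final = next_hop(now, target_eta, max_hop)
        etas.append(now)
    return etas


start = datetime(2017, 10, 9, 8, 0)
target = start + timedelta(hours=2, minutes=30)
for eta in hop_etas(start, target, timedelta(minutes=45)):
    print(eta)
```

The trade-off is extra wake-ups per notification in exchange for never holding a task unacknowledged longer than `max_hop`.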
