How to use python to schedule tasks in a Django application

I'm new to Django and web frameworks in general. I have an app that is all set up and works perfectly fine on my localhost.

The program uses Twitter's API to gather a bunch of tweets and displays them to the user. The only problem is that I need the Python program that gets the tweets to run in the background every so often.

This is where using the schedule module would make sense, but once I start the local server it never runs the scheduled functions. I tried reading up on cron jobs and just can't seem to get them to work. How can I get Django to run a specific Python file periodically?

Genitor answered 22/6, 2020 at 23:58 Comment(0)

I've encountered a similar situation and have had a lot of success with django-apscheduler. It is all self-contained - it runs with the Django server and jobs are tracked in the Django database, so you don't have to configure any external cron jobs or anything to call a script.

Below is a basic way to get up and running quickly, but the links at the end of this post have far more documentation and details as well as more advanced options.

Install with pip install django-apscheduler then add it to your INSTALLED_APPS:

INSTALLED_APPS = [
    ...
    'django_apscheduler',
    ...
]

Once installed, make sure to run makemigrations and migrate on the database.

Create a scheduler python package (a folder in your app directory named scheduler with a blank __init__.py in it). Then, in there, create a file named scheduler.py, which should look something like this:

from apscheduler.schedulers.background import BackgroundScheduler
from django_apscheduler.jobstores import DjangoJobStore, register_events
from django.utils import timezone
from django_apscheduler.models import DjangoJobExecution
import sys

# This is the function you want to schedule - add as many as you want and then register them in the start() function below
def deactivate_expired_accounts():
    today = timezone.now()
    ...
    # get accounts, expire them, etc.
    ...


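def start():
    scheduler = BackgroundScheduler()
    scheduler.add_jobstore(DjangoJobStore(), "default")
    # run this job every 24 hours; the explicit id plus replace_existing=True
    # keeps the stored job from being duplicated on every server start
    scheduler.add_job(deactivate_expired_accounts, 'interval', hours=24, name='clean_accounts',
                      id='clean_accounts', replace_existing=True, jobstore='default')
    # deprecated in newer django-apscheduler releases, where DjangoJobStore registers its own events
    register_events(scheduler)
    scheduler.start()
    print("Scheduler started...", file=sys.stdout)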

In your apps.py file (create it if it doesn't exist):

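from django.apps import AppConfig


class AppNameConfig(AppConfig):
    name = 'your_app_name'

    def ready(self):
        # Import inside ready() so the scheduler module loads only once the app registry is ready.
        # Relative import, because the scheduler package lives inside this app's directory.
        from .scheduler import scheduler
        scheduler.start()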

A word of caution: when using this with DEBUG = True in your settings.py file, run the development server with the --noreload flag set (i.e. python manage.py runserver localhost:8000 --noreload), otherwise the scheduled tasks will start and run twice.
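
If you would rather keep the auto-reloader, a commenter below suggests checking the RUN_MAIN environment variable instead. A minimal sketch of that check follows. The helper name should_start_scheduler is hypothetical, and the rule for commands other than runserver is an assumption that suits the development server only (a production server such as gunicorn would need its own rule):

```python
import os
import sys


def should_start_scheduler(argv, environ):
    """Decide whether this process should start the scheduler (dev server only)."""
    # Assumption: only the dev server should run jobs, so commands such as
    # migrate or makemigrations never launch the scheduler.
    if "runserver" not in argv:
        return False
    # With --noreload there is a single process, so start the scheduler there.
    if "--noreload" in argv:
        return True
    # Otherwise the autoreloader re-runs manage.py in a child process with
    # RUN_MAIN set to "true"; start only in that child, not in the watcher.
    return environ.get("RUN_MAIN") == "true"


if __name__ == "__main__":
    print(should_start_scheduler(sys.argv, os.environ))
```

Inside ready(), this could be called as should_start_scheduler(sys.argv, os.environ) before scheduler.start().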

Also, if your scheduled functions need parameters, you can pass them through the args and kwargs arguments of add_job. Because the jobs are stored in the Django database, keep these to simple, serializable values. Anything more complex can be loaded inside the function from some external source, like the Django database.

You can use all the standard Django libraries, packages and functions inside the apscheduler tasks (functions), for example to query models, call external APIs, or parse responses and data. It's seamlessly integrated.

Ledesma answered 23/6, 2020 at 1:0 Comment(11)
Thanks for the detailed answer, really appreciate it. - Genitor
I recommend adding an id="some_id" to the add_job call to prevent duplicating the job every startup. - Knuckleduster
In addition to Michael's caution: You can also just check the RUN_MAIN env var in the ready() method, on the second iteration it equals True. This clean and simple workaround allows you to keep dev server reloading. - Mccloud
Also don't forget to set the default app configuration in application's __init__.py: default_app_config = 'myApp.apps.myAppConfig' - Roughspoken
register_events has been deprecated and will be removed in a future release. Calling this method is no longer necessary as the DjangoJobStore will automatically register for events that it cares about when the scheduler is started. - Jargonize
Can this module be used if you are using an external DB (mongo) through an API for read and write? Basically I don't have models. - Compotation
Actually, they do allow parameters, using args=[parameter_list, ...]. - Anta
@EugeneZabolotny can you explain further the RUN_MAIN approach? Where is RUN_MAIN coming from, is there any documentation? Thanks. - Pantaloon
@Pantaloon good question! It took me some time to understand what's going on there. The answer is in django/utils/autoreload.py. Initially there is no RUN_MAIN environment variable, just DJANGO_AUTORELOAD_ENV = 'RUN_MAIN' in the code. Hence in run_with_reloader the program takes the else code block, and in restart_with_reloader the environment variable is set: new_environ = {**os.environ, DJANGO_AUTORELOAD_ENV: 'true'}. - Mccloud
I can't makemigrations: django.db.utils.ProgrammingError: relation "django_apscheduler_djangojob" does not exist LINE 1: INSERT INTO "django_apscheduler_djangojob" ("id", "next_run_... - Harleyharli
The solution worked. However, when running it I encountered an exception about the job already existing. I didn't see it mentioned in either of the answers above or in the original article, so I'm posting it here for those who need it. I added an additional check before adding the job, if not DjangoJob.objects.filter(id=job_id).exists():, given that you pass job_id when adding the job, e.g. scheduler.add_job(refresh_something, 'interval', seconds=60, name="Refresh Something", jobstore='default', id=job_id). - Santossantosdumont

Another library you can use is django-q.

Django Q is a native Django task queue, scheduler and worker application using Python multiprocessing.

Like django-apscheduler, it can run and track jobs using the database Django is attached to. Or it can use a full-blown broker like Redis.

The only problem is I need my python program that gets the tweets to be run in the background every-so-often.

That sounds like a scheduler. (Django-q also has a tasks feature, that can be triggered by events rather than being run on a schedule. The scheduler just sits on top of the task feature, and triggers tasks at a defined schedule.)

There are four parts to this with django-q:

  1. Install Django-q and configure it;
  2. Define a task function (or set of functions) that you want to fetch the tweets;
  3. Define a schedule that runs the tasks;
  4. Run the django-q cluster that'll process the schedule and tasks.

Install django-q

pip install django-q

Configure it as an installed app in Django's settings.py (add it to the installed apps list):

INSTALLED_APPS = [
    ...
    'django_q',
    ...
]

Then it needs its own configuration in settings.py (this configuration uses the database as the broker rather than Redis or something else external to Django):

# Settings for Django-Q
# https://mattsegal.dev/simple-scheduled-tasks.html

Q_CLUSTER = {
    'orm': 'default',  # should use django's ORM and database as a broker.
    'workers': 4,
    'timeout': 30,
    'retry': 60,
    'queue_limit': 50,
    'bulk': 10,
}

You'll then need to run migrations on the database to create the tables django-q uses:

python manage.py migrate

(This will create a bunch of schedule and task related tables in the database. They can be viewed and manipulated through the Django admin panel.)

Define a task function

Then create a new file for the tasks you want to run:

# app/tasks.py
def fetch_tweets():
    pass  # do whatever logic you want here

Define a task schedule

We need to add the schedule that runs the task to the database.

python manage.py shell
from django_q.models import Schedule
Schedule.objects.create(
    func='app.tasks.fetch_tweets',  # module and func to run
    schedule_type=Schedule.MINUTES,  # interpret the interval below in minutes
    minutes=5,  # run every 5 minutes
    repeats=-1  # keep repeating, repeat forever
)

You don't have to do this through the shell. You can do this in a module of python code, etc. But you probably only need to create the schedule once.

Run the cluster

Once that's all done, you need to run the cluster that will process the schedule. Without a running cluster, the schedule and tasks will never be processed. The call to qcluster is a blocking call, so normally you want to run it in a separate window or process from the Django server process.

python manage.py qcluster

When it runs you'll see output like:

09:33:00 [Q] INFO Q Cluster fruit-november-wisconsin-hawaii starting.
09:33:00 [Q] INFO Process-1:1 ready for work at 11
09:33:00 [Q] INFO Process-1:2 ready for work at 12
09:33:00 [Q] INFO Process-1:3 ready for work at 13
09:33:00 [Q] INFO Process-1:4 ready for work at 14
09:33:00 [Q] INFO Process-1:5 monitoring at 15
09:33:00 [Q] INFO Process-1 guarding cluster fruit-november-wisconsin-hawaii
09:33:00 [Q] INFO Q Cluster fruit-november-wisconsin-hawaii running.

There's also some example documentation that's pretty useful if you want to see how to hook up tasks to reports or emails or signals etc.

Nicaea answered 7/1, 2022 at 10:19 Comment(0)
