Python long-running daemon job processor
I want to write a long running process (linux daemon) that serves two purposes:

  • responds to REST web requests
  • executes jobs which can be scheduled

I originally had this working as a simple program that would run through the jobs and do the updates, which I then ran from cron, but now I have the added REST requirement, and would also like to change the frequency of some jobs but not others (let’s say all jobs have different frequencies).

I have 0 experience writing long running processes, especially ones that do things on their own, rather than responding to requests.

My basic plan is to run the REST part in a separate thread/process, and figured I’d run the jobs part separately.

I’m wondering whether there are any patterns for this, specifically in Python (I’ve looked and haven’t really found any examples of what I want to do), or whether anyone has suggestions on where to begin transitioning my project to meet these new requirements. I’ve seen a few projects that touch on scheduling, but I’m really looking for real-world user experience / suggestions here. What works / doesn’t work for you?

Lavina answered 10/7, 2009 at 5:16 Comment(3)
Why write this instead of using an existing web server + cron?Chemo
For the REST stuff it might be an option, but for the jobs, we can't have a cron entry for 1000 different jobs, nor do we want to store the last run time and check against it every time. From a maintainability standpoint, we're aiming for a daemon.Lavina
In reality, some of the REST stuff will call off job execution, add jobs to new schedules, etc. Some of these jobs will take 5-10 minutes to run.Lavina
  • If the REST server and the scheduled jobs have nothing in common, do two separate implementations, the REST server and the jobs stuff, and run them as separate processes.

  • As mentioned previously, look into existing schedulers for the jobs stuff. I don't know whether Twisted would be an alternative, but you might want to check out that platform.

  • If, OTOH, the REST interface invokes the same functionality as the scheduled jobs do, you should try to look at them as two interfaces to the same functionality, e.g. like this:

    • Write the actual jobs as programs the REST server can fork and run.
    • Have a separate scheduler that handles the timing of the jobs.
    • If a job is due to run, let the scheduler issue a corresponding REST request to the local server. This way the scheduler only handles job descriptions and has no knowledge of how they are implemented.
  • It's a common trait for long-running, high-availability processes to have an additional "supervisor" process that just checks that the necessary daemons are up and running, and restarts them as necessary.
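The supervisor idea in the last bullet can be sketched in a few lines. This is a minimal illustration, not any particular library's API: the daemon command lines and the `check_and_restart`/`supervise` names are placeholders you would replace with your own.

```python
import subprocess
import time

def check_and_restart(children, commands):
    """One supervision pass: restart any child that has exited.

    children: dict of name -> subprocess.Popen
    commands: dict of name -> argv list used to (re)start that child
    Returns the names of the children that were restarted.
    """
    restarted = []
    for name, proc in children.items():
        if proc.poll() is not None:  # child has exited (returncode is set)
            children[name] = subprocess.Popen(commands[name])
            restarted.append(name)
    return restarted

def supervise(commands, poll_interval=5.0):
    """Start one child per command and keep them all alive forever."""
    children = {name: subprocess.Popen(cmd) for name, cmd in commands.items()}
    while True:
        time.sleep(poll_interval)
        for name in check_and_restart(children, commands):
            print("restarted %s" % name)

# Hypothetical usage: the script names are placeholders.
# supervise({
#     "rest_server": ["python", "rest_server.py"],
#     "job_scheduler": ["python", "scheduler.py"],
# })
```

Real init systems (and later tools like supervisord) do the same job more robustly, but the loop above is the whole idea.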

Garbers answered 10/7, 2009 at 5:16 Comment(1)
Torn as to what to mark correct here. Some really good suggestions.Lavina

Here's what we did.

  1. Wrote a simple, pure-WSGI web application to respond to REST requests.

    • Start jobs

    • Report status of jobs

  2. Extended the built-in wsgiref server to use the select module to check for incoming requests.

    • Activity on the socket is an ordinary REST request; we let wsgiref handle it. It will, eventually, call our WSGI application to respond to status and submit requests.

    • Timeout means that we have to do two things:

      • Check all children that are running to see if they're done. Update their status, etc.

      • Check a crontab-like schedule to see if there's any scheduled work to do. This is a SQLite database that this server maintains.
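A rough sketch of that select-with-timeout loop, assuming Python 3's wsgiref; `poll_running_jobs` and `check_schedule` are placeholder names for the two timeout-path tasks described above, and the WSGI app is a stub:

```python
import select
import wsgiref.simple_server

def app(environ, start_response):
    # Stub WSGI application; the real one would dispatch REST requests.
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"ok\n"]

def poll_running_jobs():
    """Placeholder: reap finished job subprocesses and update their status."""

def check_schedule():
    """Placeholder: consult the crontab-like SQLite schedule, start due jobs."""

def serve_with_timeout(host="127.0.0.1", port=8000, timeout=1.0):
    server = wsgiref.simple_server.make_server(host, port, app)
    while True:
        # Block until either a request arrives or the timeout elapses.
        readable, _, _ = select.select([server], [], [], timeout)
        if readable:
            server.handle_request()  # ordinary REST request: let wsgiref run
        else:
            poll_running_jobs()      # timeout path, task 1
            check_schedule()         # timeout path, task 2
```

`select.select` accepts the server object directly because `TCPServer` exposes `fileno()`; `handle_request()` will not block, since select has already reported a pending connection.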

Bianchi answered 10/7, 2009 at 5:16 Comment(0)

One option is to simply choose a lightweight WSGI server from this list:

and let it do the work of a long-running process that serves requests. (I would recommend Spawning.) Your code can then concentrate on the REST API, handling requests through the well-defined WSGI interface, and scheduling jobs.

There are at least a couple of scheduling libraries you could use, but I don't know much about them:

Embryologist answered 10/7, 2009 at 5:16 Comment(1)
scheduler-py looks great, going to dig into its guts in the morning.Lavina

The usual design pattern for a scheduler would be:

  • Maintain a list of scheduled jobs, sorted by next run time (as a date-time value);
  • When woken up, compare the first job in the list with the current time. If it's due or overdue, remove it from the list and run it. Continue working your way through the list this way until the first job is not due yet, then go to sleep for (next_job_due_date - current_time);
  • When a job finishes running, re-schedule it if appropriate;
  • After adding a job to the schedule, wake up the scheduler process.

Tweak as appropriate for your situation (e.g. sometimes you might want to re-schedule jobs to run again at the point when they start running rather than when they finish).
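The pattern above maps naturally onto a heap ordered by next-run time, plus a condition variable so that adding a job can wake the sleeping scheduler early. A minimal sketch, where the `Scheduler` class and its method names are illustrative rather than from any particular library:

```python
import heapq
import threading
import time

class Scheduler:
    """Jobs live in a heap keyed by next run time; a Condition lets
    add() wake the run loop early when a sooner job is inserted."""

    def __init__(self):
        self._heap = []  # entries: (next_run_time, seq, interval, func)
        self._seq = 0    # tie-breaker so callables are never compared
        self._cv = threading.Condition()

    def add(self, func, interval, first_run=None):
        when = first_run if first_run is not None else time.time() + interval
        with self._cv:
            heapq.heappush(self._heap, (when, self._seq, interval, func))
            self._seq += 1
            self._cv.notify()  # wake the scheduler: the schedule changed

    def run_pending(self, now=None):
        """Run every job that is due; return seconds until the next one
        (or None if the schedule is empty)."""
        now = time.time() if now is None else now
        with self._cv:
            while self._heap and self._heap[0][0] <= now:
                when, seq, interval, func = heapq.heappop(self._heap)
                func()
                # Re-schedule relative to the *scheduled* time, not "now",
                # so a slow job does not drift the whole schedule.
                heapq.heappush(self._heap, (when + interval, seq, interval, func))
            return (self._heap[0][0] - now) if self._heap else None

    def run_forever(self):
        while True:
            delay = self.run_pending()
            with self._cv:
                self._cv.wait(timeout=delay)  # sleep until due, or until add()
```

With 10,000 jobs this stays cheap: the loop only ever inspects the head of the heap, and each push/pop is O(log n).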

Adventitia answered 10/7, 2009 at 5:16 Comment(1)
This pretty much confirmed what I was thinking. I'd mark it correct, but really I'm waiting for someone to upvote some of these before picking a 'correct' answer. ars's suggestion of scheduler-py looks great; I'm going to give it a try tomorrow.Lavina

I usually use cron for scheduling. As for REST, you can use one of the many, many web frameworks out there, but just running SimpleHTTPServer should be enough.

You can schedule the REST service to start at boot with cron's @reboot directive:

@reboot (cd /path/to/my/app && nohup python myserver.py&)
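One caveat: SimpleHTTPServer on its own only serves files, so a REST endpoint still needs a small handler subclass. A minimal sketch using Python 3's http.server (the `/jobs` path and the response shape are invented for illustration):

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

class RestHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/jobs":
            # Hypothetical endpoint: report the (empty) job list as JSON.
            body = json.dumps({"jobs": []}).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)

# Usage: HTTPServer(("127.0.0.1", 8000), RestHandler).serve_forever()
```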
Pixie answered 10/7, 2009 at 5:16 Comment(1)
TBH the REST implementation is not something I'm too worried about; as I've said, I think I've got the long-running-process-responding-to-requests part down. It's the "do X every Y hours/mins/seconds for 10000 jobs" part that's got me stumped.Lavina

© 2022 - 2024 — McMap. All rights reserved.