Python library for job scheduling, ssh

I'd like to find a user-space tool (preferably in Python - barring that, in anything I could easily modify if it doesn't already do what I need it to) to replace a short script I've been using that does the two things below:

  • polls fewer than 100 computers (Fedora 13, as it happens) for load, available memory, and whether it looks like someone is using them
  • selects good hosts for jobs and runs those jobs over ssh. The jobs are arbitrary command line programs which read and write to a shared filesystem - typically image processing scripts or similar - CPU- and sometimes memory-intensive tasks.

For example, using my current script, I can do this at a Python prompt

>>> import hosts
>>> hosts.run_commands(['users']*5)

or from the command line

% hosts.py "users" "users" "users" "users" "users"

to run the command users 5 times (after checking the CPU load and available memory of hosts listed in a config file and finding 5 on which the command could be run). There should be no job server other than the script I just ran, and no worker daemons or processes on the computers that will run these commands.
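For concreteness, here is a minimal sketch of the kind of logic I mean (not my actual script): it uses paramiko, the host list, thresholds, and helper names are made up, and it runs the commands serially:

import paramiko

HOSTS = ['host01', 'host02', 'host03']  # normally read from a config file
MAX_LOAD = 1.0       # acceptable 1-minute load average
MIN_FREE_MB = 1024   # minimum free memory, in MB

def ssh_run(host, command):
    """Run a command on a host over ssh and return its stdout as text."""
    client = paramiko.SSHClient()
    client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    client.connect(host)  # assumes key-based auth is already set up
    try:
        _stdin, stdout, _stderr = client.exec_command(command)
        return stdout.read().decode()
    finally:
        client.close()

def host_is_free(host):
    """Crude check: low 1-minute load and enough free memory."""
    load = float(ssh_run(host, 'cat /proc/loadavg').split()[0])
    # 'free -m', second line, fourth column is the free MB figure
    free_mb = int(ssh_run(host, 'free -m').splitlines()[1].split()[3])
    return load < MAX_LOAD and free_mb > MIN_FREE_MB

def run_commands(commands):
    """Pair each command with the next host that looks idle and run it.
    zip() simply stops if there are fewer idle hosts than commands."""
    free_hosts = (h for h in HOSTS if host_is_free(h))
    return [(host, ssh_run(host, cmd)) for host, cmd in zip(free_hosts, commands)]

if __name__ == '__main__':
    import sys
    for host, output in run_commands(sys.argv[1:]):
        print('%s: %s' % (host, output))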

I'd additionally like to be able to track the jobs, run jobs again on failure, etc., but these are extra features (very standard in a real job scheduler) that I don't actually need.

I've found good ssh libraries for Python, things like classh and PuSSH, but they don't have the (very simple) load balancing features I'd like. On the other side of what I want are Condor and Slurm, which crispamares suggested before I clarified that I want something lighter. Those would be doing things the proper way, but from what I've read, spinning them up in user space only when I need them sounds anywhere from annoying to impossible. This isn't a dedicated cluster, and I don't have root access on these hosts.

If I can't find something else, I'm currently planning to use a wrapper around classh, with some basic polling of the computers whenever I need to know how busy they are.

Phosphocreatine answered 12/4, 2011 at 14:1 Comment(7)
What kind of jobs? fabric (ssh wrapper, no job scheduling), jenkins (CI tool: repeatable tasks, zero-setup, simple load balancing), disco (MapReduce, erlang+python, only python jobs?), hadoop (big, requires root?), PBS (Torque -- traditional workload management system).Baiss
Thanks J.F., edited question to say that jobs are running command line programs that process images, reading and writing to a shared filesystem.Phosphocreatine
Fabric could be reasonable for this use, but not until parallel execution of code is a feature.Phosphocreatine
@Thomas: You could try goosemo's fork for parallel execution github.com/goosemo/fabric (it won't be integrated at least until fabric 1.2).Baiss
I'll have something working with classh in a few hours for what I need it for tomorrow morning. After that, I won't touch it for a while, and will hang in there until fabric 1.2. (or even help it along by minuscule amounts)Phosphocreatine
This question on serverfault lists the sorts of things I was thinking of, but they mostly seem to be for executing ONE command on many hosts, whereas I want to execute different commands on each host serverfault.com/questions/13322/…Phosphocreatine
Fabric has parallel execution now! readthedocs.org/docs/fabric/en/latest/usage/parallel.htmlPhosphocreatine

There is Fabric; I am surprised no one has mentioned it.
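For reference, a minimal sketch of running the same command on several hosts in parallel with Fabric 1.x (hostnames are placeholders; the @parallel decorator needs a release with parallel execution, per the docs linked in the comments):

# fabfile.py -- sketch of running one command on several hosts in parallel
from fabric.api import env, parallel, run

env.hosts = ['host01', 'host02', 'host03']  # placeholder hostnames

@parallel
def users():
    run('users')  # executed on every host in env.hosts concurrently

Invoked as fab users from the directory containing the fabfile; without @parallel (or the -P command-line switch) Fabric walks the host list serially.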

Possie answered 27/11, 2012 at 15:24 Comment(1)
Fabric has parallel execution now! readthedocs.org/docs/fabric/en/latest/usage/parallel.htmlPhosphocreatine

Slurm is a powerful job scheduler that can be driven from Python using PySlurm.

I don't know whether it is harder than Condor to deploy, and I don't know whether it fits all your needs, but I'm noting it just in case.

Albritton answered 12/4, 2011 at 14:50 Comment(1)
Slurm looks like it would do the job, but deploying it in user space looks nigh impossible. I'm going to scale back my requirements a wee bit in the question. (so note this answer was more appropriate before editing the question)Phosphocreatine

You could modify Buildbot and Twisted to do this; that seems like a good way to go.

Spermatium answered 12/4, 2011 at 14:30 Comment(0)

Have a look at func. I haven't used it beyond the "Hello, world" level, but I think it fits the bill perfectly for you.
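As best I remember from func's documentation, driving it from the overlord side looks roughly like this; I haven't gone past hello-world, so treat the module and method names as approximate:

# sketch of func's Overlord API, from memory -- note that func needs its
# funcd daemon (and certmaster certificates) on each target machine
import func.overlord.client as fc

client = fc.Client("*")                 # glob over the registered minions
results = client.command.run("users")   # results keyed by minion hostname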

Amandaamandi answered 22/12, 2011 at 14:1 Comment(0)

I might be a little late: I'd recommend taking a look at Python SAGA here.
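To give a flavour, submitting a single remote command with SAGA-Python's job API looks roughly like the sketch below (the hostname is a placeholder, and the attribute names follow my reading of its docs, so double-check against the current release):

# sketch of running one command over ssh with SAGA-Python's job API
import saga

js = saga.job.Service("ssh://host01.example.org")  # placeholder host
jd = saga.job.Description()
jd.executable = "/usr/bin/users"
job = js.create_job(jd)
job.run()
job.wait()
print(job.state)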

Remuneration answered 19/3, 2014 at 11:10 Comment(0)

I might be late to this question, but I encountered the same issue recently and am looking for a C/C++ library with which I can do job scheduling and server load balancing for processing image files over a cluster of servers. I will call the library from a GUI and monitor the status of the jobs.

I installed Slurm and tried the commands; however, using it as a tool, and possibly as a library, seems rather difficult. Other options seem to provide job scheduling but no load balancing based on CPU utilization. I would appreciate any suggestions.

Best Regards

Pisciculture answered 1/9, 2015 at 8:43 Comment(1)
Hi Mustaf - you'll probably get more helpful responses if you ask a new question. I'd certainly vote for such a question, I'm curious too!Phosphocreatine
