How to tell Condor to dispatch jobs only to machines on the cluster that have "numpy" installed?
G

2

12

I just figured out how to send jobs to be processed on machines on the cluster by using Condor. Since we have a lot of machines and they are not all configured the same way, I was wondering:

Is it possible to tell Condor to dispatch my jobs (Python scripts) only to machines that have numpy installed, since my script depends on this package?

Gratification answered 25/3, 2012 at 22:45 Comment(0)
A
8

Like any other machine attribute, you just need to advertise it in the machine classad, and then have your jobs require it.

To advertise it in the machine classad, you can either hard-code it into each machine's condor config file by adding something like this:

has_numpy = True
STARTD_EXPRS = $(STARTD_EXPRS) HAS_NUMPY

... or better yet, you can tell Condor to dynamically discover it at runtime with a script and advertise the result via a startd classad hook. To do that, install a simple has_numpy script on each machine like so:

#!/usr/bin/env python
# Startd cron probe: prints one ClassAd attribute saying whether numpy imports.
try:
    import numpy
except ImportError:
    print("has_numpy = False")
else:
    print("has_numpy = True")

... and then tell Condor to run it every five minutes and stick the results in the startd classad, by adding the following to the machine's condor config file:

HASNUMPY = /usr/libexec/condor/has_numpy
STARTD_CRON_JOBLIST = $(STARTD_CRON_JOBLIST) HASNUMPY
STARTD_CRON_HASNUMPY_EXECUTABLE = $(HASNUMPY)
STARTD_CRON_HASNUMPY_PERIOD = 300

...and then ta-da (after a reconfig) your machines will dynamically detect and report whether numpy is installed and available to python scripts.
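If you want to confirm that the attribute is actually showing up in the machine classads, one option is to query the pool with condor_status and a constraint. Here is a minimal sketch, assuming condor_status is on the PATH of the machine you run it from; the helper names build_status_command and machines_with_numpy are made up for illustration and are not part of Condor.

```python
import subprocess


def build_status_command(attribute="has_numpy"):
    """Build a condor_status call that lists slots where `attribute` is True."""
    return [
        "condor_status",
        "-constraint", "%s == True" % attribute,
        "-format", "%s\n", "Name",
    ]


def machines_with_numpy():
    """Run condor_status and return the slot names it prints, one per line.

    Assumes condor_status is installed and on PATH; raises an exception otherwise.
    """
    output = subprocess.check_output(build_status_command())
    return [line for line in output.decode().splitlines() if line.strip()]
```

Calling machines_with_numpy() on a submit machine should return the slots that currently advertise has_numpy as True; an empty list suggests the cron job has not run yet or the config change has not been picked up.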

Then you just need to add a corresponding requirement to your job submit files, like so:

Requirements = (has_numpy == True)

...and your job will only run on machines where numpy is installed.
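For context, this is roughly how that line fits into a whole submit file. The sketch below generates one from Python, assuming a vanilla universe job and a script that starts with a shebang line and is executable; the file names and the render_submit_file helper are hypothetical, and your pool may need additional settings (file transfer options, for example).

```python
SUBMIT_TEMPLATE = """\
universe     = vanilla
executable   = {script}
requirements = (has_numpy == True)
output       = {stem}.out
error        = {stem}.err
log          = {stem}.log
queue
"""


def render_submit_file(script):
    """Return submit-file text that restricts `script` to machines with numpy."""
    # Strip the extension so the output, error, and log files share one stem.
    stem = script.rsplit(".", 1)[0]
    return SUBMIT_TEMPLATE.format(script=script, stem=stem)
```

Writing the result of render_submit_file("analysis.py") to a file and passing that file to condor_submit would queue one job with the numpy requirement attached.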

Appreciative answered 30/3, 2012 at 2:13 Comment(0)
C
-2

Do you need to? According to the condor manual:

Condor does not require an account (login) on machines where it runs a job. Condor can do this because of its remote system call technology, which traps library calls for such operations as reading or writing from disk files. The calls are transmitted over the network to be performed on the machine where the job was submitted.

To me this implies that if the machine submitting the job has numpy installed, it should work.

Cashew answered 26/3, 2012 at 20:47 Comment(3)
Thanks for your reply. Unfortunately, my case proves otherwise. The jobs are submitted to different machines, and some submissions result in an ImportError for numpy. I double-checked those machines and they had no numpy installed. So Condor does not seem to prevent a job from being sent to a machine that does not fulfil the job's requirements, in my case a numpy installation. Maybe that is specific to our installation of Condor, though. I didn't set up the system myself, and it is the first time I have worked with Condor. :-) - Gratification
Time to contact the Condor developers. Either their manual is not correct, or Python is treated differently, like Java is. - Cashew
This quote from the manual is out of context. The Condor remote system call feature is available only to "standard universe" jobs. Although he doesn't say so, the OP must be using the "vanilla" universe, because the standard universe imposes constraints that preclude interpreters like Python from running under it. - Appreciative
