How to submit a job to any [subset] of nodes from nodelist in SLURM?

I have a couple of thousand jobs to run on a SLURM cluster with 16 nodes. These jobs should run only on a subset of 7 of the available nodes. Some of the tasks are parallelized, hence use all the CPU power of a single node, while others are single-threaded. Therefore, multiple jobs should run at the same time on a single node. None of the tasks should span multiple nodes.

Currently I submit each of the jobs as follow:

sbatch --nodelist=myCluster[10-16] myScript.sh

However, this parameter makes Slurm wait until the submitted job terminates, hence leaving 3 nodes completely unused and, depending on the task (multi- or single-threaded), possibly leaving the currently active node under low CPU load as well.

What are the best sbatch parameters to force Slurm to run multiple jobs at the same time on the specified nodes?

Marj answered 6/10, 2014 at 12:57 Comment(0)

You can work the other way around: rather than specifying which nodes to use (with the effect that each job is allocated all 7 nodes), specify which nodes not to use:

sbatch --exclude=myCluster[01-09] myScript.sh

and Slurm will never allocate more than 7 nodes to your jobs. Make sure, though, that the cluster configuration allows node sharing, and that your myScript.sh contains #SBATCH --ntasks=1 --cpus-per-task=n with n the number of threads of each job.
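
For example, the top of myScript.sh could look like the sketch below (the executable name and the 4 threads per job are assumptions; set --cpus-per-task to each job's actual thread count, and --exclude can equally stay on the sbatch command line):

#!/bin/bash
#SBATCH --exclude=myCluster[01-09]   # never use the first 9 nodes
#SBATCH --ntasks=1                   # one task per job
#SBATCH --cpus-per-task=4            # n = number of threads of this job (assumed 4 here)
./myProgram                          # hypothetical executable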

Update: since version 23.02, --nodelist may contain more nodes than requested with --nodes. From the changelog:

-- Allow for --nodelist to contain more nodes than required by --nodes.
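
So on Slurm 23.02 or newer, the original approach can also work if you additionally cap the allocation at one node, along these lines (a sketch based on the changelog entry above; Slurm then picks one node out of the listed seven):

sbatch --nodes=1 --nodelist=myCluster[10-16] myScript.sh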

Bibliogony answered 7/10, 2014 at 22:30 Comment(7)
This assumes you are not the administrator. Otherwise, limits and associations are the way to go.Bibliogony
With 'associations' do you mean 'reservations' in SLURM vocabulary?Marj
No, I mean associations, which is the term Slurm uses in the context of accounts, quality of service, partitions, etc. to set limits.Bibliogony
I am having trouble with the syntax =myCluster[01-09] :( What are the distinct node names in this case?Jabon
--exclude=myCluster[01-09] is equivalent to --exclude=myCluster01,myCluster02,myCluster03,myCluster04,myCluster05,myCluster06,myCluster07,myCluster08,myCluster09.Bibliogony
@Bibliogony is it possible to get the names of the nodes as in PBS, i.e. the equivalent of cat $PBS_NODEFILE > machinefile?Cris
The variable SLURM_JOB_NODELIST holds the list of nodes (not the path to a file that contains the list of nodes).Bibliogony
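
For reference, to reproduce a PBS-style machine file inside a job script, something like the following should work (a sketch; scontrol show hostnames expands the compact bracketed node list into one hostname per line):

scontrol show hostnames "$SLURM_JOB_NODELIST" > machinefile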

Some of the tasks are parallelized, hence use all the CPU power of a single node, while others are single-threaded.

Do I understand correctly that you want the single-threaded jobs to share a node, whereas the parallel ones should be assigned a whole node exclusively?

multiple jobs should run at the same time on a single node.

As far as my understanding of SLURM goes, this implies that you must define CPU cores as consumable resources (i.e., SelectType=select/cons_res and SelectTypeParameters=CR_Core in slurm.conf).
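
For reference, the corresponding slurm.conf lines would look something like this (a sketch only; these are cluster-wide settings that require admin access):

# slurm.conf (cluster-wide, admin-only)
SelectType=select/cons_res       # treat node resources as consumable
SelectTypeParameters=CR_Core     # allocate at the granularity of CPU cores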

Then, to constrain parallel jobs to a whole node you can either use the --exclusive option (but note that partition configuration takes precedence: you can't have shared nodes if the partition is configured for exclusive access), or use -N 1 --ntasks-per-node=<number_of_cores_in_a_node> (e.g., -N 1 --ntasks-per-node=8).

Note that the latter will only work if all nodes have the same number of cores.

None of the tasks should span multiple nodes.

This should be guaranteed by -N 1.
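
Putting it together, a sketch of the two submission styles (the script names and the 8 cores per node are assumptions, adjust them to your cluster):

# parallel job: gets a whole node exclusively
sbatch --exclusive -N 1 parallelJob.sh
# alternative, if every node has exactly 8 cores
sbatch -N 1 --ntasks-per-node=8 parallelJob.sh

# single-threaded job: can share a node with other jobs
sbatch -N 1 --ntasks=1 --cpus-per-task=1 serialJob.sh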

Melton answered 6/10, 2014 at 20:14 Comment(2)
What is crucial is that all my jobs together use no more than 7 nodes. Each node of our cluster has 20 cores and 2 threads per core. If I understand you correctly, you propose to submit parallel jobs with sbatch --nodelist=myCluster[10-16] --ntasks-per-node=40 -N 1 myScript.sh. Why not --ntasks-per-node=1, to make sure that no more than one job runs at the same time on a single node? What about the single-threaded jobs?Marj
@Marj If you want to confine a set of jobs to use a maximum of 7 nodes in total, then a partition or a QoS setting would be the way to go.Melton

Actually, I think the way to go is to set up a 'reservation' first, as described in this presentation (last slide): http://slurm.schedmd.com/slurm_ug_2011/Advanced_Usage_Tutorial.pdf

Scenario: Reserve ten nodes in the default SLURM partition starting at noon and with a duration of 60 minutes occurring daily. The reservation will be available only to users alan and brenda.

scontrol create reservation user=alan,brenda starttime=noon duration=60 flags=daily nodecnt=10
Reservation created: alan_6

scontrol show res
ReservationName=alan_6 StartTime=2009-02-05T12:00:00
    EndTime=2009-02-05T13:00:00 Duration=60 Nodes=sun[000-003,007,010-013,017] NodeCnt=10 Features=(null) PartitionName=pdebug Flags=DAILY Licenses=(null)
    Users=alan,brenda Accounts=(null)

# submit job with:
sbatch --reservation=alan_6 myScript.sh

Unfortunately I couldn't test this procedure, probably due to a lack of privileges.

Marj answered 7/10, 2014 at 12:26 Comment(3)
A reservation will prevent any other user from running on the same set of nodes; that's why an admin is needed to create it. Is this what you really want? To reserve nodes for your exclusive access?Melton
Well that's what we agreed on among the (few) users. If we can set a max duration, why not? Or is this approach a complete anti-pattern for cluster usage?Marj
Is it possible to give regular users permission to set up reservations?Marj
