Why is running squeue in a loop discouraged to avoid overloading Slurm, while no such limitation is mentioned for the bjobs tool from LSF or qstat from SGE?
The man page for squeue states:
PERFORMANCE
Executing squeue sends a remote procedure call to slurmctld. If enough calls from squeue or other Slurm client commands that send remote procedure calls to the slurmctld daemon come in at once, it can result in a degradation of performance of the slurmctld daemon, possibly resulting in a denial of service.
Do not run squeue or other Slurm client commands that send remote procedure calls to slurmctld from loops in shell scripts or other programs. Ensure that programs limit calls to squeue to the minimum necessary for the information you are trying to gather.
which, to my understanding, discourages the use of e.g. watch squeue. Such a warning is commonly found in site-specific documentation, e.g. here:
Although squeue is a convenient command to query the status of jobs and queues, please be careful not to issue the command excessively, for example, invoking the query for the status of a job every five seconds or so using a script after a job is submitted.
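To make the advice concrete, here is a minimal sketch of what "limit calls to the minimum necessary" could look like in a script: one squeue call covers all jobs of interest, and the polling interval is a minute rather than a few seconds. The function names, the 60-second default, and the set of "active" states are my own assumptions for illustration and do not come from the Slurm documentation; only the squeue options (--noheader, --format, --jobs) are real. Array jobs and site-specific states are not handled.

```python
import subprocess
import time

# Assumption: states treated as "still in the queue". Simplified on purpose.
ACTIVE_STATES = {"PENDING", "CONFIGURING", "RUNNING", "COMPLETING", "SUSPENDED"}


def query_jobs(job_ids, run=subprocess.run):
    """Return {job_id: state} for all given jobs using a single squeue call."""
    if not job_ids:
        return {}  # nothing to ask, so no RPC is sent at all
    cmd = [
        "squeue",
        "--noheader",
        "--format=%i %T",  # job id and long state name
        "--jobs=" + ",".join(job_ids),
    ]
    result = run(cmd, capture_output=True, text=True, check=False)
    states = {}
    for line in result.stdout.splitlines():
        parts = line.split()
        if len(parts) == 2:
            states[parts[0]] = parts[1]
    return states


def wait_for_jobs(job_ids, interval=60.0, run=subprocess.run, sleep=time.sleep):
    """Poll until no job is active; return the number of squeue calls made."""
    pending = set(job_ids)
    polls = 0
    while pending:
        states = query_jobs(sorted(pending), run=run)
        polls += 1
        # Jobs that are no longer listed, or not in an active state, are done.
        pending = {j for j in pending if states.get(j) in ACTIVE_STATES}
        if pending:
            sleep(interval)
    return polls
```

Something like wait_for_jobs(["12345", "12346"], interval=120) would then send one request every two minutes for both jobs together, instead of one request per job every few seconds.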
In comparison, I could find no such warning for similar tools on other engines, e.g. qstat or bjobs.
I see people using all of these tools in a repetitive fashion without distinction, e.g. here for squeue, here for bjobs.
The quote above from the Slurm documentation mentions an RPC. Is that a different way of working from other engines? Is there an architectural difference between Slurm and other grid engines that makes querying the status of all jobs more costly?
bjobs -uall -a | grep blah, then there will be a service degradation. – Unfaithful
squeue itself has a -i or --iterate option which will re-run it every N seconds, down to every second. It looks like it can request updates since its last query, which presumably helps a bit, but as far as I can tell it's still sending a new RPC request every time. There's no warning when I use this. – Conchoid