I have the IDs of completed jobs. How do I check their detailed information, such as execution time, allocated nodes, etc.? I remember SGE has a command for this (qacct?), but I could not find an equivalent for PBS or Torque. Thanks.
Since viewing the accounting records of completed jobs requires either root access or that the cluster admins have installed pbstools (both outside a user's control), I've found that the easiest thing to do is to place a
tracejob $PBS_JOBID
on the last line of the submission script. If the scheduler is MAUI, then checkjob -vv $PBS_JOBID
is another alternative. These commands could be redirected to a separate outfile:
tracejob $PBS_JOBID > $PBS_O_WORKDIR/$PBS_JOBID.tracejob
It should also be possible to run this as a user epilogue script, which would make it reusable from job to job.
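The approach above can be sketched as a small helper placed at the end of the submission script. This is only a sketch: the function name `save_job_trace` is hypothetical, and it assumes `tracejob` is on the compute node's PATH and that the job runs under PBS/Torque, so `PBS_JOBID` and `PBS_O_WORKDIR` are set.

```shell
#!/bin/sh
# Hypothetical helper: write the tracejob report for the current job
# into the submission directory. Returns non-zero if it cannot run.
save_job_trace() {
    # PBS sets these inside a running job; give up if they are missing.
    [ -n "${PBS_JOBID:-}" ] && [ -n "${PBS_O_WORKDIR:-}" ] || return 1
    # tracejob may not be installed or permitted on every cluster.
    command -v tracejob >/dev/null 2>&1 || return 1
    tracejob "$PBS_JOBID" > "$PBS_O_WORKDIR/$PBS_JOBID.tracejob" 2>&1
}
```

Calling `save_job_trace` as the last line of the job script, after the actual workload, keeps the report next to the job's other output files.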
I came across this thread while searching for how to do this on my HPC cluster running PBSPro 19.2.3. As of PBSPro 18, the solution is similar to John Damm Sørensen's reply, but the -w flag is used instead of -1 to display each field on a single line, and you need to add the -x flag to see the details of finished jobs as well, so you don't need to run it within the job script (p. 203, section 2.59.2.2 of the Reference Guide):
qstat -fxw $PBS_JOBID
You can then grep the requested information out of it, such as resources used, exit status, etc.:
qstat -fxw $PBS_JOBID | grep -E "resources_used|Exit_status|array_index"
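If only a single value is needed rather than whole lines, the grep step could be replaced by a small filter. The following is a sketch under the assumption that the output consists of lines of the form `name = value`, one attribute per line (which is what -w is for); the function name `job_field` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the value of one attribute from
# "qstat -f"-style output read on stdin (lines of the form "  name = value").
job_field() {
    awk -v key="$1" '
        $1 == key && $2 == "=" {
            sub(/^[^=]*= /, "")   # drop everything up to and including "= "
            print
        }'
}
```

It would be used as `qstat -fxw "$jobid" | job_field resources_used.walltime`, where `jobid` holds the ID of the finished job.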
Right now the only way to get this in TORQUE is to look at the accounting logs. You can grep for the job id and view the accounting records for the job, which look like this:
04/30/2014 15:20:18;Q;5000.bob;queue=batch
04/30/2014 15:33:00;S;5000.bob;user=dbeer group=dbeer jobname=STDIN queue=batch ctime=1398892818 qtime=1398892818 etime=1398892818 start=1398893580 owner=dbeer@bob exec_host=bob/0
04/30/2014 15:36:20;E;5000.bob;user=dbeer group=dbeer jobname=STDIN queue=batch ctime=1398892818 qtime=1398892818 etime=1398892818 start=1398893580 owner=dbeer@bob exec_host=bob/0 session=22933 end=1398893780 Exit_status=0 resources_used.cput=00:00:00 resources_used.mem=2580kb resources_used.vmem=37072kb resources_used.walltime=00:03:20
Unfortunately, to do this directly you need root access. To get around this, there are tools such as pbsacct that make these records easier to browse. pbsacct is part of the pbstools package, which is where that link takes you.
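For readers who do have access to the accounting logs, the records shown above can be picked apart with awk. This is a minimal sketch that assumes the layout seen in the sample (semicolon-separated date, record type, job id, then space-separated key=value pairs) and that values contain no spaces; the function name `acct_field` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read accounting records on stdin and print the value
# of one key from the end ("E") record of the given job id.
acct_field() {
    awk -F';' -v id="$1" -v key="$2" '
        $2 == "E" && $3 == id {
            n = split($4, kv, " ")   # key=value pairs are space separated
            for (i = 1; i <= n; i++)
                if (index(kv[i], key "=") == 1)
                    print substr(kv[i], length(key) + 2)
        }'
}
```

Fed the sample records, `acct_field 5000.bob resources_used.walltime` prints `00:03:20` and `acct_field 5000.bob exec_host` prints `bob/0`. The location of the log files depends on the installation and is not shown here.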
For Torque, you can check at least part of the information you seek using the "tracejob" command.
Official documentation:
Note that this tool is a convenience that parses the logs. By default it only checks the last day, so be sure to read the documentation for the "-n" option.
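As an illustration of the "-n" option, the number of days to look back could be derived from a job's start time. This is a rough sketch: the function name `days_back` is hypothetical, and the count is deliberately generous (it adds two days) on the assumption that "-n" counts log days including today.

```shell
#!/bin/sh
# Hypothetical helper: given a job's start time and the current time (both in
# epoch seconds), print a day count for "tracejob -n" that is deliberately
# generous, so the log file of the start day is included.
days_back() {
    echo $(( ($2 - $1) / 86400 + 2 ))
}
```

For example, `tracejob -n "$(days_back 1398893580 "$(date +%s)")" 5000.bob` would search far enough back to cover a job that started at epoch time 1398893580.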
On a Torque-based system, I find that the best way to get stats from a job is to add this to the end of the submitted job script. The output will be added to the STDOUT file.
qstat -f -1 $PBS_JOBID