Using PBS commands in HPC
PBS, or Portable Batch System, is one of the popular job schedulers used in HPC (High Performance Computing) cluster systems. It provides useful commands to help with job management. Here I list some of them as notes.
Some useful PBS commands:
- qsub
- showq
- qstat
- qdel
- qpeek
- pbsnodes
- qalter
These commands with their most used parameters:
qsub
qsub script
The script controls how your HPC job runs once it is submitted, for example how much time, how many processors, and how much memory you would like to assign to the job.
A basic script template looks like this
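As a minimal sketch of such a template, assuming a Torque-style PBS: the job name, the queue name `batch`, the resource values, and the program `./myprogram` are placeholders to adapt, not settings from any particular cluster.

```shell
#!/bin/bash
# Job name (placeholder).
#PBS -N myjob
# Queue name (assumed; list the available queues with `qstat -q`).
#PBS -q batch
# One node with 12 processors on it.
#PBS -l nodes=1:ppn=12
# Maximum run time as HH:MM:SS.
#PBS -l walltime=120:00:00
# Requested memory (placeholder value).
#PBS -l mem=8gb
# Merge the error log into the output log.
#PBS -j oe

# Start in the directory where qsub was run. The fallback to $PWD only
# matters when the script is tried outside of PBS.
workdir="${PBS_O_WORKDIR:-$PWD}"
cd "$workdir" || exit 1

echo "Job ${PBS_JOBID:-none} started in $workdir"

# Run the real work here; the check only keeps this sketch harmless
# where the placeholder program does not exist.
if [ -x ./myprogram ]; then
    ./myprogram
fi
```

The `#PBS` lines are ordinary comments to the shell, so the same file can be checked with `bash script` before it is submitted with `qsub script`.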
Some variables available in the script
Variable | Meaning |
---|---|
$PBS_O_HOME | Your home directory on the host where you ran qsub |
$PBS_O_LOGNAME | Your login (user) name |
$PBS_O_HOST | Your host or node name from where you ran qsub |
$PBS_O_QUEUE | The queue to which you originally submitted your job |
$PBS_QUEUE | The queue name where your job is running |
$PBS_JOBID | The job ID |
$PBS_JOBNAME | The job name |
$PBS_ENVIRONMENT | The job type: PBS_BATCH for a batch job, PBS_INTERACTIVE for an interactive job |
$PBS_O_WORKDIR | The working directory from where you ran qsub |
qsub script
qsub can also start an interactive job: qsub -I
Useful qsub script snippets
PBS -W: job dependency
#PBS -W depend=Logic:JobID
This is used when a later job depends on the successful run of a previous one, or in similar situations described by the following logic:
Keyword | Logic |
---|---|
after | this job may be scheduled after JobID has started |
afterok | this job may be scheduled after JobID has completed without errors |
afternotok | this job may be scheduled after JobID has completed with errors |
afterany | this job may be scheduled after JobID has completed, with or without errors |
before | JobID may be scheduled after this job has started |
beforeok | JobID may be scheduled after this job has completed without errors |
beforenotok | JobID may be scheduled after this job has completed with errors |
beforeany | JobID may be scheduled after this job has completed, with or without errors |
Example:
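A sketch of chaining two jobs from the command line. `qsub` prints the ID of the job it submitted, and that ID can be passed to `-W depend` directly instead of through a `#PBS` line. The helper name `submit_after_ok` and the script names `first.pbs` and `second.pbs` are hypothetical.

```shell
#!/bin/sh
# Submit the script $2 so that it starts only after the job $1 has
# completed without errors (the afterok keyword from the table).
submit_after_ok() {
    qsub -W depend=afterok:"$1" "$2"
}

# Usage on a cluster (first.pbs and second.pbs are hypothetical names):
#   first_id=$(qsub first.pbs)
#   submit_after_ok "$first_id" second.pbs
```

Note that there are no spaces around `=` or `:` in the dependency expression.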
PBS -t : job arrays
This is useful when you run the same program on different input data. Name the input files so that each file name carries the ID of one job in the job array; the script can then select its input by ID, much like a shell script loop would.
#PBS -t numberID
Example:
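A sketch of such a job array, assuming a Torque-style PBS where the index of each sub-job is available as `$PBS_ARRAYID`. The range `1-3`, the job name, and the helper `input_for` are assumptions; `myprogram` and the `file-N` names follow the naming scheme described above.

```shell
#!/bin/bash
#PBS -N array_example
#PBS -t 1-3

# Map an array ID to its input file name: 1 -> file-1, 2 -> file-2, ...
input_for() {
    printf 'file-%s\n' "$1"
}

cd "${PBS_O_WORKDIR:-$PWD}" || exit 1

# Outside of PBS the array ID is unset, so fall back to 1 for a dry run.
input=$(input_for "${PBS_ARRAYID:-1}")
echo "array ID ${PBS_ARRAYID:-1} uses $input"

# The check only keeps this sketch harmless where myprogram does not exist.
if [ -x ./myprogram ]; then
    ./myprogram "$input"
fi
```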
So the sub-job with array ID 1 runs myprogram with file-1 as its input file, the sub-job with ID 2 uses file-2, and so on.
showq
showq -u Username
Shows information about the jobs submitted by Username, in the common fields.
showq -u Prometheus
Job Name | Username | Status | Processors | Remaining | Start Time |
---|---|---|---|---|---|
575279 | Prometheus | Running | 12 | 4:23:28:09 | Fri May 13 12:29:17 |
575280 | Prometheus | Running | 12 | 4:23:32:16 | Fri May 13 12:33:24 |
Some explanations:
- The Job Name is also the Job ID, which can be used later with the “qdel” command to terminate an unwanted job.
- The Remaining time is how much time is left for your job. When the time is up, the job will be terminated.
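To use the Remaining column in a script, it can be converted to seconds. This sketch assumes the `D:HH:MM:SS` layout shown in the table above, and additionally assumes `HH:MM:SS` for jobs with less than a day left; `remaining_to_seconds` is a hypothetical helper name.

```shell
#!/bin/sh
# Convert a showq-style remaining time to seconds.
# Four fields are read as D:HH:MM:SS, three fields as HH:MM:SS.
remaining_to_seconds() {
    printf '%s\n' "$1" | awk -F: '{
        if (NF == 4) print $1 * 86400 + $2 * 3600 + $3 * 60 + $4
        else         print $1 * 3600 + $2 * 60 + $3
    }'
}

remaining_to_seconds 4:23:28:09   # prints 430089
```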
qstat
qstat -q
Shows information about the queue system of the HPC facility, such as the walltime limit of each type of queue and how many jobs of each type are running, available, or queued.
qstat -u Username
Shows information about the jobs submitted by Username, in the common fields.
qstat -u Prometheus
Job ID | Username | Queue | Jobname | SessID | NDS | TSK | Req Memory | Req Time | Status | Elap Time |
---|---|---|---|---|---|---|---|---|---|---|
575279.mgmt1 | Prometheus | batch | submit_146275036 | 26450 | 1 | 12 | – | 120:0 | R | 00:42 |
575280.mgmt1 | Prometheus | batch | submit_146275036 | 26450 | 1 | 12 | – | 120:0 | R | 00:38 |
Some explanations:
- Some fields differ from those of “showq”, which is sometimes convenient if you need that information.
- Elapsed Time is the opposite of the Remaining time given by showq.
qstat -f Jobname
Shows detailed information about the job with the given Job ID.
qstat -f 575269
Some explanations:
The Job Name can be obtained from the showq or qstat command.
There is a lot of information on the job. A few useful items are the starting directory, the bash environment variables, the path of “INCLUDE”, and the login host.
qdel
qdel Jobname
Kills an unwanted job.
qdel 575279
Some explanations:
- Don’t worry if you accidentally give a wrong jobname (for example, one that happens to belong to another user). It simply will not work, because you don’t have the privilege to kill other users’ jobs.
qpeek
qpeek jobname
qpeek 575279
Checks the log file of the job. If you chose to combine the output and error logs, this shows both; otherwise you can specify which one you want to see:
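A sketch of looking at both logs of one job. It assumes a qpeek version in which `-o` selects the output log and `-e` the error log; qpeek is a site-installed helper, so check the help text of the version on your cluster. `peek_both` is a hypothetical helper name.

```shell
#!/bin/sh
# Show the output log, then the error log, of the job with ID $1.
# Assumes a qpeek where -o selects stdout and -e selects stderr.
peek_both() {
    echo "== output log of $1 =="
    qpeek -o "$1"
    echo "== error log of $1 =="
    qpeek -e "$1"
}

# Usage on a cluster:
#   peek_both 575279
```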
pbsnodes
pbsnodes [-a]
Shows information about all nodes in the HPC cluster.
pbsnodes -a
After getting the node information (e.g. the node name), you can use ordinary Linux commands to look at a node in more detail.
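As a sketch of picking node names out of the `pbsnodes -a` output: the helper below assumes the usual Torque layout, where each node name starts at the beginning of a line and its attributes follow as indented `key = value` lines. `free_nodes` is a hypothetical helper name, and the sample node names are made up.

```shell
#!/bin/sh
# Print the names of all nodes whose state is exactly "free".
# Reads `pbsnodes -a` output on standard input.
free_nodes() {
    awk '
        /^[^ \t]/                     { node = $1 }    # unindented line: node name
        $1 == "state" && $3 == "free" { print node }
    '
}

# Made-up sample in the assumed layout, used for a dry run without PBS.
free_nodes <<'EOF'
node01
     state = free
     np = 12

node02
     state = job-exclusive
     np = 12
EOF
# prints: node01

# Usage on a cluster, followed by an ordinary Linux command on one node
# (assumes ssh access to compute nodes is allowed):
#   pbsnodes -a | free_nodes
#   ssh node01 uptime
```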
qalter
qalter [options] Jobname
Changes a property of a job (for example, whether the job type is batch or express) without withdrawing it, so you keep your position in the queue.
qalter -l walltime=4:00:00:00 575279
Changes the walltime to 4 days. Depending on your privileges, as a regular user you may only be able to decrease your walltime, not extend it.