Writing a PBS Script
From Yale HPC Wiki
For information on PBS please see our PBS Professional page.
Using your qsub script
To run your script, pass it to the qsub command:
qsub my_qsub_script.sh
Passing your shell variables to qsub jobs
qsub does not pass your current shell variables to the job. Extra statements are needed to use module commands in your qsub script (see the module page for more info).
To pass your current shell variables (e.g. PATH, LIBRARY_PATH) to your jobs, add the "-V" flag when you submit your scripts:
qsub -V my_qsub_script.sh
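Because a job only sees your shell variables when "-V" is given, a job script can check early that the variables it depends on are actually set. The following is a minimal sketch; missing_vars is a hypothetical helper name and is not part of PBS.

```shell
#!/bin/sh
# Hypothetical helper (not part of PBS): print the names of any listed
# variables that are unset or empty in the current environment.
missing_vars() {
    missing=""
    for name in "$@"; do
        # Indirect lookup via eval keeps this portable to plain sh.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            missing="$missing $name"
        fi
    done
    # Strip the leading space before printing.
    echo "${missing# }"
}

# Example check near the top of a job script.
unset_vars=$(missing_vars PATH LIBRARY_PATH)
if [ -n "$unset_vars" ]; then
    echo "Not set in this job: $unset_vars (consider qsub -V)" >&2
fi
```

The check only warns; whether a missing variable should stop the job is a choice left to the script author.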
"nodes" notation VS "select" notation
The examples below use the old "nodes" notation of requesting resources. On Bulldogj and newer, we recommend using the "select" notation.
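As an illustration of how the two notations relate, the sketch below turns a "nodes" request into a "select" request. It assumes the usual PBS Professional mapping of nodes=N:ppn=P to select=N:ncpus=P:mpiprocs=P; nodes_to_select is a hypothetical helper, and the resources your cluster expects may differ, so check the PBS Professional page before relying on it.

```shell
#!/bin/sh
# Hypothetical helper: translate "nodes=N:ppn=P" into a select-style request.
# Assumes one chunk per node with P cpus and P MPI processes per chunk.
nodes_to_select() {
    spec=$1
    n=${spec#nodes=}     # drop the "nodes=" prefix, e.g. "8:ppn=4"
    n=${n%%:*}           # keep only the node count, e.g. "8"
    case $spec in
        *:ppn=*) p=${spec##*:ppn=}; p=${p%%:*} ;;
        *)       p=1 ;;  # no ppn given: assume one processor per node
    esac
    echo "select=$n:ncpus=$p:mpiprocs=$p"
}

nodes_to_select "nodes=8:ppn=4"   # prints: select=8:ncpus=4:mpiprocs=4
```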
The Basic Script
Jobs can be submitted from the login node with the qsub command. Each job is simply a shell script that will be executed when resources are available. This is a minimal PBS script:
#!/bin/bash
#PBS -q queue_name
#PBS -l nodes=1:ppn=1

## Command to execute. The sleep command below
## just says to do nothing for 30 seconds
sleep 30
This script submits the job to the queue queue_name and requests one processor on one node. Your queue name should have been given to you when your account was created.
If the script is called test.sh, the job is then submitted as follows:
[user@bulldog ~]$ qsub test.sh
1879.master
qsub returns the job identifier. You can then use the job number to check the status of the job with the qstat command:
[user@bulldog ~]$ qstat 1879
Job id           Name             User             Time Use S Queue
---------------- ---------------- ---------------- -------- - -----
1879.master      test.sh          jb723                   0 R hpcg
To kill or cancel this job, use the qdel command. For example:
[user@bulldog ~]$ qdel 1879
[user@bulldog ~]$ qstat 1879
qstat: Unknown Job Id 1879.master
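The submit, check, and cancel steps above can also be scripted. The sketch below captures the identifier printed by qsub and reduces it to the job number used in the examples; job_number is a hypothetical helper, and the PBS commands are only run when qsub is actually found on the machine.

```shell
#!/bin/sh
# Hypothetical helper: strip the server suffix from a job identifier
# such as "1879.master", leaving the job number used in the examples above.
job_number() {
    echo "${1%%.*}"
}

# Only talk to PBS when qsub is available (for example on a login node).
if command -v qsub >/dev/null 2>&1; then
    jobid=$(qsub test.sh)             # e.g. 1879.master
    qstat "$(job_number "$jobid")"    # check the status of the job
    # qdel "$(job_number "$jobid")"   # uncomment to cancel the job
else
    job_number "1879.master"          # prints: 1879
fi
```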
A More Involved Example
Many other PBS directives are available. A more involved example follows:
#!/bin/sh
### Set the job name
#PBS -N myprogram
### Declare myprogram non-rerunnable
#PBS -r n
### Combine standard error and standard out to one file.
#PBS -j oe
### Have PBS mail you results
#PBS -m ae
#PBS -M my.email@yale.edu
### Set the queue name, given to you when you get a reservation.
#PBS -q workq
### Specify the number of cpus for your job.
#PBS -l select=1

# Switch to the working directory; by default PBS launches processes from your home directory.
echo Working directory is $PBS_O_WORKDIR
cd "$PBS_O_WORKDIR"
echo Running on host `hostname`
echo Time is `date`
echo Directory is `pwd`

# Run your job
./myprogram
PBS Scripts for MPI Jobs
MPICH2 or MVAPICH2
MPICH2 uses mpd daemons to pass MPI messages; see the MPICH2 online documentation for details. A .mpd.conf file is created in each user's home directory and populated with a random secretword string, so you can run an MPI job using the simple script below:
#!/bin/sh
### Set the job name
#PBS -N myprogram
### Declare myprogram non-rerunnable
#PBS -r n
### Combine standard error and standard out to one file.
#PBS -j oe
### Have PBS mail you results
#PBS -m ae
#PBS -M my.email@yale.edu
### Set the queue name, given to you when you get a reservation.
#PBS -q workq
### Specify the number of cpus for your job. This example will run on 32 cpus
### using 8 nodes with 4 processes per node.
#PBS -l nodes=8:ppn=4

# Switch to the working directory; by default PBS launches processes from your home directory.
# Jobs should only be run from /home, /project, or /work; PBS returns results via NFS.
echo Working directory is $PBS_O_WORKDIR
cd "$PBS_O_WORKDIR"
echo Running on host `hostname`
echo Time is `date`
echo Directory is `pwd`
echo This job runs on the following processors:
echo `cat $PBS_NODEFILE`

# Define number of processors
NPROCS=`wc -l < $PBS_NODEFILE`
# And the number of hosts
NHOSTS=`sort -u $PBS_NODEFILE | wc -l`
echo This job has allocated $NPROCS cpus

# Use a separate mpd per job; export the variable so that mpdboot sees it
MPD_CON_EXT=`date`
export MPD_CON_EXT
mpdboot --file=$PBS_NODEFILE -n $NHOSTS
mpiexec -n $NPROCS my_job
mpdallexit
In the script above, you only need to change "my_job" to the correct executable name, "workq" to the correct queue name, and the "-l" parameters to define how many CPUs your job should run on.
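To see how the processor and host counts are derived, the sketch below builds a stand-in for $PBS_NODEFILE and counts it the same way. The host names node01 and node02 are made up for illustration; inside a real job, PBS provides the file. It uses sort -u so that duplicate host names are collapsed even if they are not on adjacent lines.

```shell
#!/bin/sh
# Build a stand-in for $PBS_NODEFILE: 2 hypothetical hosts, 4 slots each,
# one line per allocated processor.
nodefile=$(mktemp)
for host in node01 node02; do
    for slot in 1 2 3 4; do
        echo "$host"
    done
done > "$nodefile"

# One line per processor, one distinct name per host.
NPROCS=$(wc -l < "$nodefile")
NHOSTS=$(sort -u "$nodefile" | wc -l)
# Some wc implementations pad their output with spaces;
# arithmetic expansion normalises the values.
NPROCS=$((NPROCS))
NHOSTS=$((NHOSTS))

echo "This job has allocated $NPROCS cpus on $NHOSTS hosts"
rm -f "$nodefile"
```

With the stand-in file above this reports 8 cpus on 2 hosts, which is what a request of nodes=2:ppn=4 would be expected to produce.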
NOTE:
Make sure that the variable MPD_CON_EXT=`date` is set in your PBS script before the mpdboot line. By default, MPICH2 associates all jobs from the same user with a single mpd process per node; setting this variable forces MPICH2 to launch a separate mpd for each job. Without it, a user with more than one job on a node will have all of those jobs fail as soon as any one of them exits with the mpdallexit command. [1]
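As a possible variation, the value can be derived from the job identifier instead of the date, which avoids spaces in the value and stays unique even when two jobs start in the same second. This is only a sketch and an assumption, not something taken from the MPICH2 documentation or tested on these clusters; it relies on PBS setting PBS_JOBID inside a running job.

```shell
#!/bin/sh
# Assumption: PBS sets PBS_JOBID inside a running job. Fall back to the
# shell's process ID so the sketch also runs outside PBS.
MPD_CON_EXT="job_${PBS_JOBID:-$$}"
# Export the variable so that child processes such as mpdboot can see it.
export MPD_CON_EXT
echo "MPD_CON_EXT=$MPD_CON_EXT"
```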
OpenMPI
One of the advantages of OpenMPI over MPICH2 is that it's PBS aware, which means it knows exactly where and how to start the communication daemons.
For more information on OpenMPI see http://www.open-mpi.org/faq/?category=running. OpenMPI has a LARGE selection of options and performance tweaks that can be applied at run time. If you're having problems, the OpenMPI email list is VERY active.
#!/bin/sh
### Set the job name
#PBS -N myprogram
### Declare myprogram non-rerunnable
#PBS -r n
### Combine standard error and standard out to one file.
#PBS -j oe
### Have PBS mail you results
#PBS -m ae
#PBS -M my.email@yale.edu
### Set the queue name, given to you when you get a reservation.
#PBS -q workq
### Specify the number of cpus for your job. This example will run on 32 cpus
### using 8 nodes with 4 processes per node.
#PBS -l nodes=8:ppn=4

# Switch to the working directory; by default PBS launches processes from your home directory.
# Jobs should only be run from /home, /project, or /work; PBS returns results via NFS.
echo Working directory is $PBS_O_WORKDIR
cd "$PBS_O_WORKDIR"
echo Running on host `hostname`
echo Time is `date`
echo Directory is `pwd`
echo This job runs on the following processors:
echo `cat $PBS_NODEFILE`

# Define number of processors
NPROCS=`wc -l < $PBS_NODEFILE`
# And the number of hosts
NHOSTS=`sort -u $PBS_NODEFILE | wc -l`
echo This job has allocated $NPROCS cpus

mpirun my_job
In the script above, you only need to change "my_job" to the correct executable name, "workq" to the correct queue name, and the "-l" parameters to define how many CPUs your job should run on.
Additional PBS information
See our PBS Professional page.

