Using the Portable Batch System (PBS) on Almaak-06
The Portable Batch System (PBS) is used to submit jobs to the compute-only netra resources (rcf101-rcf115). PBS schedules and runs a user's job when sufficient resources are available. To submit PBS jobs for the netras, you must log in to almaak. If your account is not active or does not have sufficient CPU Hours (CPUH), the job will not run.
Introduction to PBS on almaak
Documentation for PBS can be found at http://www.usc.edu/hpcc/pbsman/man1/pbs.html, which is a copy of the PBS man page. The same document is also accessible on hpc-master with the command 'man pbs'. Links to all other PBS commands can be found at http://www.usc.edu/hpcc/pbsman.
To use the rcf1## netra compute nodes, you must log on to almaak at almaak.usc.edu and then submit a PBS job. To interact with the PBS queue, you primarily use the 'qsub' and 'qstat' commands. The qsub command submits a job to the PBS queue, and the qstat command checks the status of a job already in the queue. On almaak each user is restricted to one active (running) job at any given time, but the number of jobs waiting in the queue is not restricted.
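As a minimal sketch of how the two commands fit together, the shell function below submits a script and then asks for the status of the resulting job. It assumes that qsub prints the new job's identifier on standard output, as PBS normally does. The function name 'submit_and_check' and the script name 'myjob.pbs' are illustrative only and are not part of PBS.

```shell
# Submit a PBS script, then show the status of the job that was created.
# Assumes qsub writes the job identifier to standard output.
submit_and_check() {
    script=$1
    # Capture the job identifier; give up if the submission fails.
    jobid=$(qsub "$script") || return 1
    echo "Submitted job $jobid"
    # Ask PBS for the status of just this job.
    qstat "$jobid"
}

# Example use on almaak (hypothetical script name):
#   submit_and_check myjob.pbs
```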
Using qsub
To run a job under PBS, you first create a control file that contains the commands to be executed. Typically, this is a PBS script. The script is then submitted to PBS using the qsub command. The command
qsub myjob.pbs
is the simplest form of a submission, but typically you will want to ask PBS for additional resources. A job on the rcf1## netras is allowed, by default, a maximum of 0.5 CPUH (30 minutes). If your job needs to run longer than that, the control file or the command line should specify how much time is required using the 'walltime' resource. The command
qsub -l walltime=2:00:00 myjob.pbs
will ask PBS for a limit of 2 hours of wall-clock time, which is 2 CPUH for a single-processor job. If your job does not finish within the specified time, it will be terminated.
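The walltime value is written as hours:minutes:seconds. If you think of your run time in minutes, a small helper such as the one below can build the string for you. This is only a convenience sketch; the function name 'walltime_from_minutes' is made up for this example and is not a PBS command.

```shell
# Convert a whole number of minutes into the H:MM:SS form used by walltime.
walltime_from_minutes() {
    minutes=$1
    printf '%d:%02d:00\n' "$((minutes / 60))" "$((minutes % 60))"
}

walltime_from_minutes 120   # prints 2:00:00
walltime_from_minutes 45    # prints 0:45:00

# Example use on almaak:
#   qsub -l walltime="$(walltime_from_minutes 120)" myjob.pbs
```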
If you wish to see the standard output and error messages from a job, use the option '-k eo'. This option causes the standard output 'jobname.oXX' and standard error 'jobname.eXX' to be put in your home directory. The 'jobname' defaults to the first 15 characters of the name of the script submitted (see the -N option). 'XX' is the job's sequence number assigned by PBS. To combine the options above, and also request one processor on one node explicitly, the command line would be
qsub -k eo -l walltime=2:00:00,nodes=1:ppn=1 myjob.pbs
If you frequently use the same qsub options for a given script, PBS lets you put the options in the script itself so that you do not have to type them every time you submit a job.
For the above example, you would put the following lines at the beginning of your script (myjob.pbs):
# Run on 1 processor
#PBS -k eo
#PBS -l walltime=2:00:00,nodes=1:ppn=1
Then, when you submit subsequent jobs, you only have to enter the following command:
qsub myjob.pbs
Remember that the script you submit to the PBS queue may run on a system that requires a brand new login. Anything that you normally have to do to set up for running your program will have to be duplicated in the PBS script. This includes changing your current directory from the home directory to the directory required for your job. PBS provides a variable, 'PBS_O_WORKDIR', that holds the directory you were in when you executed the qsub command. If you want your job to start from that same directory, you can put the following line in your script right after any '#PBS' lines:
cd "$PBS_O_WORKDIR"
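Putting these pieces together, a complete job script might look like the sketch below, which is written to myjob.pbs from the command line and then checked for syntax errors. The echo line is only a placeholder for your real program, and the ':-.' fallback on PBS_O_WORKDIR is an addition that lets you try the script by hand outside PBS; neither comes from the PBS documentation.

```shell
# Write a small example job script to myjob.pbs.
# The quoted 'EOF' keeps $ and $( ) literal inside the file.
cat > myjob.pbs <<'EOF'
#!/bin/sh
# Run on 1 processor
#PBS -k eo
#PBS -l walltime=2:00:00,nodes=1:ppn=1

# Start in the directory from which qsub was run. The ":-." fallback is an
# addition so the script can also be tried by hand outside PBS.
cd "${PBS_O_WORKDIR:-.}" || exit 1

# Placeholder workload: report where the job is running.
echo "Running on $(uname -n) in $(pwd)"
EOF

# Check the script for shell syntax errors without executing it.
sh -n myjob.pbs && echo "myjob.pbs: syntax OK"

# On almaak you would then submit it with:
#   qsub myjob.pbs
```

Keeping the '#PBS' lines at the top, before the first command, matters because PBS stops reading directives once it reaches the first executable line.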
Access to almaak is governed by the HPCC allocation policy. You must submit a request and be granted an allocation. If you are involved in more than one project, or are taking a class in parallel programming, you may have multiple allocations. If so, you must specify which account you are using, either on your qsub command line or in a '#PBS' line in your PBS script.
The following command will submit your job to PBS using the account lc_drs (note that the account is specified with an uppercase A):
qsub -A lc_drs myjob.pbs
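The same choice can be made inside the script instead. As a sketch, the beginning of a script that always charges the lc_drs account could look like this (the other two '#PBS' lines simply repeat the earlier example):

```shell
# Run on 1 processor, charged to the lc_drs account
#PBS -A lc_drs
#PBS -k eo
#PBS -l walltime=2:00:00,nodes=1:ppn=1
```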
You can find out what accounts you may access and their balances with the qbalance command:
qbalance -h