Grid Computing at USC
Authentication within USC is provided by Kerberos. The Globus Toolkit grid software uses X.509 certificates for authentication. To bridge the difference between Kerberos realms and principals, on the one hand, and X.509 Public Key Infrastructure certificates on the other, we use kx509. All users of USC UNIX resources have both Kerberos and the kx509 bridge software available to them. This means that, potentially, every USC UNIX login ID can become grid-enabled.
USC is working to set up formal procedures to allow our authenticated users to be recognized by other established grid resources. Currently, our kx509 certificates are recognized by SURAgrid and within USCGrid. We have cross-certification with most of the universities that are part of SURAgrid.
USCGrid is composed of the main HPCC Computing Resources and the Condor pool (110 UNIX workstations running Solaris 9).
Using Grid and Condor Resources at HPCC
You can submit Condor jobs only from almaak.usc.edu. You can submit jobs to the PBS queue on almaak.usc.edu, the PBS queue on hpc-master.usc.edu, and the Condor flock. Remote submission to sites outside USCGrid is not yet possible. We are working on the formal procedures needed to allow us to be recognized by other grid sites and expect to receive such recognition soon.
When you are logged onto hpc-master and submit jobs through PBS, your home directory is the same everywhere on the RCF machines. Most of the machines are 32-bit P4 Xeons, but some are 64-bit P4 Xeons or 64-bit Opterons. In practice, you can ignore these differences unless your program needs a specific architecture, in which case it might require different compiler flags. Please make sure that your program is compiled for the correct architecture: programs to be run on almaak or the Condor flock must be compiled for the Solaris architecture, and programs to be run on the Linux cluster must be compiled on the Linux cluster.
When you are on almaak and submitting to Condor (vanilla or standard universe), your job runs on SPARC workstations that do NOT have access to your home directory. The job does not execute under your userid, so you must supply it with all of the files it needs. The Condor submit file has parameters for transferring input, output, and data files. If you receive an error message about missing files in the Condor job's output file, it is probably because you haven't told Condor to transfer them.
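As an illustration of the file-transfer parameters mentioned above, the following Python sketch writes out a vanilla-universe submit description that lists the input files explicitly. The submit commands it emits (should_transfer_files, when_to_transfer_output, transfer_input_files) are standard Condor submit commands, but whether the Condor version installed on almaak accepts all of them is an assumption, and the program and file names are hypothetical.

```python
def render_submit(executable, input_files, output="job.out",
                  error="job.err", log="job.log"):
    """Return the text of a vanilla-universe Condor submit description.

    All file names are placeholders; substitute your own.
    """
    lines = [
        "universe = vanilla",
        f"executable = {executable}",
        # Ask Condor to copy files to the execute machine, since the
        # workstations cannot see your home directory.
        "should_transfer_files = YES",
        "when_to_transfer_output = ON_EXIT",
    ]
    if input_files:
        lines.append("transfer_input_files = " + ", ".join(input_files))
    lines += [
        f"output = {output}",
        f"error = {error}",
        f"log = {log}",
        "queue",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical program and data files.
    print(render_submit("my_prog", ["input.dat", "params.cfg"]))
```

Saving the printed text to a file and passing that file to condor_submit on almaak would be the usual next step.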
When submitting to the Condor Globus universe, your job can execute on rcf-01,
rcf1xx, or Linux nodes. Your home directory will be the same everywhere. If you submit jobs to two different architectures at the same time, then you must make sure to reference the appropriate binary files.
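One way to keep binaries for different architectures apart is to store each build in a directory named after its platform. The Python sketch below assumes a hypothetical bin/<system>-<machine>/ layout in your home directory; the layout and the program name are illustrations, not an HPCC convention.

```python
import platform
import posixpath


def build_tag(system=None, machine=None):
    """Return a tag such as 'Linux-x86_64' that identifies a platform.

    With no arguments, describes the machine this script is running on.
    """
    system = system or platform.system()      # e.g. 'SunOS' or 'Linux'
    machine = machine or platform.machine()   # e.g. 'sun4u', 'i686', 'x86_64'
    return f"{system}-{machine}"


def binary_path(program, system=None, machine=None, root="bin"):
    """Path to the build of `program` for a platform (hypothetical layout)."""
    return posixpath.join(root, build_tag(system, machine), program)


if __name__ == "__main__":
    # Binary for the current machine, then for a Solaris/SPARC target.
    print(binary_path("my_prog"))
    print(binary_path("my_prog", "SunOS", "sun4u"))
```

A job script could call binary_path with no platform arguments so that each node picks up the build that matches it.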
Condor serves two different purposes at the same time.
1. Condor is a pool of Solaris machines all over campus managed by the Condor backend daemons on almaak.
2. Condor also provides a front end for scheduling jobs.
You can talk to the front end with the Condor commands condor_submit and condor_q. Through the front end, you can run jobs on the Condor flock as well as submit jobs to the Globus job managers.
Globus front-end commands like globus-job-submit can only talk to Globus job
managers. The Globus job managers are available on almaak and hpc-master. On
hpc-master, there is one manager, which is for PBS. On almaak, there are two managers: PBS
and Condor.
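The host and job manager combinations described above can be summarized in a small lookup table. The Python sketch below builds a contact string of the form host/jobmanager-name, which is the conventional GRAM naming; the exact service names configured on almaak and hpc-master are an assumption here and should be checked before use.

```python
# Job managers per host, as described in the text above.
JOB_MANAGERS = {
    "almaak.usc.edu": ["pbs", "condor"],
    "hpc-master.usc.edu": ["pbs"],
}


def gram_contact(host, manager):
    """Return a contact string such as 'almaak.usc.edu/jobmanager-pbs'.

    The 'jobmanager-<name>' suffix follows the usual Globus convention
    and is assumed, not confirmed, for these hosts.
    """
    if manager not in JOB_MANAGERS.get(host, []):
        raise ValueError(f"no {manager!r} job manager on {host!r}")
    return f"{host}/jobmanager-{manager}"


if __name__ == "__main__":
    for host, managers in JOB_MANAGERS.items():
        for manager in managers:
            print(gram_contact(host, manager))
```

A contact string like this is what a command such as globus-job-submit takes as its first argument, followed by the program to run.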