Using HPCC's Standard Universe - A Simple Example
The standard universe allows a job running under Condor to handle system calls by forwarding them to the machine that submitted the job. It also provides the mechanisms needed to take a checkpoint and migrate a partially completed job if the machine on which the job is executing becomes unavailable. To use the standard universe, you must relink the program with the Condor library using the condor_compile command.
Simple Example: helloworld.c
#include <stdio.h>

int main(void) {
    /* Print a greeting to standard output. */
    fprintf(stdout, "Hello Condor!\n");
    return 0;
}
Compiling helloworld.c with Condor Libraries
Running a job in the standard universe allows your program to be checkpointed automatically and lets Condor handle access to any files the program needs. The drawback is that the program must first be relinked with the Condor libraries using condor_compile.
[jmehring@almaak-01 standard]$ condor_compile
Usage: condor_compile <command> [options/files .... ]
where <command> is one of the following:
gcc, g++, g77, cc, acc, c89, CC, f77, fort77, ld,
pgcc, pgf77, pgf90, or pghpf.
(on some platforms, f90 is also allowed)
[jmehring@almaak-01 standard]$ condor_compile gcc helloworld.c -o helloworld
LINKING FOR CONDOR : /usr/ccs/bin/ld -Y P,/usr/ccs/lib:/usr/lib
-Qy -R /usr/usc/gnu/gcc/3-default/lib -o helloworld
/usr/usc/condor/6.5.3/lib/condor_rt0.o
/usr/usc/gnu/gcc/3.3.2/lib/gcc-lib/sparc-sun-solaris2.8/3.3.2/crti.o /usr/ccs/lib/values-Xa.o
/usr/usc/gnu/gcc/3.3.2/lib/gcc-lib/sparc-sun-solaris2.8/3.3.2/crtbegin.o
-L/usr/usc/condor/6.5.3/lib -L/usr/usc/gnu/gcc/3.3.2/lib/gcc-lib/sparc-sun-solaris2.8/3.3.2
-L/usr/ccs/bin -L/usr/ccs/lib
-L/usr/usc/gnu/gcc/3.3.2/lib/gcc-lib/sparc-sun-solaris2.8/3.3.2/../../.. /var/tmp//ccISIX16.o
/usr/usc/condor/6.5.3/lib/libcondorzsyscall.a /usr/usc/condor/6.5.3/lib/libz.a -Bdynamic
-lsocket -lnsl -lgcc -lgcc_eh -R /usr/usc/gnu/gcc/3-default/lib
-lc -lgcc -lgcc_eh -R /usr/usc/gnu/gcc/3-default/lib -lc
/usr/usc/gnu/gcc/3.3.2/lib/gcc-lib/sparc-sun-solaris2.8/3.3.2/crtend.o
/usr/usc/gnu/gcc/3.3.2/lib/gcc-lib/sparc-sun-solaris2.8/3.3.2/crtn.o
/usr/usc/condor/6.5.3/lib/libcondorc++support.a
ld: warning: symbol `_siguhandler' has differing sizes:
(file /usr/usc/condor/6.5.3/lib/libcondorzsyscall.a(SIGACTION.o) value=0xb8;
file /usr/lib/libc.so value=0xbc);
/usr/usc/condor/6.5.3/lib/libcondorzsyscall.a(SIGACTION.o) definition taken
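A script that automates the relinking step may want to confirm that condor_compile is available before calling it. The following is a minimal sketch, assuming a POSIX shell. The helper name require_tool is hypothetical and not part of Condor; the compile command is the one shown above.

```shell
#!/bin/sh
# require_tool is a hypothetical helper, not a Condor command.
# It returns 0 if the named command can be found on the PATH.
require_tool() {
    command -v "$1" >/dev/null 2>&1
}

if require_tool condor_compile; then
    # Relink helloworld.c against the Condor libraries.
    condor_compile gcc helloworld.c -o helloworld
else
    echo "condor_compile not found on PATH" >&2
fi
```

If the tool is missing, the script prints a message to standard error instead of failing partway through a longer build.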
Submitting a Job from Almaak-01 to the Condor Flock -- Standard Universe
To submit a job to the standard universe, set universe to standard and transfer_executable to false in your submit file. A shared file system is not required for the standard universe: access to input and output files is handled through Condor's remote system call mechanism, and the executable and checkpoint files are transferred automatically if needed. The submit description file therefore does not need to change when there is no shared file system.
As in the Using Globus Universe example, the first step in submitting a job is to create a job description as a plain-text file. Our example, helloworld.submit, is:
executable = helloworld
transfer_executable = false
universe = standard
output = hello.out
error = hello.err
log = hello.log
queue
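When the same kind of description is needed for several programs, it can be generated from the shell rather than typed by hand. The sketch below is one possible way to do this, assuming a POSIX shell. The function name write_submit and its two arguments are hypothetical; the lines it writes are exactly those of the helloworld.submit example above.

```shell
#!/bin/sh
# write_submit is a hypothetical helper, not a Condor command.
# It writes a standard-universe submit description named <exe>.submit.
write_submit() {
    exe="$1"     # executable name, e.g. helloworld
    base="$2"    # prefix for the output, error, and log files, e.g. hello
    cat > "${exe}.submit" <<EOF
executable = ${exe}
transfer_executable = false
universe = standard
output = ${base}.out
error = ${base}.err
log = ${base}.log
queue
EOF
}

# Recreate the example file from this page in the current directory.
write_submit helloworld hello
```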
The second step is to set up your shell environment to include the Condor settings by sourcing the setup file.
for csh or tcsh:
almaak-01.usc.edu(5): source /usr/usc/condor/default/setup.csh
for any other shell:
almaak-01.usc.edu(5): source /usr/usc/condor/default/setup.sh
The third step is to submit the job:
[jmehring@almaak-01 standard]$ condor_submit helloworld.submit
Submitting job(s).
Logging submit event(s).
1 job(s) submitted to cluster 24136.
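The cluster number in the last line identifies the job in the log file. A script that needs this number can extract it from the condor_submit output. The following is a minimal sketch, assuming the output has the form shown above; the function name cluster_id is hypothetical and not part of Condor.

```shell
#!/bin/sh
# cluster_id is a hypothetical helper, not a Condor command.
# It reads condor_submit output on stdin and prints the cluster number.
cluster_id() {
    sed -n 's/.*submitted to cluster \([0-9][0-9]*\)\..*/\1/p'
}

# Demonstration using the line from the sample run above.
echo "1 job(s) submitted to cluster 24136." | cluster_id   # prints 24136

# With Condor available, the real output could be piped in instead:
#   condor_submit helloworld.submit | cluster_id
```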
Output Files from the Sample Run
To see the output, display the output file:
[jmehring@almaak-01 standard]$ cat hello.out
Hello Condor!
To see information about the execution of the job, display the log file:
[jmehring@almaak-01 standard]$ cat hello.log
001 (24136.000.000) 07/08 15:36:04 Job executing on host: <128.125.5.71:54010>
...
005 (24136.000.000) 07/08 15:36:05 Job terminated.
(1) Normal termination (return value 0)
Usr 0 00:00:00, Sys 0 00:00:00 - Run Remote Usage
Usr 0 00:00:00, Sys 0 00:00:00 - Run Local Usage
Usr 0 00:00:00, Sys 0 00:00:00 - Total Remote Usage
Usr 0 00:00:00, Sys 0 00:00:00 - Total Local Usage
1213 - Run Bytes Sent By Job
5412715 - Run Bytes Received By Job
1213 - Total Bytes Sent By Job
5412715 - Total Bytes Received By Job
...
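The log can also be checked from a script, for example to confirm that the job ended normally before reading hello.out. The sketch below assumes the log contains the termination line in the form shown above; the function name job_succeeded is hypothetical and not part of Condor. If the same log file is reused across several submissions, a match may come from an earlier run.

```shell
#!/bin/sh
# job_succeeded is a hypothetical helper, not a Condor command.
# It returns 0 if the given log records a normal termination with return value 0.
job_succeeded() {
    grep -qF 'Normal termination (return value 0)' "$1"
}

if [ -f hello.log ] && job_succeeded hello.log; then
    echo "job finished normally"
else
    echo "job has not finished, or it did not exit normally"
fi
```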