Lab Exercise: globusrun
More information can be found at: http://www-fp.globus.org/gt2.2/admin/guide-user.html
$ /sbin/ifconfig
eth0 Link encap:Ethernet HWaddr 00:0C:29:A6:61:20
inet addr:192.168.0.203 Bcast:192.168.0.255 Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST MTU:1356 Metric:1
RX packets:1769 errors:0 dropped:0 overruns:0 frame:0
TX packets:798 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:100
RX bytes:219389 (214.2 Kb) TX bytes:93082 (90.9 Kb)
Interrupt:10 Base address:0x1080
lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
UP LOOPBACK RUNNING MTU:16436 Metric:1
RX packets:123 errors:0 dropped:0 overruns:0 frame:0
TX packets:123 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:20515 (20.0 Kb) TX bytes:20515 (20.0 Kb)
$
For most laptops, look for the inet addr value under the eth0 link.
$ export GLOBUS_HOSTNAME=192.168.0.203
$ echo $GLOBUS_HOSTNAME
192.168.0.203
$
$ globus-hostname
192.168.0.203
$
The response should be your IP address. If your laptop's hostname appears instead, export the variable again.
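The lookup above can also be scripted. The following is a minimal sketch, assuming the older ifconfig output format shown above (an "inet addr:" field) and an interface named eth0; the function name eth0_addr is hypothetical and not part of Globus.

```shell
#!/bin/sh
# Print the first "inet addr" value found at or after the eth0 entry
# in ifconfig-style text read from standard input.
# Assumes the "inet addr:" format shown above.
eth0_addr() {
    sed -n '/^eth0/,/^$/s/.*inet addr:\([0-9.]*\).*/\1/p' | head -n 1
}
```

It could then be used as: GLOBUS_HOSTNAME=$(/sbin/ifconfig | eth0_addr), followed by export GLOBUS_HOSTNAME. Check the result with globus-hostname as described above.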
globus-job-run 'contact string' command
The contact string specifies the machine, port, and service to send a request to. The syntax of a contact string is machine:port/jobmanager-name. All of the following are valid forms:
hostname
hostname:port
hostname:port/jobmanager-name
hostname/jobmanager-name
hostname:/jobmanager-name
hostname::subject
hostname:port:subject
hostname/jobmanager-name:subject
hostname:/jobmanager-name:subject
hostname:port/jobmanager-name:subject
The default port is 2119 and the default job manager name is "jobmanager". Run the test from the above section again, first including the port number and then also the job manager name.
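To make the defaults concrete, here is a small sketch that splits a contact string into its parts and fills in port 2119 and the job manager name "jobmanager" when they are omitted. The function name parse_contact is hypothetical, and the forms that carry a :subject suffix are deliberately not handled.

```shell
#!/bin/sh
# Split a contact string of the form host[:port][/jobmanager-name]
# and print its parts, applying the defaults stated above.
# Forms with a ":subject" suffix are not handled in this sketch.
parse_contact() {
    contact=$1
    # Everything after the first "/" is the job manager name.
    case $contact in
        */*) service=${contact#*/}; hostport=${contact%%/*} ;;
        *)   service=;              hostport=$contact ;;
    esac
    # Everything after the first ":" in what remains is the port.
    case $hostport in
        *:*) port=${hostport#*:}; host=${hostport%%:*} ;;
        *)   port=;               host=$hostport ;;
    esac
    echo "host=$host port=${port:-2119} service=${service:-jobmanager}"
}
```

For example, parse_contact ldas-grid.ligo-la.caltech.edu and parse_contact ldas-grid.ligo-la.caltech.edu:2119/jobmanager print the same three values, which is why the three globus-job-run commands in this section are equivalent.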
Issue the following commands:
$ globus-job-run ldas-grid.ligo-la.caltech.edu /bin/date
Fri Dec 10 12:48:08 CST 2004
$ globus-job-run ldas-grid.ligo-la.caltech.edu:2119 /bin/date
Fri Dec 10 12:49:08 CST 2004
$ globus-job-run ldas-grid.ligo-la.caltech.edu:2119/jobmanager /bin/date
Fri Dec 10 12:49:55 CST 2004
In the event of errors, many common causes and solutions can be found at http://www.globus.org/about/faq/errors.html.
Start by creating a directory in your home directory called lab5 and cd into it:
$ cd
$ mkdir lab5
$ cd lab5
This new lab5 directory should be used to contain any files we create during the remainder of this lab exercise.
#!/bin/sh
/bin/date
$ cat myprog.sh
#!/bin/sh
/bin/date
$ globus-job-run ldas-grid.ligo-la.caltech.edu -s myprog.sh
Fri Dec 10 13:03:15 CST 2004
globus-job-submit is a batch interface to the GRAM server. It will return immediately, leaving you with a contact string that you can use to query the status of your job. The basic syntax is:
globus-job-submit 'contact string' command
#!/bin/sh
/bin/date
/bin/sleep 10
/bin/date
$ cat mysubmit.sh
#!/bin/sh
/bin/date
/bin/sleep 10
/bin/date
$ globus-job-run ldas-grid.ligo-la.caltech.edu -s mysubmit.sh
Fri Dec 10 13:12:14 CST 2004
Fri Dec 10 13:12:24 CST 2004
Notice there is a 10 second difference in the time.
$ globus-job-submit ldas-grid.ligo-la.caltech.edu -s mysubmit.sh
https://ldas-grid.ligo-la.caltech.edu:40001/11364/1109306974/
You will notice that the output is not the date and time, as it was with globus-job-run. Instead we are given a contact string. This string can be used with several other utilities to obtain data about the running job. These utilities are:
- globus-job-status - reports the job's status: PENDING, ACTIVE, DONE, or FAILED.
- globus-job-get-output - once the job is done, collect the output with this command.
- globus-job-clean - stops the job if it is running, and cleans up the cached copy of the output.
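These utilities can be combined into a simple polling loop. The sketch below is only an illustration: the function name wait_for_job and the STATUS_CMD variable are hypothetical, and it assumes that globus-job-status prints one of the four states listed above on standard output. The loop is bounded so that an unexpected response cannot make it run forever.

```shell
#!/bin/sh
# Command used to query the job state. It can be overridden, for
# example to try the loop without access to a grid resource.
STATUS_CMD=${STATUS_CMD:-globus-job-status}

# Poll a job until it reports DONE or FAILED.
# $1 = job contact string
# $2 = seconds between polls (default 5)
# $3 = maximum number of polls (default 60)
wait_for_job() {
    contact=$1
    interval=${2:-5}
    tries=${3:-60}
    while [ "$tries" -gt 0 ]; do
        status=$($STATUS_CMD "$contact")
        case $status in
            DONE)   echo DONE;   return 0 ;;
            FAILED) echo FAILED; return 1 ;;
        esac
        tries=$((tries - 1))
        sleep "$interval"
    done
    echo TIMEOUT
    return 2
}
```

A possible use is to capture the contact string with JOB=$(globus-job-submit ldas-grid.ligo-la.caltech.edu -s mysubmit.sh), call wait_for_job "$JOB", and then run globus-job-get-output "$JOB" once it reports DONE.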
$ globus-job-get-output https://ldas-grid.ligo-la.caltech.edu:40001/11364/1109306974/
Fri Dec 10 13:12:47 CST 2004
Fri Dec 10 13:12:57 CST 2004
As you should see, once again the times are 10 seconds apart.
#!/bin/sh
/bin/date
/bin/sleep 20
/bin/date
$ cat mysubmit.sh
#!/bin/sh
/bin/date
/bin/sleep 20
/bin/date
$ globus-job-submit ldas-grid.ligo-la.caltech.edu -s mysubmit.sh
https://ldas-grid.ligo-la.caltech.edu:40001/11899/1109307174/
$ globus-job-status https://ldas-grid.ligo-la.caltech.edu:40001/11899/1109307174/
ACTIVE
$ globus-job-status https://ldas-grid.ligo-la.caltech.edu:40001/11899/1109307174/
DONE
$ globus-job-get-output https://ldas-grid.ligo-la.caltech.edu:40001/11899/1109307174/
Fri Dec 10 13:20:28 CST 2004
Fri Dec 10 13:20:48 CST 2004
Now let's remove the cached output from the last job we ran. If you rerun the globus-job-get-output command with the contact string from your last job, you will see that the output is displayed once again. This output stays around until it is removed. Depending on the output of your job, you might want to do this from time to time. Run globus-job-clean with your contact string. You will be given a warning about the consequences of running this utility; answer "y".
$ globus-job-clean https://ldas-grid.ligo-la.caltech.edu:40001/11899/1109307174/
WARNING: Cleaning a job means:
- Kill the job if it still running, and
- Remove the cached output on the remote resource
Are you sure you want to cleanup the job now (Y/N) ?
y
Cleanup successful.
$ globus-job-get-output https://ldas-grid.ligo-la.caltech.edu:40001/11899/1109307174/
Invalid job id.
The cached output is gone.
#!/bin/sh
/bin/date
/bin/sleep 200
/bin/date
$ cat mysubmit.sh
#!/bin/sh
/bin/date
/bin/sleep 200
/bin/date
$ globus-job-submit ldas-grid.ligo-la.caltech.edu -s mysubmit.sh
https://ldas-grid.ligo-la.caltech.edu:40001/13372/1109307757/
$ globus-job-status https://ldas-grid.ligo-la.caltech.edu:40001/13372/1109307757/
ACTIVE
$ globus-job-clean https://ldas-grid.ligo-la.caltech.edu:40001/13372/1109307757/
WARNING: Cleaning a job means:
- Kill the job if it still running, and
- Remove the cached output on the remote resource
Are you sure you want to cleanup the job now (Y/N) ?
y
Cleanup successful.
$ globus-job-status https://ldas-grid.ligo-la.caltech.edu:40001/13372/1109307757/
DONE
This is accomplished by using the -a (or -authenticate-only) command line option. This option submits a gatekeeper "ping" request only. It does not parse the RSL or submit the job request.
$ globusrun -r ldas-grid.ligo-la.caltech.edu -a
GRAM Authentication test successful
We now know we can connect to the remote host and that the jobmanager is running and ready to accept our jobs.
Issue the following:
$ globusrun -o -r ldas-grid.ligo-la.caltech.edu/jobmanager '&(executable=/usr/bin/cal)'
January 2005
Su Mo Tu We Th Fr Sa
1
2 3 4 5 6 7 8
9 10 11 12 13 14 15
16 17 18 19 20 21 22
23 24 25 26 27 28 29
30 31
This runs the UNIX utility cal on the remote server for us.
If the job manager cannot find the executable, you will receive the following message:
GRAM Job failed because the executable does not exist (error code 5)
In such a case, find out where the executable is located with the which command...
$ which cal
/usr/usr/bin/cal
...and reexecute the globusrun command with the correct location.
#!/bin/sh
/bin/date
$ scp myprog.sh ldas-grid.ligo-la.caltech.edu:~
$
Set the file as executable on the remote server.
$ ssh ldas-grid.ligo-la.caltech.edu
$ cat myprog.sh
#!/bin/sh
/bin/date
$ chmod +x myprog.sh
$ ls -la myprog.sh
-rwxrwxr-x 1 mfreemon mfreemon 20 Feb 24 23:30 myprog.sh
$ exit
$
(* this is a comment *)
& (executable = myprog.sh )
(directory = /data2/<userid> )
(arguments = arg1 "arg 2")
(count = 1)
The program identified by the executable tag must be found in the directory defined by the directory tag. Both of these values must refer to actual objects on the remote host. You will need to set the directory value to a valid path on the server.
For a full discussion of RSL, refer to: http://www.globus.org/gram/rsl_spec1.html
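Because the executable and directory values have to match what exists on the remote host, it can help to generate the RSL file from a script rather than edit it by hand. The following is a minimal sketch; the function name write_rsl is hypothetical, and only the executable, directory, and count tags from the example above are written.

```shell
#!/bin/sh
# Write a small RSL file.
# $1 = output file, $2 = executable name, $3 = remote directory.
# The values are inserted as given; they must refer to real objects
# on the remote host for the job to succeed.
write_rsl() {
    cat > "$1" <<EOF
(* generated RSL *)
& (executable = $2 )
  (directory = $3 )
  (count = 1)
EOF
}
```

For example, write_rsl myrsl myprog.sh followed by your own directory on the server creates a file that can be passed to globusrun with the -f option.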
$ globusrun -r ldas-grid.ligo-la.caltech.edu -f myrsl
globus_gram_client_callback_allow successful
GRAM Job submission successful
GLOBUS_GRAM_PROTOCOL_JOB_STATE_ACTIVE
GLOBUS_GRAM_PROTOCOL_JOB_STATE_DONE
$ globusrun -r ldas-grid.ligo-la.caltech.edu -f myrsl -o
Fri Dec 10 14:23:50 CST 2004
For information about Globus GASS, look at: http://www.globus.org/gass/
Create a file (on your local machine) called rsl_test.sh and place the following lines into it:
#!/bin/sh
/bin/ls $1
(* this is a comment *)
& (executable = $(GLOBUSRUN_GASS_URL)/home/<userid>/lab5/rsl_test.sh )
(arguments = "-ltra" )
(stdout = $(GLOBUSRUN_GASS_URL)/home/<userid>/lab5/filelist.txt)
(count = 1)
Include the -w (or -write-allow) option to take advantage of the GASS server's ability to write to the local machine.
$ globusrun -w -r ldas-grid.ligo-la.caltech.edu -f rsltest
You will notice in the above RSL definition that the executable and stdout tags include $(GLOBUSRUN_GASS_URL). This takes advantage of the GASS server and is a predefined value for use by globusrun, as the name implies. With -w set, globusrun starts up the GASS server (i.e. an https server) within itself and prepends the definition of GLOBUSRUN_GASS_URL to the RSL. This is done before the RSL is submitted to GRAM. This is a client-side feature of globusrun.
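One practical point if you generate this RSL from a shell script: to the shell, $(GLOBUSRUN_GASS_URL) looks like a command substitution, so it must be kept inside single quotes to reach globusrun unchanged. The sketch below shows one way to do that with printf; the function name write_gass_rsl is hypothetical, and the tags are the ones used in the RSL above.

```shell
#!/bin/sh
# Print the GASS-based RSL for a given local lab directory ($1).
# The single-quoted format strings keep $(GLOBUSRUN_GASS_URL) literal,
# so it is globusrun, not the shell, that substitutes the value.
write_gass_rsl() {
    dir=$1
    printf '& (executable = $(GLOBUSRUN_GASS_URL)%s/rsl_test.sh )\n' "$dir"
    printf '  (arguments = "-ltra" )\n'
    printf '  (stdout = $(GLOBUSRUN_GASS_URL)%s/filelist.txt)\n' "$dir"
    printf '  (count = 1)\n'
}
```

Redirecting the output to a file, for example write_gass_rsl "$HOME/lab5" > rsltest, would produce a file equivalent to the RSL above, assuming your lab5 directory is under your home directory.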
Solutions to common GRAM error messages can be found at http://www.globus.org/about/faq/errors.html
$ cat filelist.txt
total 98
-rw-r--r-- 1 mfreemon mfreemon 658 Feb 10 11:22 .zshrc
-rw-r--r-- 1 mfreemon mfreemon 120 Feb 10 11:22 .gtkrc
-rw-r--r-- 1 mfreemon mfreemon 383 Feb 10 11:22 .emacs
-rw-r--r-- 1 mfreemon mfreemon 124 Feb 10 11:22 .bashrc
-rw-r--r-- 1 mfreemon mfreemon 302 Feb 10 11:22 .bash_logout
drwxr-xr-x 3 mfreemon mfreemon 80 Feb 21 13:00 .globus
-rw-r--r-- 1 mfreemon mfreemon 348 Feb 24 22:34 .bash_profile
drwxr-xr-x 64 root root 1544 Mar 4 12:34 ..
drwx------ 2 mfreemon mfreemon 80 Mar 7 10:38 .ssh
-rw-r--r-- 1 mfreemon mfreemon 6604 Mar 8 09:49 .nfs001d431300000080
-rwxrwxr-x 1 mfreemon mfreemon 20 Mar 8 10:14 myprog.sh
-rw-r--r-- 1 mfreemon mfreemon 6155 Mar 8 10:54 gram_job_mgr_14994.log
-rw-r--r-- 1 mfreemon mfreemon 8994 Mar 8 10:57 gram_job_mgr_15264.log
-rw------- 1 mfreemon mfreemon 13560 Mar 8 10:59 .bash_history
-rw-r--r-- 1 mfreemon mfreemon 3629 Mar 8 11:00 gram_job_mgr_15583.log
drwx------ 6 mfreemon mfreemon 632 Mar 8 11:02 .
-rw-r--r-- 1 mfreemon mfreemon 8125 Mar 8 11:02 gram_job_mgr_15729.log
$
Recall the RSL contained in the rsltest file. In particular, look at the line that defines the arguments. This RSL parameter defines the command line arguments for the executable. In this case we have defined -ltra. This tells the ls command to print a long listing of all files, including hidden ones, sorted by modification time with the oldest first. You can change this argument to any of the valid ls options. Try changing it and see what the new output looks like.
First, create a new file called rsl_test2.sh. It should contain the following:
#!/bin/sh
read VAL
/bin/ls $1 $VAL
(* this is a comment *)
& (rsl_substitution = (EXECDIR $(GLOBUSRUN_GASS_URL)/home/<userid>/lab5) )
(executable = $(EXECDIR)/rsl_test2.sh )
(arguments = "-ltra" )
(stdout = $(EXECDIR)/stage.out)
(stdin = $(EXECDIR)/stage_in.txt)
(count = 1)
You will notice a new tag called stdin. This allows us to define a file to be used as the standard input for the executable on the remote machine.
/tmp
This file, stage_in.txt, tells our executable which directory to list.
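To see how the standard input reaches the script, the same logic can be tried locally before submitting anything. This is only a sketch: the function name list_from_stdin is hypothetical, and it uses ls from the PATH rather than /bin/ls.

```shell
#!/bin/sh
# Same logic as rsl_test2.sh, wrapped in a function for a local dry run:
# read a directory name from standard input, then list that directory
# using the option passed as the first argument.
list_from_stdin() {
    read VAL
    ls $1 "$VAL"
}
```

Running list_from_stdin -ltra < stage_in.txt on your own machine lists your local /tmp. When the job is submitted, the contents of stage_in.txt are supplied as standard input to the script on the remote machine in the same way.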
$ globusrun -w -r ldas-grid.ligo-la.caltech.edu -f rsltest2
$ cat stage.out
total 736
-r--r--r-- 1 root root 92940 Mar 20 2004 sysstat-4.0.7-4.rhl9.1.i386.rpm
drwxr-xr-x 2 root root 4096 Jan 14 15:59 clean
-rw------- 1 dietz dietz 2795 Feb 2 16:22 x509up_u4152
drwxrwxrwt 2 root root 4096 Feb 24 10:21 .ICE-unix
drwxr-xr-x 44 root root 4096 Feb 24 10:21 ..
drwxrwxrwt 2 root root 4096 Feb 24 10:22 .font-unix
-rw------- 1 lindy lindy 6412 Feb 24 16:37 x509up_p8513.fileuMu4h0.1
drwx------ 2 lindy lindy 4096 Feb 25 09:38 ssh-btXZH25540
-rw------- 1 lindy lindy 6412 Mar 1 03:47 x509up_p18079.filehvYknH.1
-rw------- 1 lindy lindy 6416 Mar 1 04:20 x509up_p19470.fileJFK0Hk.1
-rw------- 1 dietz dietz 2795 Mar 2 14:37 x509up_p27415.file7WCuuj.1
drwxr-xr-x 2 dietz dietz 4096 Mar 2 15:12 hsperfdata_dietz
-rw------- 1 kipp kipp 2783 Mar 2 17:55 x509up_u4161
-rw-r--r-- 1 root root 5258 Mar 3 13:39 grid-mapfile.LDRdataFindServer.gateway
-rw------- 1 dietz dietz 6416 Mar 3 15:03 x509up_p7639.fileK7r7JF.1
-rw------- 1 root root 6059 Mar 3 17:37 x509up_p10774.fileHMLVi5.1
-rw-r--r-- 1 root root 5028 Mar 3 17:39 grid-mapfile.LDRdataFindServer.gridmon
-rw------- 1 kipp kipp 2783 Mar 4 11:11 x509up_4161
-rw------- 1 root root 6059 Mar 4 13:41 x509up_p13924.fileKaNRaS.1
-rw------- 1 lindy lindy 6416 Mar 4 13:43 x509up_p14096.filef0AViv.1
-rw------- 1 bmoe bmoe 6026 Mar 7 11:18 x509up_p24344.fileVaCV2G.1
-rw------- 1 lindy lindy 6412 Mar 7 18:54 x509up_p12589.fileearQVP.1
-rw------- 1 sung sung 6404 Mar 8 09:40 x509up_p10274.fileLBQ0Kh.1
-rw------- 1 cadonati cadonati 6051 Mar 8 09:50 x509up_p10717.fileNJIwq4.1
-rw------- 1 mfreemon mfreemon 5301 Mar 8 09:58 x509up_p10984.filegZx93w.1
-rw------- 1 gonzalez gonzalez 6420 Mar 8 11:01 x509up_p14909.filexivibH.1
-rw------- 1 dbrown dbrown 6 Mar 8 11:21 onasysd.inspiral_S4L1.SiWrQT.pid
-rw-rw-r-- 1 dbrown dbrown 55 Mar 8 11:21 onasysd.inspiral_S4L1.SiWrQT.info
drwxrwxrwt 7 root root 4096 Mar 8 11:22 .
$
You can play around with this by changing the directory in the stage_in.txt file and by modifying the arguments line to any valid ls option.