Lab Exercise: globusrun

Purpose:

During this lab the user will become familiar with the Globus commands used to execute jobs on the grid.

  1. Setup environment
  2. globus-job-run
  3. globus-job-submit
  4. globusrun & RSL
  5. Staging with globusrun & RSL

More information can be found at: http://www-fp.globus.org/gt2.2/admin/guide-user.html

 

Setup Environment

  1. When submitting jobs in the grid environment, whether they be globus-based commands or Condor-based jobs, the server needs to be able to open TCP/IP network connections back to the client.  During this workshop, the local DNS system used by the server will not contain the correct IP addresses for our laptop hostnames.  Therefore, we need to tell the server what our client IP address is using a special environment variable called GLOBUS_HOSTNAME.
     
  2. First, determine the IP address that was assigned to your laptop by DHCP:
     
$ /sbin/ifconfig

eth0 Link encap:Ethernet HWaddr 00:0C:29:A6:61:20
inet addr:192.168.0.203 Bcast:192.168.0.255 Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST MTU:1356 Metric:1
RX packets:1769 errors:0 dropped:0 overruns:0 frame:0
TX packets:798 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:100
RX bytes:219389 (214.2 Kb) TX bytes:93082 (90.9 Kb)
Interrupt:10 Base address:0x1080

lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
UP LOOPBACK RUNNING MTU:16436 Metric:1
RX packets:123 errors:0 dropped:0 overruns:0 frame:0
TX packets:123 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:20515 (20.0 Kb) TX bytes:20515 (20.0 Kb)

$

For most laptops, look for the inet addr for the eth0 link.

  3. Export a new environment variable called GLOBUS_HOSTNAME, using that IP address for its value.
$ export GLOBUS_HOSTNAME=192.168.0.203

$ echo $GLOBUS_HOSTNAME

192.168.0.203
$
  4. Check that the export worked correctly by issuing the following command:
$ globus-hostname

192.168.0.203
$

The response should be your IP address.  If your laptop's hostname appears, please export the variable again.
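
If you would rather not copy the address by hand, the lookup can be scripted.  The following is only a sketch: first_inet_addr is a helper name made up for this example, and it assumes the older "inet addr:" layout of ifconfig output shown above (newer versions of ifconfig print the address differently).

```shell
# Print the first non-loopback IPv4 address found in ifconfig-style text on stdin.
# Assumes the older "inet addr:A.B.C.D" layout shown in this lab.
first_inet_addr() {
    sed -n 's/.*inet addr:\([0-9.][0-9.]*\).*/\1/p' | grep -v '^127\.' | head -n 1
}

# Example with two lines in that layout (sample data, not live output).
sample='inet addr:192.168.0.203 Bcast:192.168.0.255 Mask:255.255.255.0
inet addr:127.0.0.1 Mask:255.0.0.0'
printf '%s\n' "$sample" | first_inet_addr
```

On the laptop, the helper could then be fed the real output, for example with GLOBUS_HOSTNAME=$(/sbin/ifconfig eth0 | first_inet_addr) followed by export GLOBUS_HOSTNAME.  Check the result with echo before relying on it.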

 

globus-job-run

  1. The basic syntax is:

globus-job-run 'contact string' command

The contact string specifies the machine, port, and service to which the request is sent.  Its general syntax is machine:port/jobmanager-name.  All of the following are valid forms:

    hostname
    hostname:port   
    hostname:port/jobmanager-name
    hostname/jobmanager-name
    hostname:/jobmanager-name
    hostname::subject
    hostname:port:subject
    hostname/jobmanager-name:subject
    hostname:/jobmanager-name:subject
    hostname:port/jobmanager-name:subject

The default port is 2119 and the default job manager name is "jobmanager".  Run the same test command three times: first with the host name alone, then with the port number added, and then with the job manager name added as well.

Issue the following commands:

$ globus-job-run ldas-grid.ligo-la.caltech.edu /bin/date

Fri Dec 10 12:48:08 CST 2004

$ globus-job-run ldas-grid.ligo-la.caltech.edu:2119 /bin/date

Fri Dec 10 12:49:08 CST 2004

$ globus-job-run ldas-grid.ligo-la.caltech.edu:2119/jobmanager /bin/date

Fri Dec 10 12:49:55 CST 2004

In the event of errors, many common causes and solutions can be found at http://www.globus.org/about/faq/errors.html.
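
To see how the defaults fill in the shorter contact string forms, here is a small sketch.  expand_contact is a helper invented for this illustration, not part of Globus, and it handles only the forms without a ":subject" suffix.

```shell
# Expand a contact string to host:port/service using the defaults stated above.
# Handles only the forms without a ":subject" suffix (a simplification).
expand_contact() (
    contact=$1
    host=${contact%%[/:]*}          # everything before the first ":" or "/"
    rest=${contact#"$host"}
    port=2119                       # default port
    service=jobmanager              # default job manager name
    case $rest in
        :*)
            rest=${rest#:}
            given_port=${rest%%/*}
            if [ -n "$given_port" ]; then port=$given_port; fi
            rest=${rest#"$given_port"}
            ;;
    esac
    case $rest in
        /*) service=${rest#/} ;;
    esac
    printf '%s:%s/%s\n' "$host" "$port" "$service"
)

# The three forms used in the commands above expand to the same full contact.
expand_contact ldas-grid.ligo-la.caltech.edu
expand_contact ldas-grid.ligo-la.caltech.edu:2119
expand_contact ldas-grid.ligo-la.caltech.edu:2119/jobmanager
```

All three calls print the same fully qualified string, which is why the three globus-job-run commands above behave identically.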

  1. The -s option lets you make use of staging: the executable you name is taken from your local machine and transferred to the server before it is run.

Start by creating a directory in your home directory called lab5 and cd into it:

$ cd

$ mkdir lab5

$ cd lab5

This new lab5 directory should be used to contain any files we create during the remainder of this lab exercise.

  1. Create a script called myprog.sh containing the lines:
 #!/bin/sh    
 /bin/date
  1. This program can now be run thus:
$ cat myprog.sh

#!/bin/sh
/bin/date

$ globus-job-run ldas-grid.ligo-la.caltech.edu -s myprog.sh

Fri Dec 10 13:03:15 CST 2004

 

 

globus-job-submit

  1. globus-job-submit is a batch interface to the GRAM server. It will return immediately, leaving you with a contact string that you can use to query the status of your job. The basic syntax is:

globus-job-submit 'contact string' command

  1. Create a script called mysubmit.sh, with the following contents:
 #!/bin/sh

 /bin/date
 /bin/sleep 10  
 /bin/date
  1. First, run this job using globus-job-run
$ cat mysubmit.sh
#!/bin/sh

/bin/date
/bin/sleep 10  
/bin/date

$ globus-job-run ldas-grid.ligo-la.caltech.edu -s mysubmit.sh

Fri Dec 10 13:12:14 CST 2004
Fri Dec 10 13:12:24 CST 2004

Notice there is a 10 second difference in the time.

  1. Now, run the script using globus-job-submit.
$ globus-job-submit ldas-grid.ligo-la.caltech.edu -s mysubmit.sh

https://ldas-grid.ligo-la.caltech.edu:40001/11364/1109306974/

You will notice that the output is not the date and time, as it was with globus-job-run.  Instead we are given a contact string.  This string can be used with several other utilities to obtain data about the running job.  These utilities are globus-job-status, globus-job-get-output, and globus-job-clean.

  1. So, let's use globus-job-get-output to see what our script did:
$ globus-job-get-output https://ldas-grid.ligo-la.caltech.edu:40001/11364/1109306974/

Fri Dec 10 13:12:47 CST 2004
Fri Dec 10 13:12:57 CST 2004

As you should see, once again the times are 10 seconds apart.

  1. Modify the script mysubmit.sh to contain:
 #!/bin/sh

 /bin/date
 /bin/sleep 20  
 /bin/date
  1. This time when you run globus-job-submit, use globus-job-status immediately afterwards to see the status of the job, as follows:
$ cat mysubmit.sh
 #!/bin/sh

 /bin/date
 /bin/sleep 20  
 /bin/date

$ globus-job-submit ldas-grid.ligo-la.caltech.edu -s mysubmit.sh

https://ldas-grid.ligo-la.caltech.edu:40001/11899/1109307174/

$ globus-job-status https://ldas-grid.ligo-la.caltech.edu:40001/11899/1109307174/

ACTIVE

  1. You should see either ACTIVE or DONE.  If your job is ACTIVE, wait a moment and run globus-job-status again.
$ globus-job-status https://ldas-grid.ligo-la.caltech.edu:40001/11899/1109307174/

DONE
  1. This indicates the job is done.  Get the output and see what the times are:
$ globus-job-get-output https://ldas-grid.ligo-la.caltech.edu:40001/11899/1109307174/

Fri Dec 10 13:20:28 CST 2004
Fri Dec 10 13:20:48 CST 2004
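
Rather than rerunning globus-job-status by hand, the polling can be wrapped in a small loop.  This is a sketch only: wait_for_job and the STATUS_CMD override are names made up for this example, and treating PENDING as "still in progress" is an assumption, since this lab only shows the ACTIVE and DONE states.

```shell
# STATUS_CMD is a hypothetical override hook; by default the real utility is used.
STATUS_CMD=${STATUS_CMD:-globus-job-status}

# Poll the job until it leaves the ACTIVE/PENDING states, then print the final state.
# Returns success only if that final state is DONE.
wait_for_job() {
    contact=$1
    interval=${2:-5}   # seconds between polls
    while :; do
        state=$($STATUS_CMD "$contact") || return 1
        case $state in
            ACTIVE|PENDING) sleep "$interval" ;;
            *) printf '%s\n' "$state"; break ;;
        esac
    done
    [ "$state" = "DONE" ]
}
```

With the function defined, one possible use is to save the contact string in a variable with contact=$(globus-job-submit ldas-grid.ligo-la.caltech.edu -s mysubmit.sh) and then run wait_for_job "$contact" && globus-job-get-output "$contact".
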
  1. The last utility is globus-job-clean.  This program does two things.  First, it stops the job indicated by the contact string, if that job is still running.  Secondly, it removes the cached output. 

So, let's remove the cached output from the last job we ran.  If you rerun the globus-job-get-output command with the contact string from your last job, you will see that the output is displayed once again.  This output stays around until it is removed.  Depending on the output of your job, you might want to do this from time to time.  Run globus-job-clean with your contact string.  You will be given a warning about the consequences of running this utility; answer "y".

$ globus-job-clean https://ldas-grid.ligo-la.caltech.edu:40001/11899/1109307174/

WARNING: Cleaning a job means:
- Kill the job if it still running, and
- Remove the cached output on the remote resource

Are you sure you want to cleanup the job now (Y/N) ?
y

Cleanup successful.
  1. Now try to look at the output.
$ globus-job-get-output https://ldas-grid.ligo-la.caltech.edu:40001/11899/1109307174/

Invalid job id.

The cached output is gone.

  1. Modify the mysubmit.sh file once more and run it.
 #!/bin/sh

 /bin/date
 /bin/sleep 200  
 /bin/date
$ cat mysubmit.sh
 #!/bin/sh

 /bin/date
 /bin/sleep 200  
 /bin/date

$ globus-job-submit ldas-grid.ligo-la.caltech.edu -s mysubmit.sh

https://ldas-grid.ligo-la.caltech.edu:40001/13372/1109307757/

  1. Verify the job is running:
$ globus-job-status https://ldas-grid.ligo-la.caltech.edu:40001/13372/1109307757/

ACTIVE
  1. Now let's kill the job using globus-job-clean.
$ globus-job-clean https://ldas-grid.ligo-la.caltech.edu:40001/13372/1109307757/

WARNING: Cleaning a job means:
- Kill the job if it still running, and
- Remove the cached output on the remote resource

Are you sure you want to cleanup the job now (Y/N) ?
y

Cleanup successful.
  1. Verify the job is gone.
$ globus-job-status https://ldas-grid.ligo-la.caltech.edu:40001/13372/1109307757/

DONE

 

 

globusrun & RSL

  1. Use globusrun to verify that the gatekeeper jobmanager is running on the remote host. 

This is accomplished by using the -a (or -authenticate-only) command line option.  This option submits a gatekeeper "ping" request only.  It does not parse the RSL or submit the job request.

$ globusrun -r ldas-grid.ligo-la.caltech.edu -a

GRAM Authentication test successful

We now know we can connect to the remote host and that the jobmanager is running and ready to accept our jobs.

  1. Let's start with a very simple example. 

Issue the following:

$ globusrun -o -r ldas-grid.ligo-la.caltech.edu/jobmanager '&(executable=/usr/bin/cal)'

     January 2005
Su Mo Tu We Th Fr Sa
                   1
 2  3  4  5  6  7  8
 9 10 11 12 13 14 15
16 17 18 19 20 21 22
23 24 25 26 27 28 29
30 31

This will cause the UNIX utility cal to run on the remote server for us.

If the job manager cannot find the executable, you will receive the following message:
 
GRAM Job failed because the executable does not exist (error code 5)

In such a case, find out where the executable is located with the which command...

$ which cal

/usr/usr/bin/cal

...and reexecute the globusrun command with the correct location.
  1. Copy  myprog.sh  to the ldas-grid server.  The contents of this script are:
 #!/bin/sh  
 /bin/date
$ scp myprog.sh ldas-grid.ligo-la.caltech.edu:~

$

Set the file as executable on the remote server.

$ ssh ldas-grid.ligo-la.caltech.edu

$ cat myprog.sh
#!/bin/sh  
/bin/date

$ chmod +x myprog.sh

$ ls -la myprog.sh

-rwxrwxr-x 1 mfreemon mfreemon 20 Feb 24 23:30 myprog.sh

$ exit
$

  1. The advantage of using globusrun over the previous two utilities is that globusrun can use full RSL scripts.  So, let's first create a simple RSL script.  Create a file on your local machine called myrsl and copy the following into it.
(* this is a comment *)
& (executable = myprog.sh )
(directory = /data2/<userid> )  
(arguments = arg1 "arg 2")
(count = 1)

The program that is identified by the executable tag must be found in the directory defined by the directory tag.  Both of these values must refer to actual objects on the remote host.  You will need to set the directory value to a valid path on the server.

For a full discussion of RSL, refer to: http://www.globus.org/gram/rsl_spec1.html

  1. Now run the script by using the globusrun command.

$ globusrun -r ldas-grid.ligo-la.caltech.edu -f myrsl

globus_gram_client_callback_allow successful
GRAM Job submission successful
GLOBUS_GRAM_PROTOCOL_JOB_STATE_ACTIVE
GLOBUS_GRAM_PROTOCOL_JOB_STATE_DONE

  1. Let's run it again and take a look at the output.  Use the -o option for this.  The -o (or -output-enable) option uses the GASS Server library to redirect standard output and standard error to globusrun.

$ globusrun -r ldas-grid.ligo-la.caltech.edu -f myrsl -o
Fri Dec 10 14:23:50 CST 2004

For information about Globus GASS look at: http://www.globus.org/gass/



 

Staging with globusrun & RSL

  1. Another useful aspect is staging.  If the output is more than a single line it might be useful to save it to a file.  For convenience you would like the file on the local machine so that you can examine it.  We can also have the executable and any input files transferred to the server automatically rather than having to do it manually.

    In order to do this we need to create two new files.

Create a file (on your local machine) called rsl_test.sh and place the following lines into it:

 #!/bin/sh
 /bin/ls $1  
  1. Create a new RSL script.  Call this one rsltest and have it contain:
(* this is a comment *)
& (executable = $(GLOBUSRUN_GASS_URL)/home/<userid>/lab5/rsl_test.sh )  
(arguments = "-ltra" )
(stdout = $(GLOBUSRUN_GASS_URL)/home/<userid>/lab5/filelist.txt)
(count = 1)
  1. Now run the script. 

Include the -w  (or -write-allow)  option to take advantage of the GASS server ability to write to the local machine.

$ globusrun -w -r ldas-grid.ligo-la.caltech.edu -f rsltest

You will notice in the above RSL definition that the executable and stdout tags include $(GLOBUSRUN_GASS_URL).  This takes advantage of the GASS server and is a predefined value for use by globusrun, as the name implies.  With the -w option set, globusrun starts up the GASS server (i.e. an https server) within itself and prepends the definition of GLOBUSRUN_GASS_URL to the RSL.  This is done before the RSL is submitted to GRAM.  This is a client side feature of globusrun.
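
If you generate RSL files from a shell script instead of typing them, remember that $(GLOBUSRUN_GASS_URL) must reach globusrun literally; an unquoted $( ... ) would be treated by the shell as command substitution.  The sketch below shows one way to keep it literal.  make_stage_rsl is a helper name invented for this example, and the file names are the ones used in this lab.

```shell
# Print an RSL description like rsltest for scripts kept in the given local directory.
# Single quotes keep $(GLOBUSRUN_GASS_URL) literal so the shell does not expand it.
make_stage_rsl() {
    local_dir=$1
    gass='$(GLOBUSRUN_GASS_URL)'
    printf '%s\n' \
        '(* generated by make_stage_rsl *)' \
        "& (executable = ${gass}${local_dir}/rsl_test.sh )" \
        '(arguments = "-ltra" )' \
        "(stdout = ${gass}${local_dir}/filelist.txt)" \
        '(count = 1)'
}

# Show the RSL that would be produced for the current directory.
make_stage_rsl "$PWD"
```

Redirecting the output to a file (for example make_stage_rsl "$HOME/lab5" > rsltest) gives a file that can be passed to globusrun with -f, as in the command above.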

Solutions to common GRAM error messages can be found at  http://www.globus.org/about/faq/errors.html

  1. Display the contents of filelist.txt.

$ cat filelist.txt

total 98
-rw-r--r-- 1 mfreemon mfreemon 658 Feb 10 11:22 .zshrc
-rw-r--r-- 1 mfreemon mfreemon 120 Feb 10 11:22 .gtkrc
-rw-r--r-- 1 mfreemon mfreemon 383 Feb 10 11:22 .emacs
-rw-r--r-- 1 mfreemon mfreemon 124 Feb 10 11:22 .bashrc
-rw-r--r-- 1 mfreemon mfreemon 302 Feb 10 11:22 .bash_logout
drwxr-xr-x 3 mfreemon mfreemon 80 Feb 21 13:00 .globus
-rw-r--r-- 1 mfreemon mfreemon 348 Feb 24 22:34 .bash_profile
drwxr-xr-x 64 root root 1544 Mar 4 12:34 ..
drwx------ 2 mfreemon mfreemon 80 Mar 7 10:38 .ssh
-rw-r--r-- 1 mfreemon mfreemon 6604 Mar 8 09:49 .nfs001d431300000080
-rwxrwxr-x 1 mfreemon mfreemon 20 Mar 8 10:14 myprog.sh
-rw-r--r-- 1 mfreemon mfreemon 6155 Mar 8 10:54 gram_job_mgr_14994.log
-rw-r--r-- 1 mfreemon mfreemon 8994 Mar 8 10:57 gram_job_mgr_15264.log
-rw------- 1 mfreemon mfreemon 13560 Mar 8 10:59 .bash_history
-rw-r--r-- 1 mfreemon mfreemon 3629 Mar 8 11:00 gram_job_mgr_15583.log
drwx------ 6 mfreemon mfreemon 632 Mar 8 11:02 .
-rw-r--r-- 1 mfreemon mfreemon 8125 Mar 8 11:02 gram_job_mgr_15729.log

$

Recall the RSL contained in the rsltest file.  In particular, look at the line that defines the arguments.  This RSL parameter defines the command line arguments for the executable.  In this case we have defined -ltra.  This tells the ls command to print a long listing (-l) of all files, including hidden ones (-a), sorted by modification time (-t) in reverse order (-r), so that the oldest entries come first.  You can change this argument to any of the valid ls options.  Try changing it and then see what the new output looks like.
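
To see the effect of the -t and -r flags without submitting a job, you can try them locally.  This is just an illustration on a scratch directory; the file names and timestamps are made up.

```shell
# Create a scratch directory with two files whose modification times differ.
demo_dir=$(mktemp -d)
touch -t 200412101200 "$demo_dir/older.txt"
touch -t 200503081100 "$demo_dir/newer.txt"

# -t sorts by modification time (newest first); -r reverses that order.
newest_first=$(ls -t "$demo_dir")
oldest_first=$(ls -tr "$demo_dir")
printf 'ls -t:\n%s\nls -tr:\n%s\n' "$newest_first" "$oldest_first"

rm -r "$demo_dir"
```

With -t alone newer.txt is listed first; adding -r puts older.txt first, which matches the ordering seen in filelist.txt.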

  1. RSL also allows us to stage input to the executable. We will create three new files to look at this feature.

First, create a new file called rsl_test2.sh.  It should contain the following:

 #!/bin/sh
 read VAL
 /bin/ls $1 $VAL  
  1. Create a new rsl file called rsltest2:
(* this is a comment *)
& (rsl_substitution = (EXECDIR $(GLOBUSRUN_GASS_URL)/home/<userid>/lab5) )  
(executable = $(EXECDIR)/rsl_test2.sh )
(arguments = "-ltra" )
(stdout = $(EXECDIR)/stage.out)
(stdin = $(EXECDIR)/stage_in.txt)
(count = 1)

You will notice a new tag called: stdin.  This allows us to define a file to be used as the standard input for the executable on the remote machine. 

  1. Create a file called: stage_in.txt.  Enter the following as its contents:
 /tmp  

This file will tell our executable which directory to list.
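
Before submitting, it can help to see locally how the script combines its command line argument with the value read from standard input.  The sketch below recreates the two files in a scratch directory and runs the script with the input file redirected to stdin, which is roughly what the stdin tag arranges on the remote machine.  It passes -d instead of -ltra only to keep the output to a single line.

```shell
# Recreate the lab files in a scratch directory.
workdir=$(mktemp -d)
cat > "$workdir/rsl_test2.sh" <<'EOF'
#!/bin/sh
read VAL
/bin/ls $1 $VAL
EOF
printf '/tmp\n' > "$workdir/stage_in.txt"

# $1 comes from the command line; VAL comes from the redirected input file.
listing=$(sh "$workdir/rsl_test2.sh" -d < "$workdir/stage_in.txt")
echo "$listing"

rm -r "$workdir"
```

The script ends up running /bin/ls -d /tmp, so the directory name itself is printed.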

  1. Now run our RSL script.

$ globusrun -w -r ldas-grid.ligo-la.caltech.edu -f rsltest2

  1. You should see a file called stage.out.  Look at its contents. 
$ cat stage.out

total 736
-r--r--r-- 1 root root 92940 Mar 20 2004 sysstat-4.0.7-4.rhl9.1.i386.rpm
drwxr-xr-x 2 root root 4096 Jan 14 15:59 clean
-rw------- 1 dietz dietz 2795 Feb 2 16:22 x509up_u4152
drwxrwxrwt 2 root root 4096 Feb 24 10:21 .ICE-unix
drwxr-xr-x 44 root root 4096 Feb 24 10:21 ..
drwxrwxrwt 2 root root 4096 Feb 24 10:22 .font-unix
-rw------- 1 lindy lindy 6412 Feb 24 16:37 x509up_p8513.fileuMu4h0.1
drwx------ 2 lindy lindy 4096 Feb 25 09:38 ssh-btXZH25540
-rw------- 1 lindy lindy 6412 Mar 1 03:47 x509up_p18079.filehvYknH.1
-rw------- 1 lindy lindy 6416 Mar 1 04:20 x509up_p19470.fileJFK0Hk.1
-rw------- 1 dietz dietz 2795 Mar 2 14:37 x509up_p27415.file7WCuuj.1
drwxr-xr-x 2 dietz dietz 4096 Mar 2 15:12 hsperfdata_dietz
-rw------- 1 kipp kipp 2783 Mar 2 17:55 x509up_u4161
-rw-r--r-- 1 root root 5258 Mar 3 13:39 grid-mapfile.LDRdataFindServer.gateway
-rw------- 1 dietz dietz 6416 Mar 3 15:03 x509up_p7639.fileK7r7JF.1
-rw------- 1 root root 6059 Mar 3 17:37 x509up_p10774.fileHMLVi5.1
-rw-r--r-- 1 root root 5028 Mar 3 17:39 grid-mapfile.LDRdataFindServer.gridmon
-rw------- 1 kipp kipp 2783 Mar 4 11:11 x509up_4161
-rw------- 1 root root 6059 Mar 4 13:41 x509up_p13924.fileKaNRaS.1
-rw------- 1 lindy lindy 6416 Mar 4 13:43 x509up_p14096.filef0AViv.1
-rw------- 1 bmoe bmoe 6026 Mar 7 11:18 x509up_p24344.fileVaCV2G.1
-rw------- 1 lindy lindy 6412 Mar 7 18:54 x509up_p12589.fileearQVP.1
-rw------- 1 sung sung 6404 Mar 8 09:40 x509up_p10274.fileLBQ0Kh.1
-rw------- 1 cadonati cadonati 6051 Mar 8 09:50 x509up_p10717.fileNJIwq4.1
-rw------- 1 mfreemon mfreemon 5301 Mar 8 09:58 x509up_p10984.filegZx93w.1
-rw------- 1 gonzalez gonzalez 6420 Mar 8 11:01 x509up_p14909.filexivibH.1
-rw------- 1 dbrown dbrown 6 Mar 8 11:21 onasysd.inspiral_S4L1.SiWrQT.pid
-rw-rw-r-- 1 dbrown dbrown 55 Mar 8 11:21 onasysd.inspiral_S4L1.SiWrQT.info
drwxrwxrwt 7 root root 4096 Mar 8 11:22 .
$

You can play around with this by changing the directory in the stage_in.txt file and by modifying the arguments line to use any valid ls option.