Lab Exercise: LSCdataFind

Purpose:

The LSCdataFind command queries an LDRdataFindServer to obtain physical filenames or URLs for data files from a given instrument and of a particular frame type within a GPS time range.  By default, the output is a list of URLs suitable for use with a GridFTP client for downloading the files.  Other options generate a so-called cache file suitable for use with LAL I/O routines.

During this lab the user will become familiar with the LSCdataFind tool:

  1. Checking the LSCdataFind Server
  2. Showing Observatories and Types
  3. Searching for File Names

Additional documentation can be found at http://www.lsc-group.phys.uwm.edu/lscdatagrid/doc/quicktools.html#lscdatafind

 

Checking the LSCdataFind Server

  1. First make sure you can connect and that everything is up and running.
$ LSCdataFind --server=ldas-gridmon.ligo-la.caltech.edu --ping

LDRdataFindServer at ldas-gridmon.ligo-la.caltech.edu is alive

The --ping option checks that the data server you specify with the --server option is alive.

Servers at other sites are listed at http://www.lsc-group.phys.uwm.edu/lscdatagrid/resources/index.html

  1. The environment variable LSC_DATAFIND_SERVER can be set to avoid the need to specify the --server argument on every command line:
$ export LSC_DATAFIND_SERVER=ldas-gridmon.ligo-la.caltech.edu

$ LSCdataFind --ping

LDRdataFindServer at ldas-gridmon.ligo-la.caltech.edu is alive

If you work with a single LSCdataFind server most of the time, consider adding the export command to ~/.bash_profile.
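
If a script should work whether or not the variable is already set, a default can be supplied with shell parameter expansion.  The following is only a minimal sketch: the function name datafind_server is hypothetical, and the fallback host is simply the one used in this lab.

```shell
# Hypothetical helper: print the server name to hand to LSCdataFind.
# Uses the caller's LSC_DATAFIND_SERVER if it is set and non-empty;
# otherwise falls back to the host used throughout this lab.
datafind_server() {
    echo "${LSC_DATAFIND_SERVER:-ldas-gridmon.ligo-la.caltech.edu}"
}

# Show which server would be used in the current environment.
datafind_server

# Example call (commented out because it needs a reachable server):
# LSCdataFind --server="$(datafind_server)" --ping
```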

 

Showing Observatories and Types

  1. We can list the observatories that have data on the chosen server with the --show-observatories option.

$ LSCdataFind --server=ldas-gridmon.ligo-la.caltech.edu --show-observatories

H

L
HL
HLT
None
G
GHT
AGHLT
V
$

  1. We can also use LSCdataFind to list the types of data that are available.

$ LSCdataFind --server=ldas-gridmon.ligo-la.caltech.edu --show-types

RDS_R_L3
RDS_R_L1

SenseMonitor_H1_M
SenseMonitor_H2_M
SenseMonitor_L1_M
RDS_R_L2
SG5
GA1
BH1
BH2
SG7
BH3
BH4
SG10
BH5
BO1
ZM1
SG12
GA2
R
BH6
BO2
None
SG13
SG1_S3_P
SG2_S3_P
SG3_S3_P
GA1_S3_P
SN1_S3_P
SG4_S3_P
SG5_S3_P
WNB1_S3_P
WNB2_S3_P
WNB3_S3_P
SN2
SN2_S3_P
WNB1A_S3_P
WNB2A_S3_P
WNB3A_S3_P
G1_RDS_C01_LX
WHISTLE_S_S3_P
WHISTLE_L_S3_P
CUSPS_S3_P
INSP_S3_P
LG_SG2
LA_SG1
SIM
SG820Q5
SG820Q15
SG235Q5
SG235Q15
GA1d0
GA4d0
DFMa2b4g1
DFMa1b2g1
LA_DS1
$


 

Searching for File Names

  1. Let's find the list of file names containing the data we need based upon some selection criteria.  We will specify a server, an observatory and a type.  Along with this we will give a start and end time to limit the range of the data that will be looked at. 

Warning:  Do not request time intervals greater than 10,000 seconds.  Requests for time intervals larger than this will cause the database to crash.
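
When a longer stretch of data is needed, one way to respect this limit is to break the interval into pieces of at most 10,000 seconds and query each piece separately.  The sketch below only does the arithmetic; the function name split_gps_range is hypothetical and nothing in it contacts a server.

```shell
# Hypothetical helper: split the GPS interval from $1 to $2 into consecutive
# pieces no longer than $3 seconds (default 10000, the limit quoted above).
# Prints one "start end" pair per line.
split_gps_range() {
    local start="$1"
    local end="$2"
    local max_len="${3:-10000}"
    local chunk_end
    while [ "$start" -lt "$end" ]; do
        chunk_end=$((start + max_len))
        if [ "$chunk_end" -gt "$end" ]; then
            chunk_end="$end"
        fi
        echo "$start $chunk_end"
        start="$chunk_end"
    done
}

# A 25,000 second interval becomes three pieces.
split_gps_range 753759081 753784081
```

Each printed pair can then be passed as --gps-start-time and --gps-end-time to a separate LSCdataFind call.  A file that straddles a piece boundary may be returned by both neighbouring queries, so passing the combined output through sort -u is a reasonable precaution.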

Let's start with the following command:

$ LSCdataFind --server=ldas-gridmon.ligo-la.caltech.edu --observatory H --type RDS_R_L3 --gps-start-time 753759081 --gps-end-time 753759130

gsiftp://dataserver.phys.uwm.edu:15000/data/gsiftp_root/cluster_storage/datc/s109/S3/RDS_R_L3/H/753753600-753773599/H-RDS_R_L3-753759072-16.gwf
file://localhost/netdatc/s109/S3/RDS_R_L3/H/753753600-753773599/H-RDS_R_L3-753759072-16.gwf
file://medusa-slave109.medusa.phys.uwm.edu/datc/S3/RDS_R_L3/H/753753600-753773599/H-RDS_R_L3-753759072-16.gwf
gsiftp://dataserver.phys.uwm.edu:15000/data/gsiftp_root/cluster_storage/datc/s109/S3/RDS_R_L3/H/753753600-753773599/H-RDS_R_L3-753759088-16.gwf
file://localhost/netdatc/s109/S3/RDS_R_L3/H/753753600-753773599/H-RDS_R_L3-753759088-16.gwf
file://medusa-slave109.medusa.phys.uwm.edu/datc/S3/RDS_R_L3/H/753753600-753773599/H-RDS_R_L3-753759088-16.gwf
gsiftp://dataserver.phys.uwm.edu:15000/data/gsiftp_root/cluster_storage/datc/s109/S3/RDS_R_L3/H/753753600-753773599/H-RDS_R_L3-753759104-16.gwf
file://localhost/netdatc/s109/S3/RDS_R_L3/H/753753600-753773599/H-RDS_R_L3-753759104-16.gwf
file://medusa-slave109.medusa.phys.uwm.edu/datc/S3/RDS_R_L3/H/753753600-753773599/H-RDS_R_L3-753759104-16.gwf
gsiftp://dataserver.phys.uwm.edu:15000/data/gsiftp_root/cluster_storage/datc/s109/S3/RDS_R_L3/H/753753600-753773599/H-RDS_R_L3-753759120-16.gwf
file://localhost/netdatc/s109/S3/RDS_R_L3/H/753753600-753773599/H-RDS_R_L3-753759120-16.gwf
file://medusa-slave109.medusa.phys.uwm.edu/datc/S3/RDS_R_L3/H/753753600-753773599/H-RDS_R_L3-753759120-16.gwf

If you look at the values returned, you will notice that they have different URL types (gsiftp and file):

  1. These can be used with the --url-type option.  This option returns only files that match the specified URL type.  Try setting the type to "file"; you should now see a pruned list of values:
$ LSCdataFind --server=ldas-gridmon.ligo-la.caltech.edu --observatory H --type RDS_R_L3 --gps-start-time 753759081 --gps-end-time 753759130 --url-type file

file://localhost/data/node25/S3/L3/LHO/H-RDS_R_L3-7537/H-RDS_R_L3-753759072-16.gwf
file://localhost/data/node25/S3/L3/LHO/H-RDS_R_L3-7537/H-RDS_R_L3-753759088-16.gwf
file://localhost/data/node25/S3/L3/LHO/H-RDS_R_L3-7537/H-RDS_R_L3-753759104-16.gwf
file://localhost/data/node25/S3/L3/LHO/H-RDS_R_L3-7537/H-RDS_R_L3-753759120-16.gwf
  1. We can further restrict our query by using the --match option.  This option allows us to specify a regular expression to be matched against.  Try setting --match to localhost.  The result should be a listing of only those values that are from the localhost (which happens to return the same results in this specific case).
$ LSCdataFind --server=ldas-gridmon.ligo-la.caltech.edu --observatory H --type RDS_R_L3 --gps-start-time 753759081 --gps-end-time 753759130 --url-type file --match localhost

file://localhost/data/node25/S3/L3/LHO/H-RDS_R_L3-7537/H-RDS_R_L3-753759072-16.gwf
file://localhost/data/node25/S3/L3/LHO/H-RDS_R_L3-7537/H-RDS_R_L3-753759088-16.gwf
file://localhost/data/node25/S3/L3/LHO/H-RDS_R_L3-7537/H-RDS_R_L3-753759104-16.gwf
file://localhost/data/node25/S3/L3/LHO/H-RDS_R_L3-7537/H-RDS_R_L3-753759120-16.gwf
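
To get a feel for how the URL type and the regular expression narrow a listing, the same two-stage selection can be imitated locally with grep on saved output.  This is only an illustration: filter_urls is a hypothetical helper, and grep -E is an assumption made for the example, since the regular expression dialect that --match uses is not stated here.

```shell
# Hypothetical helper: read URLs on standard input and keep only those whose
# scheme is $1 (for example file or gsiftp) and that match the extended
# regular expression $2.
filter_urls() {
    grep -E "^$1://" | grep -E "$2"
}

# Three URLs taken from the first listing in this section.
printf '%s\n' \
    'gsiftp://dataserver.phys.uwm.edu:15000/data/gsiftp_root/cluster_storage/datc/s109/S3/RDS_R_L3/H/753753600-753773599/H-RDS_R_L3-753759072-16.gwf' \
    'file://localhost/netdatc/s109/S3/RDS_R_L3/H/753753600-753773599/H-RDS_R_L3-753759072-16.gwf' \
    'file://medusa-slave109.medusa.phys.uwm.edu/datc/S3/RDS_R_L3/H/753753600-753773599/H-RDS_R_L3-753759072-16.gwf' |
    filter_urls file localhost
```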
  1. In all of the above examples the output values of LSCdataFind were formatted for use with GridFTP.  The output values can also be formatted to work with LAL cache files.  Use the option --lal-cache to do this.
$ LSCdataFind --server=ldas-gridmon.ligo-la.caltech.edu --observatory H --type RDS_R_L3 --gps-start-time 753759081 --gps-end-time 753759130 --lal-cache --match localhost

H RDS_R_L3 753759072 16 file://localhost/data/node25/S3/L3/LHO/H-RDS_R_L3-7537/H-RDS_R_L3-753759072-16.gwf
H RDS_R_L3 753759088 16 file://localhost/data/node25/S3/L3/LHO/H-RDS_R_L3-7537/H-RDS_R_L3-753759088-16.gwf
H RDS_R_L3 753759104 16 file://localhost/data/node25/S3/L3/LHO/H-RDS_R_L3-7537/H-RDS_R_L3-753759104-16.gwf
H RDS_R_L3 753759120 16 file://localhost/data/node25/S3/L3/LHO/H-RDS_R_L3-7537/H-RDS_R_L3-753759120-16.gwf
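
Comparing this output with the plain URL listing suggests that each cache line is the observatory, type, GPS start time and duration taken from the file name, followed by the URL.  The sketch below rebuilds such a line from a URL under that assumption.  The function name url_to_cache_line is hypothetical, and the layout is inferred from the listing above rather than from a LAL specification.

```shell
# Hypothetical helper: turn a frame file URL into a line shaped like the
# --lal-cache output above. Assumes the file name has the form
# <observatory>-<type>-<gps_start>-<duration>.gwf.
url_to_cache_line() {
    local url="$1"
    local name="${url##*/}"          # file name without directories
    name="${name%.gwf}"              # drop the extension
    local duration="${name##*-}"     # last field
    local rest="${name%-*}"
    local gps_start="${rest##*-}"    # third field
    rest="${rest%-*}"
    local observatory="${rest%%-*}"  # first field
    local type="${rest#*-}"          # second field
    echo "$observatory $type $gps_start $duration $url"
}

url_to_cache_line file://localhost/data/node25/S3/L3/LHO/H-RDS_R_L3-7537/H-RDS_R_L3-753759072-16.gwf
```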
  1. To find out the names of the files located on a server, just use the "--names-only" flag.  This will give you just the names of the data files that match your query.  These values cannot be used with GridFTP or LAL caches.
$ LSCdataFind --server=ldas-gridmon.ligo-la.caltech.edu --observatory H --type RDS_R_L3 --gps-start-time 753759081 --gps-end-time 753759130 --names-only

H-RDS_R_L3-753759072-16.gwf
H-RDS_R_L3-753759088-16.gwf
H-RDS_R_L3-753759104-16.gwf
H-RDS_R_L3-753759120-16.gwf
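
The names also explain why a query starting at 753759081 returns a file whose name contains 753759072: assuming the last two fields of each name are the GPS start time and the duration in seconds, that file covers 753759072 to 753759088 and so overlaps the requested interval.  The check can be sketched as follows.  The function name frame_overlaps is hypothetical, the treatment of the interval end points is an assumption, and the first and last names in the loop are made up purely to show the non-overlapping case.

```shell
# Hypothetical helper: succeed if the frame file named $1 overlaps the GPS
# interval from $2 to $3. Assumes the name has the form
# <observatory>-<type>-<gps_start>-<duration>.gwf.
frame_overlaps() {
    local name="${1##*/}"
    name="${name%.gwf}"
    local duration="${name##*-}"
    local rest="${name%-*}"
    local file_start="${rest##*-}"
    local file_end=$((file_start + duration))
    [ "$file_start" -lt "$3" ] && [ "$file_end" -gt "$2" ]
}

# The two middle names come from the listing above; the first and last are
# invented neighbours that fall outside the requested interval.
for f in H-RDS_R_L3-753759056-16.gwf H-RDS_R_L3-753759072-16.gwf \
         H-RDS_R_L3-753759120-16.gwf H-RDS_R_L3-753759136-16.gwf; do
    if frame_overlaps "$f" 753759081 753759130; then
        echo "$f overlaps"
    else
        echo "$f does not overlap"
    fi
done
```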