1. MOP status

   a. GDMP bug fix
      Shazhad explained the bug (related to GDMP publish) and its fix; see
      http://listserv.fnal.gov/scripts/wa.exe?A2=ind0203&L=uscms-grid-testbed&F=&S=&P=2935
      So far San Diego, Wisconsin, Florida and Fermilab have fixed the bug. Suresh will apply the fix at Caltech early next week.

   b. Master sites: Fermilab/Wisconsin
      Greg announced the successful completion of the CMKIN stage for the MOP production (it occurred on Wednesday). 400 CMKIN jobs were submitted from Wisconsin to Florida, where the .ntpl files were generated and then returned to Wisconsin.
      Unfortunately, Peter is now on (a well deserved!) vacation, and Greg and Alain have had problems duplicating the earlier success. In particular, globus-url-copy works very inconsistently due to intermittent authentication problems. After the telecon, Greg found that retrying the stage-in DAG node several times works around the problem.

   c. Slave sites
      Florida: Up and running, no known problems.
      San Diego: Up and running, no known problems.
      Caltech: Suresh happily announced that the Caltech cluster (1 server + 3 nodes) is now up and running. There is a problem with the Condor GRAM job manager, which Suresh and Rick will debug on Monday.

   d. Koen's Virtual Data Grid Prototype schedule (MOP is a piece of the proposed prototype)
      Greg explained how some elements of Koen's diagram for a prototype Virtual Data Grid, which are currently absent, could be used to better interface Impala and MOP. In particular, MOP uses physical file names whereas Impala can now use logical file names. Planner number 2, which creates a concrete DAG from an abstract DAG, would now be very useful. Jennifer indicated that Anne Chervenak at ISI is a good contact for job planning. Greg and Rick will contact her next week.
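As a rough illustration of what a planner like "Planner number 2" would do, the sketch below turns an abstract DAG (jobs that refer to logical file names) into a concrete DAG (the same jobs with physical file names). It is only a minimal sketch: the job names, file names, URLs, and the dictionary-based "replica catalog" are all hypothetical, and it does not represent Koen's design or the interface of any existing planner.

```python
from typing import Dict, List

Job = Dict[str, List[str]]


def make_concrete(abstract: Dict[str, Job],
                  catalog: Dict[str, str],
                  scratch_url: str) -> Dict[str, Job]:
    """Return a copy of the DAG with logical file names replaced by physical ones."""

    def resolve(lfn: str) -> str:
        # Files with a known replica come from the (hypothetical) catalog.
        # Anything else is assumed to be produced by this workflow and is
        # placed under the scratch URL.
        return catalog.get(lfn, f"{scratch_url}/{lfn}")

    concrete: Dict[str, Job] = {}
    for name, job in abstract.items():
        concrete[name] = {
            "parents": list(job["parents"]),  # dependencies are unchanged
            "inputs": [resolve(f) for f in job["inputs"]],
            "outputs": [resolve(f) for f in job["outputs"]],
        }
    return concrete


# Hypothetical two-step workflow described only by logical file names.
abstract_dag = {
    "generate": {"parents": [], "inputs": ["cards.txt"], "outputs": ["events.ntpl"]},
    "simulate": {"parents": ["generate"], "inputs": ["events.ntpl"], "outputs": ["hits.fz"]},
}

# Hypothetical replica catalog: logical file name -> physical location.
replica_catalog = {"cards.txt": "gsiftp://master.example.org/data/cards.txt"}

concrete_dag = make_concrete(abstract_dag, replica_catalog,
                             "gsiftp://worker.example.org/scratch")
```

The dependency structure is carried over untouched; only the file references change, which is the piece MOP currently lacks when it is handed logical file names by Impala.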
      As soon as the current MOP production is finished, we should document all of the problems, fixes, and wishes that we had, and then use MOP (and this experience) as a base for developing Koen's Virtual Data Grid System Prototype. Rick and Greg will also create a CVS repository for developing the Prototype.

2. Monitoring / site information

   a. Remote access
      The ability to log in remotely and talk with the root administrator over the phone is very important for debugging. This has been very useful for MOP.

   b. What type of monitoring?
      Performance monitoring: CPU, network traffic, disk space, etc.
      Configuration monitoring: Globus/Condor/GDMP configuration, shared file systems, etc.
      Application monitoring: process watching, log file querying, etc.

      Lothar indicated that it is best to leave the general problem of monitoring to others (e.g. the PPDG monitoring group) and that we should concentrate on implementing a (set of?) monitoring tools which suit our immediate needs. One immediate need is the ability to know the configuration details (VDT, shared file system, etc.) of each site.
      Jenny indicated that monitoring can be thought of as three parts: fabric monitoring, grid-wide monitoring (the collection of various fabric monitoring information), and the visualisation of monitoring information. Jenny pointed out that Fermilab's NGOP and Iosif's monitoring tools are good examples of current fabric monitoring services, and that MDS is a good example of a grid-wide monitoring tool. MDS also provides a web-based visualisation service.
      Harvey kindly pointed out that Iosif's system is prototyped for local farms but is also architected to include higher-level, grid-level monitoring. Rick proposed deploying and testing both monitoring services on the Test Grid.

   c. Iosif's monitoring package (based on Jini, SNMP, etc.)
      While currently prototyped for monitoring farms, this system is architected for grid-wide monitoring. Yujun Wu is currently investigating the deployment of Iosif's system on the USCMS Test Grid.
      Interfacing it to MDS for grid-wide monitoring is an appealing possibility for the future.

   d. Globus MDS 2.1
      John Mcghee (MDS developer) indicated his willingness to help set up MDS for the USCMS Test Grid. He says that MDS can be configured (by writing simple "sensors") to do *anything* and, in particular, to monitor configuration information at each site. This is very valuable, and it was fully agreed to take John up on his offer of help! Suchindra Kategari from Florida is investigating the deployment of MDS on the Test Grid.

   e. Simple home-grown monitoring (cron jobs, etc.)
      See for example:
         http://www.cacr.caltech.edu/projects/tier2-support/
         http://www.phys.ufl.edu/~cavanaug/uscms/testulix/index.htm
      This solution is fine for the short term, but should be discouraged for the medium to long term, as there are security concerns and the information is not available in a standard format (accessible by grid services, for example).

3. Schedule for migrating from VDT 0.5 to VDT 1.0

   a. Do nothing until the current MOP production is finished.

   b. It was agreed that migrating one site per week was a reasonable schedule. Florida would like to go first, as its cluster will be taken down shortly after the MOP production has finished for some much needed maintenance (the kernel is old, etc.). Florida can then pass any "heads up" information to the other sites.

4. The ESnet Certificate Authority is already in place.
   We need to begin thinking of a plan to migrate from the Globus CA to the ESnet CA. Can we do this in conjunction with the migration to VDT 1.0?
   Conrad Steenberg has been investigating some Italian scripts for managing VOs. He would like to have another site on which to generalise his scripts. Rick offered one of the Florida machines to help in this regard, and Conrad and Rick will work together on this next week.
   Jenny indicated that the Globus gatekeeper will accept certificates from multiple CAs.
   Hence, we can proceed with obtaining and using certificates and keys from ESnet for both hosts and users. We will use Conrad's scripts for managing a VO as soon as they become available. Lothar also pointed out the importance of monitoring the expiration of host and user certificates.

5. Community Authorization Service (CAS) enabled GridFTP
   Globus has asked us if we would like to become a "Friends of the family" alpha tester for a CAS-enabled GridFTP. This is a good opportunity for us to test CAS on the USCMS Test Grid, but it will require some effort on our part. Von Welch (CAS developer) has graciously offered his help in porting CAS to RH 6.2 and installing it on the USCMS Test Grid. In return, we should be willing to learn about CAS and provide feedback to Globus.
   The current difficulty is that CAS requires Python 2.2 on RH 7.2, and installing Python 2.2 on RH 6.2 is problematic. Python 2.1 can be used, but CAS requires the xmlrpc module, which is missing from Python 2.1. However (!), Conrad said that he currently has Python 2.2 installed on RH 6.2; Rick will discuss with Conrad what can be done to install Python 2.2 at the different USCMS (RH 6.2) sites and communicate that back to Von.

6. US-ATLAS/CMS inter-grid tests
   A 'uscms' group account has been created on atlas.uits.iupui.edu at the US-ATLAS Indiana Tier 2 site; Florida has created a 'usatlas' group account on testulix; grid-mapfile contact strings have been exchanged and the first US-ATLAS/CMS inter-grid test has been performed. Florida is now included in the US-ATLAS GridView monitor service.
   San Diego and Fermilab agreed to create a 'usatlas' group account. Caltech will investigate with CACR whether they can also do so. Rick will send out an email with some of the US-ATLAS grid-mapfile contact strings.
   All US-CMS users wishing to perform US-ATLAS/CMS inter-grid tests should send their grid-mapfile contact strings to the US-ATLAS contacts: Kaushik De and Shava Smallen.
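
For sites setting up the group accounts above, the sketch below shows how grid-mapfile entries map a certificate subject to a local account. It assumes that the "contact strings" being exchanged are certificate subject names, and that each entry has the usual form of a quoted subject followed by a single local account name. The subjects shown are made up, and the helper functions are hypothetical rather than part of Globus.

```python
import shlex
from typing import Dict


def format_entry(subject: str, account: str) -> str:
    """Build one grid-mapfile line: quoted certificate subject, then local account."""
    return f'"{subject}" {account}'


def parse_grid_mapfile(text: str) -> Dict[str, str]:
    """Map each certificate subject to its local account, skipping blanks and comments."""
    mapping: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # shlex.split honours the double quotes around the subject,
        # so spaces inside a name such as "CN=Jane Doe" are preserved.
        subject, account = shlex.split(line)
        mapping[subject] = account
    return mapping


# Hypothetical subjects; real ones come from each user's certificate.
sample = "\n".join([
    "# entries for the inter-grid test (made-up subjects)",
    format_entry("/O=Grid/O=Example/OU=cms.example.org/CN=Jane Doe", "uscms"),
    format_entry("/O=Grid/O=Example/OU=atlas.example.org/CN=John Roe", "usatlas"),
])

accounts = parse_grid_mapfile(sample)
```

Mapping many users onto one group account such as 'usatlas' keeps the number of local accounts small, at the cost of not distinguishing individual users on the local system.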