Kepler-Based Workflow for Running the Community Climate System Model
Participants
Ufuk Turuncoglu/Istanbul Technical University, Sylvia Murphy/ESMF Core Team, Cecelia DeLuca/ESMF Core Team
For questions on this project, please contact curator@list.woc.noaa.gov.
Motivation
This work is driven by the complexity of running a large modeling system in a high-performance computing environment and by the need to reduce that complexity, particularly for the average user. A modeling workflow can take the numerous, repetitive steps involved in running a model and hide them from the user. This lets users focus on the science of their experiments rather than on details such as the queuing software a particular machine requires. Where a workflow can also collect provenance information, it has the added benefit of documenting a run in far greater detail than before. This makes runs easier to explore and is a step towards reproducibility.
Approach
The goal of Ufuk's project is to run the Community Climate System Model (CCSM) on the TeraGrid, gather its provenance information, and export its scientific metadata to the Earth System Grid (ESG).
Conceptual Workflow
The above figure describes the proposed CCSM workflow. It includes steps to establish a connection to a remote machine, copy CCSM to that machine, build and modify a CCSM case, run the simulation, and trigger a follow-on data-processing workflow. Each of these steps is contained within a Kepler actor or director.
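To illustrate the sequencing only, the minimal Python sketch below chains the five steps listed above and stops at the first failure. It is not Kepler code: Kepler actors are Java components composed graphically, and the `Step` and `SequentialRunner` classes and the step names here are hypothetical stand-ins, loosely analogous to actors and a sequential director. Each placeholder action only records that it ran; a real step would open a remote session, transfer files, or submit a batch job.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Shared state passed from step to step (hypothetical; Kepler actors exchange tokens over ports).
Context = Dict[str, List[str]]


@dataclass
class Step:
    """One workflow step, loosely analogous to a Kepler actor (hypothetical)."""
    name: str
    action: Callable[[Context], None]


@dataclass
class SequentialRunner:
    """Runs steps strictly in order, loosely analogous to a sequential director (hypothetical)."""
    steps: List[Step]
    log: List[str] = field(default_factory=list)

    def run(self, ctx: Context) -> bool:
        for step in self.steps:
            try:
                step.action(ctx)
            except Exception as exc:
                # Stop the chain at the first failing step; later steps never run.
                self.log.append(f"FAILED {step.name}: {exc}")
                return False
            self.log.append(f"ok {step.name}")
        return True


def placeholder(name: str) -> Callable[[Context], None]:
    """Hypothetical stand-in action that only records that the step ran."""
    def action(ctx: Context) -> None:
        ctx.setdefault("completed", []).append(name)
    return action


# Hypothetical step names mirroring the workflow stages described above.
STEP_NAMES = [
    "connect_to_remote_machine",
    "copy_ccsm_to_remote",
    "build_and_modify_case",
    "run_simulation",
    "trigger_data_processing_workflow",
]

runner = SequentialRunner([Step(n, placeholder(n)) for n in STEP_NAMES])
ctx: Context = {}
print(runner.run(ctx))  # True
print(ctx["completed"])
```

Stopping at the first failure reflects the dependency between the stages: there is no point running a simulation on a machine where the case failed to build.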
System Provenance Collection
One advantage of a modeling workflow is the ability to collect system provenance information. System provenance includes items such as the operating system, the compiler, and the environment variables set at the time of compilation. All of this information may be needed to understand a model run or to reproduce it, since machine architectures do affect the outcomes of model runs.
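As a rough illustration of gathering the machine-level items named above, the Python sketch below records operating-system details through the standard `platform` module, the first line of each compiler's `--version` output, and the values of selected environment variables. The function names, the reliance on a `--version` flag, and the example compiler and variable names (`gfortran`, `mpif90`, `NETCDF`) are assumptions for illustration; they do not describe how the Kepler workflow actually queries the system. Run at build time, such a snapshot only approximates the environment in effect during compilation.

```python
import os
import platform
import shutil
import subprocess
from typing import Dict, Iterable, Optional


def compiler_version(cmd: str) -> Optional[str]:
    """Return the first line printed by `<cmd> --version`, or None if unavailable."""
    path = shutil.which(cmd)
    if path is None:
        return None
    try:
        result = subprocess.run([path, "--version"], capture_output=True, text=True, timeout=10)
    except (OSError, subprocess.TimeoutExpired):
        return None
    # Some compilers print their version banner on stderr rather than stdout.
    text = result.stdout.strip() or result.stderr.strip()
    return text.splitlines()[0] if text else None


def collect_system_provenance(
    compilers: Iterable[str], env_names: Iterable[str]
) -> Dict[str, Dict[str, str]]:
    """Snapshot OS details, compiler versions, and selected environment variables."""
    info: Dict[str, Dict[str, str]] = {
        "os": {
            "system": platform.system(),
            "release": platform.release(),
            "machine": platform.machine(),
            "hostname": platform.node(),
        },
        "compilers": {},
        "environment": {},
    }
    for name in compilers:
        version = compiler_version(name)
        info["compilers"][name] = version if version is not None else "not found"
    for var in env_names:
        # Unset variables are recorded as an empty string.
        info["environment"][var] = os.environ.get(var, "")
    return info


# Example call; the compiler and variable names are illustrative, not taken from CCSM.
prov = collect_system_provenance(["gfortran", "mpif90"], ["PATH", "NETCDF"])
print(prov["os"]["system"])
```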
Ufuk has created a multi-tiered approach to collecting system provenance information. In the figure below, one can see that system information exists at several different layers within the application.
The workflow gathers provenance from the individual model components, from the machine, from any required third-party libraries, and from the workflow itself. All of this information is concatenated into a single stream and output as an XML file.
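The merging step can be sketched with Python's standard `xml.etree.ElementTree`: each tier (components, machine, libraries, workflow) becomes a `<tier>` element and each key/value pair an `<item>`. The element names and the flat key/value layout are hypothetical; the actual XML schema produced by the workflow is not described here. CAM and POP are CCSM's atmosphere and ocean models, while the library and workflow values are placeholders.

```python
import xml.etree.ElementTree as ET
from typing import Mapping


def provenance_to_xml(tiers: Mapping[str, Mapping[str, str]]) -> str:
    """Merge per-tier provenance dictionaries into one XML document string (hypothetical schema)."""
    root = ET.Element("provenance")
    for tier_name, entries in tiers.items():
        tier = ET.SubElement(root, "tier", name=tier_name)
        for key, value in entries.items():
            item = ET.SubElement(tier, "item", name=key)
            item.text = value
    return ET.tostring(root, encoding="unicode")


# Illustrative input; library and workflow values are placeholders, not real versions.
tiers = {
    "component": {"atmosphere": "CAM", "ocean": "POP"},
    "machine": {"os": "Linux"},
    "library": {"netcdf": "placeholder-version"},
    "workflow": {"engine": "Kepler"},
}
xml_text = provenance_to_xml(tiers)
print(xml_text)
```

To write a file rather than a string, the root element can be wrapped in `ET.ElementTree(root)` and saved with its `write(path, encoding="utf-8", xml_declaration=True)` method.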
References
- Turuncoglu, U., and Murphy, S. (2009). Technical Summary and Progress Report for a Kepler-based Modeling Workflow System.
- Turuncoglu, U., Murphy, S., and DeLuca, C. (2009). Towards Self-describing Workflows. Poster, CCSM Workshop, Breckenridge, CO, June 16-19, 2009.
