JGC-DataCloud-2012 paper experiments


This experiment runs workflows from the science domain of astronomy, but with the intent of examining their computer science aspects: parallel computing, data migration, and workflows. Comparable experiments will be run with the same domain data on Amazon and TeraGrid/OSG, and the paper will discuss the observed differences and commonalities. This is an extension of the http://archive.futuregrid.org/references/experiences-using-cloud-computi... paper.

Intellectual Merit

The purpose is to gather experimental data on workflows, parallel computing, and data migration for a paper to be submitted to a journal special issue on the topic of Data Intensive Computing in the Clouds.

Broader Impact

Possible configuration issues on both the submit machine and the resource VMs will influence how the Pegasus tutorial is set up and how it plays out. They may also influence decisions on Experiment Management setup and configuration.

Use of FutureGrid

Each run is expected to require 128 cores for about 1 day. Multiple runs may be required until all wrinkles are ironed out. The runs will also determine whether to draw all resources from a single site or to distribute the resource allocations across multiple sites.

While I would like to also utilize Eucalyptus to gather resources, Eucalyptus on FG has not worked properly since January 2011, and I will consider it non-functional until I am thoroughly convinced that it works. Even Inca tests frequently show that it does not work properly (see Inca "traffic light" monitoring summary).

Due to the lack of a proper and simple way (that is NOT ssh) to access bare metal, I cannot use bare metal, even though I crave those resources.

Scale Of Use

(128 cores x 1 day) for multiple runs, from one or more sites depending on configuration exploration.



(will be posted as paper submission.)
Mats Rynge

