Past Research Projects
dV/dt: Accelerating the Rate of Progress towards Extreme Scale Collaborative Science
The dV/dt project will develop and evaluate, through at-scale experimentation, novel algorithms and software architectures that make it less labor-intensive for a scientist to find appropriate computing resources, acquire those resources, deploy the desired applications and data on them, and then manage them as the applications run. The proposed research will advance the understanding of resource management within a collaboration in the areas of trust, planning for resource provisioning, and workload, computer, data, and network resource management. This work will result in research artifacts (frameworks, algorithms, simulators, and execution traces) as well as an experimental testbed that will support the proposed research and will be made available to the broader DOE community.
Funding Agency: DOE
Precip – Pegasus Repeatable Experiments for the Cloud in Python
Precip is a flexible experiment management API for running experiments on clouds. Precip was developed for use on FutureGrid infrastructures such as OpenStack, Eucalyptus (>=3.2), and Nimbus, as well as on commercial clouds such as Amazon EC2. The API allows you to easily provision resources, on which you can then run commands and copy files to/from, addressing subsets of instances identified by tags. The goal of the API is to be flexible and simple to use in Python scripts to control your experiments.
Funding Agency: NSF
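The tag-based model described above can be sketched in a few lines of Python. This is a hypothetical illustration of that style of experiment API, with invented class and method names; it is not Precip's actual interface.

```python
# Hypothetical sketch of a tag-based experiment API in the style
# described above -- NOT Precip's real class or method names.

class Experiment:
    def __init__(self):
        self.instances = []          # list of (instance_id, set_of_tags)

    def provision(self, count, tags):
        """Pretend to provision `count` instances carrying `tags`."""
        start = len(self.instances)
        for i in range(start, start + count):
            self.instances.append(("instance-%d" % i, set(tags)))

    def select(self, tag):
        """Return the ids of all instances carrying `tag`."""
        return [iid for iid, tags in self.instances if tag in tags]

    def run(self, tag, command):
        """Pretend to run `command` on every instance matching `tag`."""
        return {iid: "ran: " + command for iid in self.select(tag)}

exp = Experiment()
exp.provision(2, tags=["worker"])
exp.provision(1, tags=["master", "worker"])
print(exp.select("master"))            # only the instance tagged "master"
print(exp.run("worker", "hostname"))   # command fans out to all workers
```

The point of the design is that experiment scripts never hold on to individual machine handles; every provisioning, command, and file-transfer call is addressed to a tag, so the same script scales from one instance to many.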
WorkflowSim: A Toolkit for Simulating Scientific Workflows in Distributed Environments
WorkflowSim is an open-source workflow simulator that extends CloudSim with workflow-level simulation support. It models workflows as DAGs and supports an elaborate model of node failures, a model of the delays occurring at the various levels of the WMS stack, and implementations of several of the most popular dynamic and static workflow schedulers (e.g., HEFT, Min-Min) and task clustering algorithms (e.g., runtime-based, data-oriented, and fault-tolerant clustering algorithms). Parameters are learned directly from traces of real executions. It has recently been used in multiple workflow study areas, such as fault-tolerant clustering, balanced task clustering, cloud brokering, energy-aware scheduling, and cost-oriented scheduling.
Funding Agency: NSF
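The Min-Min heuristic mentioned above can be shown with a small self-contained example. The sketch below is a plain-Python illustration of the classic algorithm for independent tasks, not WorkflowSim's Java implementation: repeatedly find, over all unscheduled tasks, the one whose minimum completion time across hosts is smallest, and assign it to that host.

```python
# Minimal sketch of the Min-Min scheduling heuristic for independent
# tasks -- an illustration of the algorithm, not WorkflowSim's Java code.

def min_min(task_runtimes, num_hosts):
    """task_runtimes: {task: [runtime on host 0, runtime on host 1, ...]}.
    Returns {task: assigned host index}."""
    ready = dict(task_runtimes)
    host_free = [0.0] * num_hosts      # time at which each host becomes idle
    schedule = {}
    while ready:
        best = None                    # (completion_time, task, host)
        for task, runtimes in ready.items():
            for host in range(num_hosts):
                ct = host_free[host] + runtimes[host]
                if best is None or ct < best[0]:
                    best = (ct, task, host)
        ct, task, host = best          # task with the smallest min completion time
        schedule[task] = host
        host_free[host] = ct
        del ready[task]
    return schedule

tasks = {"t1": [3.0, 5.0], "t2": [1.0, 2.0], "t3": [4.0, 1.0]}
print(min_min(tasks, 2))   # each task goes to the host where it finishes earliest
```

With the runtimes shown, t2 is scheduled first on host 0 and t3 on host 1, after which t1 lands on host 0; a workflow simulator applies the same idea to each set of ready DAG tasks in turn.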
Transforming Computational Science with ADAMANT (Adaptive Data-Aware Multi-domain Application Network Topologies)
Project ADAMANT (Adaptive Data-Aware Multi-domain Application Network Topologies) brings together researchers from RENCI/UNC Chapel Hill, Duke University, and USC/ISI, along with two successful software tools: the Pegasus workflow management system and the ORCA resource control framework, developed for NSF GENI. The integration of Pegasus and ORCA enables powerful application- and data-driven virtual topology embedding into multiple institutional and national substrates (providers of cyber-resources, such as computation, storage, and networks). ADAMANT leverages ExoGENI, an NSF-funded GENI testbed, as well as national providers of on-demand bandwidth services (NLR, I2, ESnet) and existing OSG computational resources to create elastic, isolated environments for executing complex distributed tasks. This approach improves the performance of these applications and, by explicitly including data movement planning in the application workflow, enables new, unique capabilities for distributed data-driven “Big Science” applications.
Funding Agency: NSF
FutureGrid
FutureGrid is a distributed, high-performance test-bed that allows scientists to collaboratively develop and test innovative approaches to parallel, grid, and cloud computing. The test-bed is composed of a set of distributed high-performance computing resources connected by a high-speed network (with adjustable performance via a network impairment device). Users can access the HPC resources as traditional batch clusters, a computational grid, or as highly configurable cloud resources where users can deploy their own virtual machines. The flexibility in configuration of FutureGrid resources enables its use across a variety of research and education projects. To learn more about how to join FutureGrid, visit the “Getting Started” page of the FutureGrid Manual. Pegasus has two parallel roles to play in the framework of FutureGrid: (1) vanilla Pegasus is deployed in FutureGrid to draw in existing user communities by providing a familiar context on new resources; (2) the Pegasus workflow management system is an essential building block of the Experiment Management capabilities developed within the FutureGrid context.
Funding Agency: NSF
Synthesized Tools for Archiving, Monitoring Performance and Enhanced DEbugging (STAMPEDE)
Large-scale applications today make use of distributed resources to support their computations and, as part of their execution, generate large amounts of log information. Until now, we have been using the NetLogger analysis tools to perform off-line log analysis. STAMPEDE extends this offline workflow log analysis capability and develops a comprehensive middleware solution that allows users of complex scientific applications to track the status of their jobs in real time, to detect execution anomalies automatically, and to perform on-line troubleshooting without logging in to remote nodes or searching through thousands of log files. The system will be able to capture application-level logs from jobs as they execute on the cyberinfrastructure. At the same time, it will also collect log information from the underlying cyberinfrastructure services, such as resource management and data transfer. These end-to-end logs will be combined and brokered through a subscription interface, which external components will use to provide monitoring services.
Funding Agency: NSF
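The log brokering described above follows a publish/subscribe pattern: producers emit log records on topics, and monitoring components subscribe to the topics they care about. The sketch below is a minimal in-process illustration of that pattern, with hypothetical names; it is not STAMPEDE's actual subscription interface.

```python
# Minimal publish/subscribe sketch of the log-brokering idea described
# above -- hypothetical names, NOT STAMPEDE's actual interface.
from collections import defaultdict

class LogBroker:
    def __init__(self):
        self.subscribers = defaultdict(list)   # topic -> list of callbacks

    def subscribe(self, topic, callback):
        """Register a consumer for all records published on `topic`."""
        self.subscribers[topic].append(callback)

    def publish(self, topic, record):
        """Deliver one log record to every subscriber of `topic`."""
        for callback in self.subscribers[topic]:
            callback(record)

broker = LogBroker()
anomalies = []
# A monitoring component subscribes only to job-failure events.
broker.subscribe("job.failed", anomalies.append)
# Application- and infrastructure-level producers publish their records.
broker.publish("job.failed", {"job": "merge_01", "host": "node-17"})
broker.publish("job.done", {"job": "align_02"})
print(anomalies)   # only the failure record reached the anomaly detector
```

The decoupling is the point: producers at the application and infrastructure levels never know which monitors exist, so new anomaly detectors or dashboards can be attached without touching the jobs being observed.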
BrainSpan
The BrainSpan project seeks to find when and where in the brain a gene is expressed. This information holds clues to potential causes of disease. A recent study found that forms of a gene associated with schizophrenia are over-expressed in the fetal brain. To make such discoveries about what is abnormal, scientists first need to know what the normal patterns of gene expression are during development. To this end, the National Institute of Mental Health (NIMH), part of the National Institutes of Health (NIH), has funded the creation of TADHB. To map human brain “transcriptomes”, researchers identify the composition of intermediate products, called transcripts or messenger RNAs, which translate genes into proteins throughout development. As part of this project we have enabled geneticists to analyze over 225 human brain RNA sequences using two different mapping algorithms, CASAVA ELAND and PerM.
Corral and glideinWMS
Corral and glideinWMS currently operate as standalone resource provisioning systems. GlideinWMS was initially developed to meet the needs of the CMS (Compact Muon Solenoid) experiment at the Large Hadron Collider (LHC) at CERN. It generalizes a Condor glidein system developed for CDF (the Collider Detector at Fermilab) and first deployed in production in 2003. It has been in production across the Worldwide LHC Computing Grid (WLCG), with major contributions from the Open Science Grid (OSG), in support of CMS for the past two years, and has recently been adopted for user analysis. GlideinWMS is also currently used by the CDF, DZero, and MINOS experiments, and serves the NEBioGrid and Holland Computing Center communities. GlideinWMS has been used in production with more than 8,000 concurrently running jobs; the CMS use alone totals over 45 million hours.