Current Research Projects

Pegasus Workflow Management System

The Pegasus project encompasses a set of technologies that help workflow-based applications execute in a range of environments, including desktops, campus clusters, grids, and clouds. Scientific workflows allow users to easily express multi-step computations, for example retrieving data from a database, reformatting the data, and running an analysis. Once an application is formalized as a workflow, the Pegasus Workflow Management System can map it onto available compute resources and execute its steps in the appropriate order. Pegasus can easily handle workflows with several million computational tasks.
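
To give a concrete feel for how such a multi-step computation is expressed, the sketch below builds a three-step workflow (retrieve, reformat, analyze) with the Pegasus 5.x Python API. The executable and file names are hypothetical placeholders; a real workflow would also register its executables and input data in the appropriate catalogs.

```python
# Minimal sketch of a three-step workflow using the Pegasus 5.x Python API.
# The executables (retrieve, reformat, analyze) and file names are
# hypothetical placeholders; Pegasus infers job dependencies from the
# declared input and output files.
from Pegasus.api import Workflow, Job, File

raw = File("data.raw")          # data retrieved from a database
clean = File("data.csv")        # reformatted intermediate file
report = File("analysis.out")   # final analysis product

wf = Workflow("example-analysis")

retrieve = Job("retrieve").add_args("-o", raw).add_outputs(raw)
reformat = (Job("reformat").add_args("-i", raw, "-o", clean)
            .add_inputs(raw).add_outputs(clean))
analyze = (Job("analyze").add_args("-i", clean, "-o", report)
           .add_inputs(clean).add_outputs(report))

wf.add_jobs(retrieve, reformat, analyze)
wf.write("workflow.yml")   # Pegasus plans this abstract workflow onto
                           # available resources and runs the jobs in order
```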

Learn more Source Code

Funding Agency: NSF

Repository and Workflows for Accelerating Circuit Realization

RACE will enable researchers and design experts to expand the state of the art in ASIC design through novel cyberinfrastructure and workflow tools that accelerate every phase of discovery, creation, adoption, and use. It does so by linking and computing around a repository of user-generated data, including new tools, new IP blocks and libraries, new design flows, training modules, and an experience base documenting best practices to adopt (and pitfalls to avoid).

Learn more

Funding Agency: DARPA

Panorama 360: Performance Data Capture and Analysis for End-to-end Scientific Workflows

Scientific workflows are now being used in a number of scientific domains, including astronomy, bioinformatics, climate modeling, earth science, civil engineering, physics, and many others. Unlike monolithic applications, workflows often run across heterogeneous resources distributed over wide area networks. Some workflow tasks may require high-performance computing resources, while others can run efficiently on high-throughput computing systems. Workflows also access data from potentially different data repositories and use data, often represented as files, to communicate between workflow components. As a result of these data access patterns, workflow performance can be greatly influenced by the performance of networks and storage devices.

Learn more

Funding Agency: DOE

Secure and Resilient Architecture: Scientific Workflow Integrity with Pegasus (SWIP)

The Scientific Workflow Integrity with Pegasus project strengthens cybersecurity controls in the Pegasus Workflow Management System in order to provide assurances with respect to the integrity of computational scientific methods. These strengthened controls enhance both Pegasus’ handling of science data and its orchestration of software-defined networks and infrastructure. The result is increased trust in computational science and increased assurance in our ability to reproduce the science by allowing scientists to validate that data has not been changed since a workflow completed and that the results from multiple workflows are consistent.
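
One way to picture the kind of assurance this provides: after a workflow completes, recorded checksums of its data products can be re-verified at any later time. The sketch below is a generic Python illustration of that idea, not SWIP's actual mechanism; the JSON manifest format is an assumption made for the example.

```python
# Generic illustration of post-workflow integrity validation: recompute
# SHA-256 checksums of output files and compare them to values recorded
# when the workflow completed. Not SWIP's actual implementation; the
# manifest format ({"path": "sha256-hex", ...} in JSON) is assumed.
import hashlib
import json
from pathlib import Path

def sha256(path: Path, chunk_size: int = 1 << 20) -> str:
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def validate(manifest_file: str) -> bool:
    recorded = json.loads(Path(manifest_file).read_text())
    ok = True
    for name, expected in recorded.items():
        if sha256(Path(name)) != expected:
            print(f"INTEGRITY ERROR: {name} changed since the workflow completed")
            ok = False
    return ok

if __name__ == "__main__":
    validate("output_checksums.json")
```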

Learn more

Funding Agency: NSF

In Situ Data Analytics for Next Generation Molecular Dynamics Workflows

Molecular dynamics simulations, which study the classical time evolution of a molecular system at atomic resolution, are widely used in chemistry, materials science, molecular biology, and drug design, and they are among the most common simulations run on supercomputers. Next-generation supercomputers will have dramatically higher performance than current systems and will generate more data that needs to be analyzed. The coordination of data generation and analysis cannot rely on manual, centralized approaches as it does now. This project aims to transform the centralized nature of molecular dynamics analysis into a distributed approach that is performed predominantly in situ. Specifically, the effort combines machine learning and data analytics approaches, workflow management methods, and high-performance computing techniques to analyze molecular dynamics data as it is generated, save to disk only what is really needed for future analysis, and annotate molecular dynamics trajectories to drive the next steps in increasingly complex simulation workflows.
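
As a toy illustration of the "analyze data as it is generated and keep only what is needed" idea, the sketch below scores each incoming frame against a reference structure and retains only frames that have drifted beyond a threshold. The RMSD metric, threshold, and synthetic frame source are stand-ins invented for the example, not the project's actual analysis pipeline.

```python
# Toy illustration of in situ filtering of molecular dynamics frames:
# score each frame as it is produced and persist only the interesting ones.
# The RMSD-to-reference metric, threshold, and frame generator are
# hypothetical stand-ins for a real analysis pipeline.
import numpy as np

def rmsd(frame: np.ndarray, reference: np.ndarray) -> float:
    """Root-mean-square deviation between two (n_atoms, 3) coordinate arrays."""
    return float(np.sqrt(np.mean(np.sum((frame - reference) ** 2, axis=1))))

def in_situ_filter(frames, reference, threshold=2.0):
    """Yield (index, frame) only for frames whose RMSD exceeds the threshold."""
    for i, frame in enumerate(frames):
        if rmsd(frame, reference) > threshold:
            yield i, frame  # only these would be written to disk / annotated

# Synthetic data standing in for a running simulation.
rng = np.random.default_rng(0)
reference = rng.normal(size=(100, 3))
frames = (reference + rng.normal(scale=0.1 * t, size=(100, 3))
          for t in range(1, 51))

kept = [i for i, _ in in_situ_filter(frames, reference)]
print(f"kept {len(kept)} of 50 frames")
```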

Learn more

Funding Agency: NSF

Model Integration through Knowledge-Rich Data and Process Composition (MINT)

Major societal and environmental challenges require forecasting how natural processes and human activities affect one another. There are many areas of the globe where climate affects water resources and therefore food availability, with major economic and social implications. Today, such analyses require significant effort to integrate highly heterogeneous models from separate disciplines, including geosciences, agriculture, economics, and social sciences. Model integration requires resolving semantic, spatio-temporal, and execution mismatches, a process that is largely done by hand today and may take more than two years. The Model INTegration (MINT) project will develop a modeling environment that significantly reduces the time needed to develop new integrated models, while ensuring their utility and accuracy. Research topics to be addressed include: 1) new principle-based semi-automatic ontology generation tools for modeling variables, used to ground the analytic graphs that describe models and data; 2) a novel workflow compiler using abductive reasoning to hypothesize new models and data transformation steps; 3) a new data discovery and integration framework that finds new sources of data, learns to extract information from both online sources and remote sensing data, and transforms the data into the format required by the models; 4) a new methodology for spatio-temporal scale selection; 5) new knowledge-guided machine learning algorithms for model parameterization to improve accuracy; 6) a novel framework for multi-modal scalable workflow execution; and 7) novel composable agroeconomic models.

Learn more

Funding Agency: DARPA

Population Architecture using Genomics and Epidemiology (PAGE)

Over recent years, genome-wide association studies (GWAS) have allowed researchers to uncover hundreds of genetic variants associated with common diseases. However, the discovery of genetic variants through GWAS research represents just the first step in the challenging process of piecing together the complex biological picture of common diseases. To help speed the process, the National Human Genome Research Institute is supporting new research in existing large epidemiology studies, all with a rich range of measures of health and potential disease, and many with long-term follow-up. The focus of the new research is on how genetic variants initially identified through GWAS research are related to a person's biological and physical characteristics, such as weight, cholesterol levels, blood sugar levels, or bone density. Scientists will also examine how non-genetic factors, such as diet, medications, and smoking, may interact with genetic factors or each other to influence health outcomes.

Learn more

Funding Agency: NIH

Center for Collaborative Genetic Studies on Mental Disorders (CGSMD)

The Center for Collaborative Genetic Studies on Mental Disorders is a collaboration of the Rutgers University Cell and DNA Repository (RUCDR), Washington University in St. Louis, and the University of Southern California's Information Sciences Institute. It is funded by a grant from the National Institute of Mental Health. The Center produces, stores, and distributes the clinical data and biomaterials (DNA samples and cell lines) available in the NIMH Human Genetics Initiative, and it creates and distributes computational tools that support investigation and analysis of the clinical data. In addition, the Center creates tools that enable researchers to determine which samples or data might be of use to them, so that they may request access from NIMH.

Learn more

Funding Agency: NIH

SimCenter: Center for Computational Modeling and Simulation

The SimCenter will provide modeling and simulation tools using a new open-source framework that: (1) addresses various natural hazards, such as windstorms, storm surge, tsunamis, and earthquakes; (2) tackles complex scientific questions of concern to disciplines involved in natural hazards research, including earth sciences, geotechnical and structural engineering, architecture, urban planning, risk management, social sciences, public policy, and finance; (3) utilizes machine learning to facilitate and improve modeling and simulation using data obtained from experimental tests, field investigations, and previous simulations; (4) quantifies the uncertainties associated with the simulation results obtained; (5) utilizes high-performance parallel computing, data assimilation, and related capabilities to easily combine software applications into workflows of unprecedented sophistication and complexity; (6) extends and refines software tools for carrying out performance-based engineering evaluations and supporting decisions that enhance the resilience of communities susceptible to multiple natural hazards; and (7) utilizes existing applications that already provide many of the pieces of desired computational workflows.

Learn more

Funding Agency: NSF

XSEDE: Integrating, Enabling and Enhancing National Cyberinfrastructure with Expanding Community Involvement

Scientists, engineers, social scientists, and humanists around the world – many of them at colleges and universities – use advanced digital resources and services every day. Things like supercomputers, collections of data, and new tools are critical to the success of those researchers, who use them to make our lives healthier, safer, and better. XSEDE is an NSF-funded virtual organization that integrates and coordinates the sharing of advanced digital services – including supercomputers and high-end visualization and data analysis resources – with researchers nationally to support science. These digital services give users seamless, integrated access to NSF's high-performance computing and data resources. XSEDE's integrated, comprehensive suite of advanced digital services, combined with other high-end facilities and campus-based resources, serves as the foundation for a national cyberinfrastructure ecosystem.

Learn more

Funding Agency: NSF

Open Science Grid

The OSG provides common services and support for resource providers and scientific institutions using a distributed fabric of high-throughput computational services. The OSG does not own resources but provides software and services to users and resource providers alike to enable the opportunistic usage and sharing of resources. The OSG is jointly funded by the Department of Energy and the National Science Foundation. It is primarily used as a high-throughput grid, where scientific problems are solved by breaking them down into a very large number of individual jobs that can run independently.
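
A rough sketch of the high-throughput model: a parameter sweep is decomposed into many small, self-contained job descriptions that can each run independently wherever capacity becomes available. The job-description format below is invented for illustration and is not an OSG interface.

```python
# Conceptual illustration of high-throughput computing: decompose a
# parameter sweep into many independent jobs that can be scheduled
# opportunistically on whatever resources become available.
# The job-description format is a made-up placeholder, not an OSG interface.
import itertools
import json

def make_jobs(temperatures, pressures, seeds):
    """One self-contained job description per parameter combination."""
    for i, (t, p, s) in enumerate(itertools.product(temperatures, pressures, seeds)):
        yield {
            "id": i,
            "executable": "simulate",          # hypothetical application
            "arguments": ["--temp", t, "--pressure", p, "--seed", s],
            "output": f"run_{i:06d}.out",
        }

jobs = list(make_jobs(range(280, 320, 5), range(1, 6), range(10)))
print(f"{len(jobs)} independent jobs")          # 8 * 5 * 10 = 400
with open("jobs.json", "w") as f:
    json.dump(jobs, f, indent=2)
```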

Learn more

Funding Agency: NSF and DOE

WRENCH: Workflow Management System Simulation Workbench

Capitalizing on recent advances in distributed application and platform simulation technology, WRENCH makes it possible to (1) quickly prototype workflows, WMS implementations, and decision-making algorithms; and (2) evaluate and compare alternative options scalably and accurately for arbitrary, and often hypothetical, experimental scenarios. The project defines a generic and foundational software architecture that is informed by current state-of-the-art WMS designs and planned future designs. The implementations of the components in this architecture, taken together, form a generic "scientific instrument" that can be used by workflow users, developers, and researchers. This scientific instrument will be instantiated for several real-world WMSs and used for a range of real-world workflow applications.

Learn more Source Code

Funding Agency: NSF

IRIS: Integrity Introspection For Scientific Workflows

Data-driven science applications often depend on the integrity of the underlying scientific computational workflow and on the integrity of the associated data products. However, experience with executing numerous scientific workflows in a variety of environments has shown that workflow processing suffers from data integrity errors when workflows execute on national cyberinfrastructure (CI). These errors can stem from failures and unintentional corruption at various layers of the system software and hardware. Today, however, there is a lack of tools that can collect and analyze integrity-relevant data while workflows are executing; as a result, many of these errors go undetected and corrupt data becomes part of the scientific record. The goal of this work is to automatically detect, diagnose, and pinpoint the source of unintentional integrity anomalies in scientific workflows executing on distributed CI. The approach is to develop an appropriate threat model and incorporate it into an integrity introspection, correlation, and analysis framework that collects application and infrastructure data and uses statistical and machine learning (ML) algorithms to perform the needed analysis. The framework will be powered by novel ML-based methods developed through experimentation in a controlled testbed and will be validated on, and made broadly available on, NSF production CI. The solutions will leverage and be integrated into the Pegasus workflow management system, which is already used by a wide variety of scientific domains. An important part of the project is engagement with selected science application partners in gravitational-wave physics, earthquake science, and bioinformatics to deploy the analysis framework for their workflows and iteratively fine-tune the threat models, the testbed, ML model training, and ML model validation in a feedback loop.
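
As a conceptual sketch of the analysis side (not the IRIS framework itself), the snippet below fits an off-the-shelf anomaly detector to simple per-transfer features that such a framework might collect and flags transfers whose behavior deviates from the norm; the feature set and data are invented for illustration.

```python
# Conceptual sketch of ML-based integrity introspection, not the IRIS
# framework itself: fit an off-the-shelf anomaly detector on simple
# per-transfer features (duration, size, retries) and flag outliers.
# The feature set and synthetic data are invented for illustration.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(42)

# Synthetic "normal" transfers: [duration_s, size_MB, retries]
normal = np.column_stack([
    rng.normal(30, 5, 500),       # durations around 30 s
    rng.normal(100, 10, 500),     # sizes around 100 MB
    rng.poisson(0.1, 500),        # almost never retried
])
# A few suspicious transfers: slow, truncated, or heavily retried
suspicious = np.array([[120, 40, 5], [90, 10, 4], [150, 55, 6]])

detector = IsolationForest(contamination=0.01, random_state=0).fit(normal)
flags = detector.predict(suspicious)   # -1 marks an anomaly
print(flags)                           # expected: mostly -1 for these rows
```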

Funding Agency: NSF

CICoE: NSF Pilot Study – Cyberinfrastructure Center of Excellence

NSF’s major multi-user research facilities (large facilities) are sophisticated research instruments and platforms – such as large telescopes, interferometers, and distributed sensor arrays – that serve diverse scientific disciplines, from astronomy and physics to geoscience and biological science. Large facilities are increasingly dependent on advanced cyberinfrastructure (CI) – computing, data and software systems, networking, and associated human capital – to enable broad delivery and analysis of facility-generated data. As a result of these cyberinfrastructure tools, scientists and the public gain new insights into fundamental questions about the structure and history of the universe, the world we live in today, and how our plants and animals may change in the coming decades. The goal of this pilot project is to develop a model for a Cyberinfrastructure Center of Excellence (CI CoE) that facilitates community building and sharing and applies knowledge of best practices and innovative solutions for facility CI.

The pilot project will explore how such a center would facilitate CI improvements for existing facilities and for the design of new facilities that exploit advanced CI architecture designs and leverage established tools and solutions. The pilot project will also catalyze a key function of an eventual CI CoE – providing a forum for the exchange of experience and knowledge among CI experts. The project will also gather best practices for large facilities, with the aim of enhancing individual facility CI efforts in the broader CI context. The discussion forum and planning effort for a future CI CoE will also address training and workforce development by expanding the pool of skilled facility CI experts and forging career paths for CI professionals. The result of this work will be a strategic plan for a CI CoE that will be evaluated and refined through community interactions: workshops and direct engagement with the facilities and the broader CI community.

Learn more Source Code

Funding Agency: NSF
