- NSF adds cloud-based interface to the US cyberinfrastructure.
- On-demand HPC access given to more scientists than ever before.
- Virtualization enhances scientific reproducibility.
Forming a boundary between two large air masses, a jet stream is a rapid, high-altitude air current that separates hot and cold air, steering the winds that shape our daily weather below.
This is the analogy behind Jetstream, a self-service cloud-based interface to US National Science Foundation (NSF) high-performance computing resources. Like its namesake, Jetstream will occupy the border between current and new users of NSF supercomputers.
“We're particularly focused on the ‘long tail of science’, the very large amount of data collected by lots of scientists doing field research, lab research, and scientific experiments,” says Craig Stewart, Jetstream principal investigator. “We aim to accelerate their research and extend research education activities to several thousand more people.”
Funded by the NSF, Jetstream is a partnership among Indiana University’s Pervasive Technology Institute, the University of Texas at Austin’s Texas Advanced Computing Center (TACC), the Computation Institute at the University of Chicago, the iPlant Collaborative at the University of Arizona, and the University of Texas at San Antonio.
To broaden access to HPC resources, Jetstream will run much like a traditional cloud facility: geographically distributed across the US and available 24/7/365. Web-based access means researchers whose work has grown to demand supercomputing, but who lack deep technical expertise, will find an easy route to the massive compute power scientific discovery now requires.
Managed through the XSEDE Resource Allocation System, the computing environment will consist of one cluster at Indiana University and one at TACC, with a test environment at the University of Arizona. The system will provide over half a petaFLOPS of computational capacity and 2 petabytes of block and object storage. Each node will contain two Intel ‘Haswell’ processors, 128 GB of RAM, 2 terabytes of local storage, and 10-gigabit Ethernet networking. The system will use 40-gigabit Ethernet for network aggregation, and each production cluster will connect to Internet2 at 100 Gbps. Geographic distribution provides redundancy and resilience should one site become inoperable.
“The most important part of Jetstream is its usability,” says Matt Vaughn, Jetstream co-principal investigator and director of Life Sciences Computing at TACC. “Researchers will be able to log in to the system, choose a virtual machine (VM) image that has the software and the environment they need, launch that virtual machine and be up and productive, analyzing and interpreting their data in a matter of minutes.”
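The workflow Vaughn describes can be pictured as a simple mock. This sketch is purely illustrative: the image names, catalog, and `launch_vm` function are invented for this example and are not Jetstream's actual interface.

```python
# Illustrative mock of the "choose an image, launch, and go" workflow.
# All names here are hypothetical, not Jetstream's real API.
IMAGE_CATALOG = {
    "genomics-tools": {"ram_gb": 128, "software": ["bwa", "samtools"]},
    "r-statistics":   {"ram_gb": 128, "software": ["R", "RStudio"]},
}

def launch_vm(image_name):
    """Pretend to launch a VM from a named image; returns a status record."""
    image = IMAGE_CATALOG[image_name]
    return {"image": image_name, "state": "running", "ram_gb": image["ram_gb"]}

# A researcher picks a preconfigured environment and is "up and productive"
# once the VM reports it is running.
vm = launch_vm("genomics-tools")
print(vm["state"])  # prints "running"
```

The point of the design is that the environment, not just the hardware, is the unit a researcher selects, so no manual software installation stands between login and analysis.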
Jetstream’s VM capability will also allow for the creation of an archive of analyses, enabling a level of reproducibility previously underdeveloped in HPC-powered science. “With Jetstream, now you can capture the exact environment necessary and reproduce the analysis that may support a publication,” says James Taylor, Ralph O’Connor associate professor of biology and computer science at Johns Hopkins University. “We’ll be able to give a digital object identifier (DOI) to VMs so that they can become citable. This will take reproducibility of computationally intensive science to a whole new level.”
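A DOI-tagged VM image can be cited like any other digital object. As a hedged illustration, the snippet below formats a citation string; the author, title, DOI value, and `vm_citation` helper are all made up for this example.

```python
# Hypothetical illustration: a VM image with a DOI resolves through the
# standard doi.org service and can appear in a reference list.
def vm_citation(author, year, title, doi):
    """Format a simple citation string for a DOI-tagged VM image."""
    return f"{author} ({year}). {title} [virtual machine image]. https://doi.org/{doi}"

# Example with invented values:
print(vm_citation("Doe, J.", 2016, "Genome analysis environment", "10.0000/example"))
```

Because the DOI resolves to the archived image itself, a reader can relaunch the exact software environment behind a published result rather than reconstructing it from a methods section.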
A new wind is streaming across the US scientific landscape. As the cloud-based component of the US cyberinfrastructure, Jetstream promises to democratize access to NSF-managed compute resources. By bringing supercomputing to more scientists than ever, the NSF aims to accelerate the pace of discovery.