Large-scale scientific instruments generate torrents of raw data
Supercomputers separate waste from value and translate numbers into insight
Results point to innovative answers to humanity’s biggest questions
Today high-performance computing is at the forefront of a new gold rush, a rush to discovery using an ever-growing flood of information and data. Computing is now essential to science discovery like never before. We are the modern pioneers pushing the bounds of science for the betterment of society. ~Bernd Mohr, Jülich Supercomputing Centre
In an age defined and transformed by its data, several large-scale scientific instruments around the globe might be viewed as a mother lode of precious data.
With names seemingly created for a techno-speak glossary, these interferometers, cyclotrons, sequencers, solenoids, satellite altimeters, and cryo-electron microscopes are churning out data in almost incomprehensible quantities: billions, trillions, and quadrillions of bits and bytes of electromagnetic code.
Yet, policy-makers from the National Science Foundation (NSF) and elsewhere believe that hidden within these mountain-sized mines of information are clues to questions that have long confounded humanity: answers about those bits of glitter in the night sky, the nature of matter, the causes of disease, the origins of life, and even why and how we think about such things.
For this reason, the ability to convert this seemingly unintelligible digital data into rapid, meaningful discoveries has taken on added significance. Indeed, one of the NSF’s 10 Big Ideas for the future is “Harnessing Data for 21st Century Science and Engineering”.
Enter high-performance computing (HPC), which sifts and separates waste from valuable digital nuggets and, somewhat like a Rosetta Stone of the information age, decodes and translates this data into valuable insight.
“Advanced computing, along with experts charged with building and making the most of these HPC systems, has been critical to many Nobel Prizes, including work involving traditional modeling and simulation, to projects designed for more data-intensive workloads,” says Michael Norman, director of the San Diego Supercomputer Center (SDSC) at UC San Diego.
As evidence, Norman and others point to several recent Nobel Prizes in chemistry and physics—including international collaborations exploring the dark side of the universe and others delving into the dynamics of proteins critical for tomorrow’s targeted therapies.
Each has relied on the marriage of supercomputing technology and expertise with large-scale scientific instruments, all connected by high-speed communications networks. And each touches on other Big Ideas from the NSF, such as “The Era of Multi-Messenger Astrophysics” that includes a collection of approaches to expand our observations and understandings of the universe; and “Understanding the Rules of Life,” an initiative that will require convergence of research across biology, computer science, mathematics, behavioral sciences, and engineering.
Some of this effort is based on solving fundamental mathematical equations to create models or simulations using HPC systems now capable of performing quadrillions of calculations per second, such as Comet, funded by the NSF and housed at SDSC.
Other HPC research requires accessing, analyzing, and interpreting previously unfathomable amounts of data generated by a wide cross-section of sensors and detectors, via a modality called high-throughput computing (HTC). Simulation, data analysis, and experimentation sometimes complement and even blend with one another for discovery.
“HTC is a way of consuming computer resources, including those we label as HPC,” says Frank Würthwein, UC San Diego physics professor and Distributed High-Throughput Computing Lead at SDSC. “The way these large-scale instruments do analysis requires the HTC ‘modality’ of computing. This is distinct from the standard ‘submit a job to the queue’ which is what people traditionally do for simulations.”
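To make the distinction concrete, here is a minimal Python sketch of the HTC pattern: many small, independent tasks handed to a pool of workers, rather than one large, tightly coupled job. The function names, the chunk size, and the toy "analysis" are all hypothetical stand-ins, and a local thread pool is only an illustration of the idea, not how OSG or SDSC actually schedule work.

```python
from concurrent.futures import ThreadPoolExecutor

def analyze_chunk(chunk_id, events_per_chunk=1000):
    """Hypothetical stand-in for one independent analysis task.

    A real task would read one file of detector events; here we simply
    count the event numbers divisible by 7.
    """
    start = chunk_id * events_per_chunk
    events = range(start, start + events_per_chunk)
    return sum(1 for event in events if event % 7 == 0)

def run_high_throughput(n_chunks, workers=4):
    """Farm out n_chunks independent tasks and collect their results.

    No task needs to talk to any other task, which is what lets this
    style of work spread across whatever resources happen to be free.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(analyze_chunk, range(n_chunks)))

if __name__ == "__main__":
    counts = run_high_throughput(8)
    print("selected events per chunk:", counts)
    print("total selected events:", sum(counts))
```

Because each chunk is self-contained, the same work could be split across eight machines as easily as across eight threads.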
An integrated data ecosystem
Those on the technological front line recognize that the challenges of keeping up with the data explosion are enormous. Among other things, much of the science requires the integration of computational resources in an ecosystem that includes sophisticated workflow tools to orchestrate complex pathways for scheduling, data transfer, and processing.
Massive sets of data collected through these efforts also require tools and techniques for filtering and processing, plus analytical techniques to extract key information. Moreover, the system needs to be effectively automated across different types of resources, including instruments and data archives.
Some suggest that all these components should be orchestrated into what’s being called a “super facility”. The goal, according to the US Department of Energy (DOE), is to bring together users at multiple institutions “allowing geographically dispersed collaborators to tap into scientific resources and expertise, and analyze and share data with other users—all in real time and without having to leave the comfort of their office or lab.”
Says Würthwein: “These large-scale scientific instruments depend on large international cyberinfrastructures that a ‘super facility’ must integrate into seamlessly. The HPC system cannot be an island unto itself.”
According to the NSF, “The grand challenges of today—protecting human health, understanding the food, energy, water nexus; exploring the universe on all scales—will not be solved by one discipline alone. They require convergence: the merging of ideas, approaches, and technologies from widely diverse fields of knowledge to stimulate innovation and discovery.”
Armed with ever-more powerful large-scale scientific instruments, research teams around the globe—some encompassing a wide variety of disciplines—are converging to build an impressive portfolio of scientific advances and discoveries, with supercomputers serving as a critical linchpin for all these investigations.
Shining light on black holes
On July 4, 2012, at the CERN laboratory for particle physics, a theory first proposed in 1964 was confirmed with the discovery of a Higgs particle. The theory, which garnered the 2013 Nobel Prize in physics, helps describe how the world is constructed at its most fundamental level, from the intense waves of energy and primordial particles released from the Big Bang, to the planet we inhabit, to those glittering specks of light we observe in the night sky.
Under a partnership with UC San Diego physicists and the Open Science Grid (OSG), SDSC’s Gordon supercomputer provided auxiliary computing capacity to process raw data generated by the Compact Muon Solenoid (CMS)—one of two general purpose particle detectors at the Large Hadron Collider (LHC). LHC experiments are among the largest ever seen in physics, with each experiment involving collaborations of close to 200 institutions in more than 40 countries, and more than 3,000 scientists and engineers.
“Access to Gordon, and its excellent computing speed due to its flash-based memory, really helped push forward the processing schedule for us,” says Würthwein, a member of the CMS project and executive director of OSG.
“This was one of the first ever integrations of HTC with a large HPC system and with only a few weeks’ notice, we were able to gain access to Gordon and complete the runs, making the data available for analysis in time to provide crucial input toward the international planning meetings on the future of particle physics.”
In February 2016, an international team representing more than 20 countries announced the first-ever detection of gravitational waves in the universe, based on the tell-tale “chirp” signature of two black holes merging about 1.3 billion years ago.
The signal was detected on Earth, first by the Laser Interferometer Gravitational-Wave Observatory (LIGO) detector near Livingston, Louisiana, and then seven milliseconds later and 1,890 miles away at the second LIGO interferometer in Hanford, Washington. Three members of the team won the 2017 Nobel Prize in Physics for the discovery.
“LIGO’s discovery of gravitational waves from the binary black hole required large-scale data analysis to validate the discovery claim,” says Duncan Brown, Charles Brightman Professor of Physics at Syracuse University.
“This includes measuring how significant the signal is compared to noise in the detector, and re-analyzing the data with simulated signals to ensure that we understand the astrophysical sensitivity of the search. Comet’s computer cycles were extremely important for us to complete large-scale simulations and fast validation of the search.”
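The idea of re-analyzing data with simulated signals can be illustrated with a toy example. The Python sketch below hides a made-up "chirp" in random noise and then recovers it with a matched filter, a standard way of scoring how strongly data resembles an expected waveform. Everything here is an assumption for illustration: the noise is idealized white Gaussian noise, the waveform is an invented frequency sweep rather than a black hole model, and the function names are hypothetical. LIGO's actual search pipelines are far more elaborate.

```python
import numpy as np

def make_chirp(n_samples=512, f_start=0.05, f_end=0.20):
    """Toy chirp: a tone sweeping upward in frequency (cycles per sample)."""
    t = np.arange(n_samples)
    sweep = (f_end - f_start) / n_samples
    phase = 2 * np.pi * (f_start * t + 0.5 * sweep * t**2)
    template = np.hanning(n_samples) * np.sin(phase)
    # Unit norm, so that in unit-variance white noise the filter output
    # can be read directly as a signal-to-noise ratio (SNR).
    return template / np.sqrt(np.sum(template**2))

def matched_filter_snr(data, template):
    """Slide the template along the data and correlate at every offset."""
    return np.correlate(data, template, mode="valid")

def inject_and_recover(snr_injected=12.0, n_data=4096, offset=1500, seed=1):
    """Hide a simulated signal in noise, then look for the loudest match."""
    rng = np.random.default_rng(seed)
    template = make_chirp()
    data = rng.normal(0.0, 1.0, n_data)  # assumed white Gaussian noise
    data[offset:offset + template.size] += snr_injected * template
    snr = np.abs(matched_filter_snr(data, template))
    peak = int(np.argmax(snr))
    return peak, float(snr[peak])

if __name__ == "__main__":
    peak, snr = inject_and_recover()
    print(f"loudest match at sample {peak} with SNR {snr:.1f}")
```

Repeating this with many simulated signals of different strengths shows how faint a signal can be before the search misses it, which is the sensitivity question Brown describes.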
Less than two years after that first announcement, in October 2017, researchers announced they had detected gravitational waves generated by the collision of two neutron stars about 130 million light-years from Earth, via the two LIGO instruments and the Europe-based Virgo interferometer, followed shortly by multiple telescopes and satellites built to capture light from the universe.
This combination of observational instruments bears testimony to what’s become known as multi-messenger astronomy (MMA), where multiple instruments, built to detect gravitational waves and different forms of electromagnetic radiation, are choreographed with one another, essentially in real time, to view the same patch of sky. Once again, Comet helped verify the signal, with allocations from NSF’s Extreme Science and Engineering Discovery Environment (XSEDE) and the OSG.
“The correlation of the three interferometers, two from LIGO and one from Virgo, significantly shrunk the area in the sky for where to look,” says Würthwein.
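A little arithmetic shows why arrival times carry directional information. Gravitational waves travel at the speed of light, so the delay between two detectors can never exceed the light travel time between them, and the actual delay fixes the angle between the source and the line joining the detectors. The Python sketch below works this through for the seven-millisecond delay of the first detection. It takes the 1,890-mile figure quoted earlier as the straight-line separation, which is an assumption, and the function names are hypothetical.

```python
import math

C_KM_PER_S = 299_792.458  # speed of light
KM_PER_MILE = 1.609344

def max_delay_ms(separation_miles):
    """Longest possible delay: the wave travels straight along the baseline."""
    return separation_miles * KM_PER_MILE / C_KM_PER_S * 1e3

def arrival_angle_deg(delay_ms, separation_miles):
    """Angle between the source direction and the detector baseline."""
    cos_theta = delay_ms / max_delay_ms(separation_miles)
    if abs(cos_theta) > 1.0:
        raise ValueError("delay exceeds the light travel time between sites")
    return math.degrees(math.acos(cos_theta))

if __name__ == "__main__":
    print(f"maximum delay: {max_delay_ms(1890):.1f} ms")
    print(f"angle to baseline: {arrival_angle_deg(7.0, 1890):.0f} degrees")
```

A single delay only pins the source to a ring on the sky at that angle. Adding a third detector such as Virgo supplies further delays, and the rings overlap in a much smaller patch.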
The hunt for neutrinos
Ever since Wolfgang Pauli postulated them in December 1930, cosmologists have been hunting for neutrinos: subatomic particles that lack an electric charge, once described as “the most tiny quantity of reality ever imagined by a human being.”
For the most part, cosmic neutrinos are believed to have been created about 14 billion years ago, soon after the birth of the universe. Others emerged more recently from some of the most violent events in the universe, such as exploding stars, gamma ray bursts, and the activity around black holes and neutron stars. But unlike photons and charged particles, neutrinos can emerge from their sources and, like cosmological ghosts, pass through the universe unscathed.
To help catch these messengers from deep space, an international team of researchers set up IceCube, a neutrino observatory containing an array of 5,160 optical sensors deep within a cubic kilometer of ice at the South Pole.
Frank Halzen, principal investigator of the IceCube Observatory and physics professor at the University of Wisconsin-Madison, explained the importance of supercomputers for isolating the signature pattern of neutrinos: “The IceCube neutrino detector transforms natural Antarctic ice at the South Pole into a particle detector. Progress in understanding the precise optical properties of the ice leads to increasing complexity in simulating the propagation of photons in the instrument and to a better overall performance of the detector.”
“The photon propagation in the ice is very well-suited to run in graphics processing units (GPUs) hardware, such as those on Comet,” Halzen continues. “Pursuing efficient access to a large amount of GPU computing power is therefore of great importance to ensure that future IceCube analysis reaches the maximum precision and that the full scientific potential of the instrument is exploited.”
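The reason this kind of simulation suits GPUs is that every photon can be followed independently of all the others. The Python sketch below illustrates that structure with a deliberately crude model: each photon flies in a straight line, scatters in a random new direction, and repeats until it is absorbed. The scattering and absorption lengths are made-up numbers, real Antarctic ice scatters light mostly forward rather than in all directions, and the function name is hypothetical. NumPy arrays stand in here for the thousands of parallel threads a GPU would provide; this is not IceCube's simulation code.

```python
import numpy as np

def propagate_photons(n_photons, scatter_len=25.0, absorb_len=100.0, seed=0):
    """Follow photons from a point source until each one is absorbed.

    Lengths are in meters and are illustrative assumptions only.
    Returns final positions and the total path length of each photon.
    """
    rng = np.random.default_rng(seed)
    position = np.zeros((n_photons, 3))
    # Distance each photon will travel in total before being absorbed.
    path_length = rng.exponential(absorb_len, n_photons)
    remaining = path_length.copy()
    alive = remaining > 0

    while alive.any():
        n_alive = int(alive.sum())
        # Distance to the next scattering, capped at what is left to travel.
        step = np.minimum(rng.exponential(scatter_len, n_alive), remaining[alive])
        # Pick a new direction uniformly over the sphere (assumed isotropic).
        direction = rng.normal(size=(n_alive, 3))
        direction /= np.linalg.norm(direction, axis=1, keepdims=True)
        position[alive] += direction * step[:, None]
        remaining[alive] -= step
        alive = remaining > 0

    return position, path_length

if __name__ == "__main__":
    position, path_length = propagate_photons(20_000)
    distance = np.linalg.norm(position, axis=1)
    print(f"mean path length travelled: {path_length.mean():.0f} m")
    print(f"mean distance from source:  {distance.mean():.0f} m")
```

Each pass through the loop updates every surviving photon at once with the same few operations, which is the pattern GPU hardware is built to accelerate.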