- Computing speeds the research that helps scientists understand the rules of life
- Machine learning could help paralyzed patients control prosthetic robotic limbs
- As the scale and complexity of research increase, HPC systems and expertise become more important than ever
Yesterday, in Part 1, we explored how large-scale scientific instruments combined with high-performance computing (HPC) are tackling several of humankind’s ‘grand challenges’, in particular advancing cosmology, particle physics, and our understanding of the universe.
Today we look at health and medicine, artificial brains, and beyond.
During the past few decades, the life sciences have witnessed one landmark discovery after another, with HPC paving the way toward a new era of personalized treatments based on an individual’s genetic makeup, and drugs capable of attacking previously intractable ailments with few side effects.
Genomics research is generating torrents of biological data in the quest to “understand the rules of life”, laying the groundwork for the personalized treatments widely seen as the focus of tomorrow’s medicine. DNA sequencing has rapidly moved from analyzing data sets megabytes (millions of bytes) in size to entire genomes gigabytes (billions of bytes) in size.
In one recent genome analysis, an international team led by Jonathan Sebat, professor of psychiatry, cellular and molecular medicine and pediatrics at UC San Diego School of Medicine, identified a risk factor that may explain some of the genetic causes of autism: rare inherited variants in regions of non-coding DNA.
For about a decade, researchers have known that the genetic causes of autism include gene mutations that appear for the first time in an affected individual. But those sequences represent only 2 percent of the genome. To investigate the remaining 98 percent of the genome in ASD (autism spectrum disorder), Sebat and colleagues analyzed the complete genomes of 9,274 subjects from 2,600 families, a combined data total in the range of terabytes (trillions of bytes).
“Whole genome sequencing data processing and analysis are both computationally and resource intensive,” says Madhusudan Gujral, co-author of the study. “Using Comet, processing and identifying specific structural variants from a single genome took about 2 ½ days.”
Not long ago, the following might have been considered an act of wizardry from a Harry Potter novel. First, take a speck of biomolecular matter, invisible to the naked eye, and then deep-freeze it to near absolute zero. Then, blast this material with an electron beam. Finally, add the power of a supercomputer.
And, presto! A three-dimensional image of the original biological speck appears on a computer monitor at atomic resolution. This innovation, named cryo-electron microscopy or simply cryo-EM, earned its developers the 2017 Nobel Prize in Chemistry for work on the technique dating to the 1970s.
Today, researchers seeking to unravel the structure of proteins in atomic detail, in hopes of treating many intractable diseases, are increasingly turning to cryo-EM as an alternative to time-tested X-ray crystallography.
“About 10 years ago, cryo-EM was known as blob-biology,” says Robert Sinkovits, director of scientific computing applications at SDSC. “You got an overall shape, but not at the resolution you would get with X-ray crystallography, which required working with a crystal. But it was kind of a black art to create these crystals, and some things simply wouldn’t crystallize. You can use cryo-EM for just about anything.”
Several molecular biologists and chemists are taking advantage of UC San Diego’s cryo-EM laboratory and SDSC’s computing resources, to reveal the inner workings of targeted proteins critical to the understanding of diseases such as fragile X syndrome and childhood liver cancer.
Machine learning and brain implants
Machine learning is a concept that can boggle the brain, and ironically it is now being used to imitate that very organ. The approach typically involves ‘training’ a computer or robot on millions of examples so that, over time, it learns to derive insight and meaning from the data.
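To make the idea of ‘training’ concrete, here is a toy sketch (with made-up data, not related to any study mentioned here): a model repeatedly sees examples, measures its error, and nudges a parameter to shrink that error.

```python
# Toy illustration of 'training': fit y = w * x to examples by
# repeatedly nudging w to reduce the prediction error.
# The data and learning rate are hypothetical.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # true relation: y = 2x
w, lr = 0.0, 0.05                            # start knowing nothing

for _ in range(200):          # repeated exposure to the examples
    for x, y in data:
        error = w * x - y     # how wrong is the current model?
        w -= lr * error * x   # adjust the parameter toward less error

print(round(w, 3))            # converges near the true value 2.0
```

The same loop, scaled up to millions of examples and millions of parameters, is what demands the HPC resources described below.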
Recently, a collaborative team led by researchers at SDSC and the Downstate Medical Center in Brooklyn, NY applied a novel computer algorithm to mimic how the brain learns. The goal: to identify and replicate neural circuitry that resembles the way an unimpaired brain controls limb movement.
The study laid the groundwork for developing realistic brain implants that replicate brain circuits and function, and that could one day replace brain cells lost or damaged by tumors, stroke or other diseases.
The researchers trained their model using spike-timing dependent plasticity (STDP) and reinforcement learning, believed to be the basis for memory and learning in mammalian brains. Briefly, the process refers to the ability of synaptic connections to grow stronger based on when they are activated relative to each other, combined with a system of biochemical rewards or punishments tied to correct or incorrect decisions.
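The two ingredients can be sketched in a few lines. This is a schematic illustration with hypothetical parameters, not the study’s actual model: an STDP rule strengthens a synapse when the pre-synaptic neuron fires just before the post-synaptic one, and a reward signal decides whether that change is kept.

```python
import math

def stdp_delta(dt, a_plus=0.01, a_minus=0.012, tau=20.0):
    """Candidate weight change for a spike pair separated by dt (ms).

    Positive dt (pre fires before post) suggests causation and
    strengthens the synapse; negative dt weakens it.
    All parameter values are illustrative only.
    """
    if dt > 0:
        return a_plus * math.exp(-dt / tau)    # potentiation
    elif dt < 0:
        return -a_minus * math.exp(dt / tau)   # depression
    return 0.0

def apply_reward(weight, eligibility, reward):
    """Reinforcement step: the STDP-driven change (eligibility) is
    only consolidated when a reward (+1) or punishment (-1) arrives."""
    return weight + reward * eligibility

# Example: pre-synaptic spike 5 ms before post-synaptic spike,
# followed by a reward signal for a correct limb movement.
w = 0.5
elig = stdp_delta(5.0)         # positive trace: pre fired before post
w = apply_reward(w, elig, +1)  # reward consolidates the strengthening
```

Gating the timing-based change on a reward signal is what lets such a network learn a task, rather than merely record spike coincidences.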
“Only the fittest individuals (models) remain; those models that are better able to learn survive and propagate their genes,” says Salvador Dura-Bernal, assistant professor in physiology and pharmacology with Downstate.
As for the role of HPC in this study: “Since thousands of parameter combinations need to be evaluated, this is only possible by running the simulations using HPC resources such as those provided by SDSC,” says Dura-Bernal. “We estimated that using a single processor instead of the Comet system would have taken almost six years to obtain the same results.”
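Evaluating thousands of parameter combinations is an embarrassingly parallel workload: each candidate model can be simulated independently, which is why it maps so well onto thousands of HPC cores. A schematic sketch (with a hypothetical scoring function standing in for a full simulation, not the study’s code):

```python
from itertools import product
from multiprocessing import Pool

def evaluate(params):
    """Stand-in for one full simulation run: score how well a
    candidate parameter pair reproduces the target behavior.
    Hypothetical fitness function; higher is better."""
    a, b = params
    return -((a - 0.3) ** 2 + (b - 0.7) ** 2)

if __name__ == "__main__":
    # Grid of candidate parameter values; real sweeps cover
    # thousands of combinations spread across many HPC nodes.
    grid = list(product([i / 10 for i in range(11)], repeat=2))
    with Pool() as pool:
        scores = pool.map(evaluate, grid)  # one simulation per worker
    best_score, best_params = max(zip(scores, grid))
    print("best parameters:", best_params)
```

On a cluster, `multiprocessing.Pool` would typically be replaced by a batch scheduler or MPI launching independent simulation jobs, but the structure of the sweep is the same.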
Into the future and beyond
Other impressive data producers are waiting in the wings to pose further challenges for tomorrow’s super facilities.
An ambitious upgrade to the Large Hadron Collider (LHC) will result in a substantial increase in the intensity of proton beam collisions. From the mid-2020s onward, experiments at the LHC are expected to yield 10 times more data each year than the combined output generated during the three years leading up to the Higgs discovery.
Beyond that, future accelerators under discussion would be housed in 100-km-long tunnels to reach collision energies many times that of the LHC, while still others propose colliders of different geometries, perhaps linear rather than circular. More powerful machines will, by definition, produce torrents more data to digest and analyze.
Thanks to an agreement with the Simons Foundation Flatiron Institute, SDSC’s Gordon supercomputer is being repurposed to provide computational support for POLARBEAR and its successor project, the Simons Array. The projects—led by UC Berkeley and funded by the Simons Foundation and the NSF—will deploy the most powerful cosmic microwave background (CMB) radiation telescope and detector ever made, to detect what is, in essence, the leftover ‘heat’ from the Big Bang in the form of microwave radiation.
“The POLARBEAR experiment alone collects nearly one gigabyte of data every day that must be analyzed in real time,” says Brian Keating, professor of physics at UC San Diego’s Center for Astrophysics & Space Sciences.
“This is an intensive process that requires dozens of sophisticated tests to assure the quality of the data. Only by leveraging resources such as Gordon are we able to continue our legacy of success.”
“As the scale of data and complexity of these experimental projects increase, it is more important than ever before that centers like SDSC respond by providing HPC systems and expertise that become part of the integrated ecosystem of research and discovery,” says Norman.