Stanford Report, May 22nd, 2014, by Bjorn Carey
Understanding the origins of our solar system, the future of our planet or humanity requires complex calculations run on high-power computers.
A common thread among research efforts across Stanford’s many disciplines is the growing use of sophisticated algorithms, run by brute computing power, to solve big questions.
In Earth sciences, computer models of climate change or carbon sequestration help drive policy decisions, and in medicine computation is helping unravel the complex relationship between our DNA and disease risk. Even in the social sciences, computation is being used to identify relationships between social networks and behaviors, work that could influence educational programs.
“There’s really very little research that isn’t dependent on computing,” says Ann Arvin, vice provost and dean of research. Arvin helped support the recently opened Stanford Research Computing Center (SRCC) located at SLAC National Accelerator Laboratory, which expands the available research computing space at Stanford. The building’s green technology also reduces the energy used to cool the servers, lowering the environmental costs of carrying out research.
“Everyone we’re hiring is computational, and not at a trivial level,” says Stanford Provost John Etchemendy, who provided an initial set of servers at the facility. “It is time that we have this facility to support those faculty.”
Here are just a few examples of how Stanford faculty are putting computers to work to crack the mysteries of our origins, our planet and ourselves.
Q: How did we evolve?
The human genome is essentially a gigantic data set. Deep within each person’s six billion data points are minute variations that tell the story of human evolution, and provide clues to how scientists can combat modern-day diseases.
To better understand the causes and consequences of these genetic variations, Jonathan Pritchard, a professor of genetics and of biology, writes computer programs that can investigate those links. “Genetic variation affects how cells work, both in healthy variation and in response to disease,” Pritchard says. How that variation displays itself – in appearance or how cells work – and whether natural selection favors those changes within a population drives evolution.
Consider, for example, variation in the gene that codes for lactase, an enzyme that allows mammals to digest milk. Most mammals turn off the lactase gene after they’ve been weaned from their mother’s milk. In populations that have historically revolved around dairy farming, however, Pritchard’s algorithms have helped to elucidate signals of strong selection since the advent of agriculture for keeping the gene active throughout life, enabling people to process milk. There has been similarly strong selection on skin pigmentation in non-Africans that allows better synthesis of vitamin D in regions where people are exposed to less sunlight.
The algorithms and machine learning methods Pritchard uses have the potential to yield powerful medical insights. Studying variations in how genes are regulated within a population could reveal how and where particular proteins bind to DNA, or which genes are turned on in different cell types – information that could help design novel therapies. These inquiries can generate hundreds of thousands of data sets, which can take up to tens of thousands of hours of computer work to parse.
Pritchard is bracing for an even bigger explosion of data; as genome sequencing technologies become less expensive, he expects the number of individually sequenced genomes to jump by as much as a hundredfold in the next few years. “Storing and analyzing vast amounts of data is a fundamental challenge that all genomics groups are dealing with,” says Pritchard, who is a member of Stanford Bio-X. “Having access to SRCC will make our inquiries go easier and more quickly, and we can move on faster to making the next discovery.” —Bjorn Carey
Q: How does our DNA make us who we are?
Our DNA is sometimes referred to as our body’s blueprint, but it’s really more of a sketch. Sure, it determines a lot of things, but so do the viruses and bacteria swarming our bodies, our encounters with environmental chemicals that lodge in our tissues and the chemical stew that ensues when our immune system responds to disease states.
All of this taken together – our DNA, the chemicals, the antibodies coursing through our veins and so much more – determines our physical state at any point in time. And all that information makes for a lot of data if, like genetics professor Michael Snyder, you collect it 75 times over the course of four years.
Snyder is a proponent of what he calls “personal omics profiling,” or the study of all that makes up our person, and he’s starting with himself. “What we’re collecting is a detailed molecular portrait of a person throughout time,” he says.
So far, he’s turning out to be a pretty interesting test case. In one round of assessment he learned that he was becoming diabetic and was able to control the condition long before it would have been detected through a periodic medical exam.
If personal omics profiling is going to go mainstream, serious computing will be required to tease out which of the myriad tests Snyder’s team currently runs give meaningful information and should be part of routine screening. Snyder’s sampling alone has already generated half a petabyte of data – roughly enough raw information to fill a dishwasher-size rack of servers.
Right now, that data and the computer power required to understand it reside on campus, but new servers will be located at SRCC. “I think you are going to see a lot more projects like this,” says Snyder, who is also a Stanford Bio-X affiliate and a member of the Stanford Cancer Center. “Computing is becoming increasingly important in medicine.” —Amy Adams
Q: How do we learn to read?
A love letter, with all of its associated emotions, conveys its message with the same set of squiggly letters as a newspaper, a novel or an instruction manual. How our brains learn to translate a series of lines and curves into language that carries meaning or imparts knowledge is something psychology Professor Brian Wandell has been trying to understand.
Wandell hopes to tease out differences between the brain scans of kids learning to read normally and those who are struggling, and use that information to find the right support for kids who need help. “As we acquire information about the outcome of different reading interventions we can go back to our database to understand whether there is some particular profile in the child that works better with intervention 1, and a second profile that works better with intervention 2,” says Wandell, a Stanford Bio-X member who is also the Isaac and Madeline Stein Family Professor and professor, by courtesy, of electrical engineering.
His team developed a way of scanning kids’ brains with magnetic resonance imaging, then knitting the million collected samples together with complex algorithms that reveal how the nerve fibers connect different parts of the brain. “If you try to do this on your laptop, it will take half a day or more for each child,” he says. Instead, he uses powerful computers to reveal specific brain changes as kids learn to read.
Wandell is associate director of the Stanford Neurosciences Institute, where he is leading the effort to develop a computing strategy – one that involves making use of SRCC rather than including computing space in the institute’s planned new building. He says one advantage of having faculty share computing space and systems is that it speeds scientific progress. “Our hope for the new facility is that it gives us the chance to set the standards for a better environment for sharing computations and data, spreading knowledge rapidly through the community,” he says. —Amy Adams
Q: What can computers tell us about how our body works?
As you sip your morning cup of coffee, the caffeine makes its way to your cells, slots into a receptor site on the cells’ surface and triggers a series of reactions that jolt you awake. A similar process takes place when Zantac provides relief for stomach ulcers, or when chemical signals produced in the brain travel cell-to-cell through your nervous system to your heart, telling it to beat.
In each of these instances, a drug or natural chemical is activating a cell’s G protein-coupled receptor (GPCR), the cellular target of roughly half of all known drugs, says Vijay Pande, a professor of chemistry and, by courtesy, of structural biology and of computer science at Stanford. This exchange is a complex one, though. In order for caffeine or any other molecule to influence a cell, it must fit snugly into the receptor site, which consists of 4,000 atoms and transforms between an active and inactive configuration. Current imaging technologies are unable to view that transformation, so Pande has been simulating it using his Folding@Home distributed computer network.
So far, Pande’s group has demonstrated a few hundred microseconds of the receptor’s transformation. Although that’s an extraordinarily long chunk of time compared with what similar techniques have achieved, Pande is looking forward to accessing the SRCC to investigate the basic biophysics of GPCRs and other proteins. Greater computing power, he says, will allow his team to simulate larger molecules in greater detail, simulate folding sequences for longer periods of time and visualize multiple molecules as they interact. It might even lead to atom-level simulations of processes at the scale of an entire cell. All of this knowledge could be applied to computationally design novel drugs and therapies.
“Having more computer power can dramatically change every aspect of what we can do in my lab,” says Pande, who is also a Stanford Bio-X affiliate. “Much like having more powerful rockets could radically change NASA, access to greater computing power will let us go way beyond where we can go routinely today.”