Stanford Report - December 3rd, 2008 - by Bruce Goldman
Scientists worldwide may benefit from a powerful new database, available free online, that will help them to home in on the parts of proteins most necessary for their function. Arend Sidow, PhD, associate professor of pathology and of genetics, recently launched the novel bioinformatics tool, which enlists evolution as the guide to determining the role different proteins play in an array of organisms.
ProPhylER (http://www.prophyler.org) will enable a researcher studying a protein, or the gene coding for it, to more easily figure out how it works and what might go wrong if the gene has a mutation. "Whether you're a cell biologist, biochemist or structural biologist, ProPhylER produces instant working hypotheses for you as to where the protein's functional areas are," Sidow said. The site made its debut on Oct. 10.
Proteins—the machines of life that do everything from making your muscles move to helping you think—are long chains of chemical units called amino acids. As soon as a protein molecule is made inside a cell from the gene encoding it, it folds up to assume the unique shape that determines its activity. To do its job, a protein needs to have specific amino acids (there are 20 to pick from) in specific places. In particular regions of the folded protein, it may be crucial that a specific amino acid sequence be there for the protein to function; in other regions, the swapping of one amino acid for another has little effect.
Over hundreds of millions of years, myriad species have evolved and, through eons of random mutation, so have their proteins. Yet in the face of all these changes, some things have to remain constant. "Evolution imposes stronger constraints on more-important regions of a protein molecule, from the standpoint of its biological activity, than on other, less-critical regions of that protein," Sidow said. If a change interferes with a protein's function, the hapless creature harboring this variant dies out; if not, the creature is fruitful and multiplies, and the variant protein persists in modern species.
It's by no means obvious just from looking at a protein's linear amino-acid sequence which regions are the "business districts" of the protein and which are the sleepier bedroom communities. By comparing numerous versions of the same protein from different species, ProPhylER identifies the parts of a protein that are key to its activity. This is especially useful for proteins about which little is known, which is the majority of proteins in the human genome.
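The idea of comparing versions of a protein across species can be illustrated with a minimal Python sketch. It is not ProPhylER's actual method, which the article describes only as being based on phylogenetics and evolutionary rates; it is a much simpler stand-in that scores each position by how often the most common amino acid appears there. The four aligned sequences and the function name `column_conservation` are hypothetical, invented for illustration.

```python
from collections import Counter

# Hypothetical toy alignment: the "same" protein from four species,
# already aligned so that position i corresponds across all sequences.
alignment = [
    "MKTAYL",
    "MKSAYL",
    "MRTAYI",
    "MKQAYV",
]


def column_conservation(seqs):
    """For each aligned position, return the fraction of sequences
    that share the most common residue at that position."""
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all aligned sequences must have the same length")
    scores = []
    for i in range(length):
        column = [s[i] for s in seqs]
        most_common_count = Counter(column).most_common(1)[0][1]
        scores.append(most_common_count / len(seqs))
    return scores


scores = column_conservation(alignment)
for position, score in enumerate(scores, start=1):
    print(position, score)
```

In this toy data, positions 1, 4 and 5 are identical in every species and score 1.0, marking them as candidate "business districts," while positions 3 and 6 vary freely and score 0.5.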
Human geneticists, too, will benefit from ProPhylER (a play on words derived from the bulkier term "PROtein PHYLogenetics and Evolutionary Rates"). Each of us carries tens of thousands of protein variants (due to mutations that have persisted in the human gene pool), some fraction of which affect protein function. For researchers, it is notoriously difficult to measure experimentally how much a protein's function is impaired by a mutation. ProPhylER provides specific predictions, also based on evolutionary variation, of the impact of a mutation on the protein's function. A mutation in an amino acid that has changed a lot in evolution is much less likely to be bad than a mutation that affects an amino acid in which evolution has not tolerated any change.
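The reasoning in the last sentence above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not ProPhylER's scoring: the function `substitution_risk`, its three labels, and the example columns are all hypothetical. It looks at the amino acids seen at one position across species and asks whether evolution has tolerated change there.

```python
def substitution_risk(column, new_residue):
    """Roughly classify a mutation at one aligned position.

    column: the residues observed at this position across species
            (a string or list of one-letter amino acid codes).
    new_residue: the amino acid introduced by the mutation.
    The labels are illustrative only, not calibrated predictions.
    """
    observed = set(column)
    if new_residue in observed:
        # Some species already carries this amino acid here.
        return "low"
    if len(observed) == 1:
        # Evolution has not tolerated any change at this position.
        return "high"
    # The position varies, but this particular change has not been seen.
    return "intermediate"


print(substitution_risk("AAAA", "G"))  # invariant position, novel residue
print(substitution_risk("LLIV", "I"))  # variable position, residue seen before
print(substitution_risk("TSTQ", "A"))  # variable position, novel residue
```

A real predictor would also weigh how chemically similar the two amino acids are and how the species are related to one another; this sketch ignores both.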
The site displays data via two interfaces after a user has searched for a particular protein. The first displays evolutionary data graphically along the length of the protein. The second, called "Crystal Painter," projects these degrees of evolutionary constraint onto three-dimensional structures of proteins, when those are known, by color-coding the structures. The result reveals at a glance which parts of a protein are important to function.
"No other proteomics resource does this," Sidow said.
The NIH's National Human Genome Research Institute provided funding for this project.