Researchers representing four labs across two colleges at Notre Dame have received a four-year, $1.1 million Research Project Grant (R01) from the National Institute of General Medical Sciences at the National Institutes of Health (NIH). The oldest grant mechanism used by the NIH, the R01 provides support for health-related research and development based on the mission of the NIH.
The principal investigators on this grant include two from the College of Science: Jun Li, Ph.D., of the Department of Applied and Computational Mathematics and Statistics (ACMS) and Patricia Clark, Ph.D., of the Department of Chemistry and Biochemistry; and two from the College of Engineering: Scott Emrich, Ph.D., and Tijana Milenkovic, Ph.D., both of the Department of Computer Science and Engineering.
The awarded project, titled “Integrative Computational Framework for Pattern Mining in Big -omics Data: Linking Synonymous Codon Usage to Protein Biogenesis,” expands upon a line of inquiry started several years ago by Clark and Emrich, who sought to develop a computational approach to test the hypothesis that small changes to the rate of protein synthesis could change the folding of the encoded protein.
Efficient production of proteins is arguably the most important function of a cell. In the cell, proteins are synthesized as linear polymers but must fold into a three-dimensional shape in order to function. Sometimes the rate of folding can be faster than the rate of synthesis, so folding begins before the protein is complete. When folding goes wrong, it can lead to diseases such as Alzheimer’s disease, cystic fibrosis, diabetes, liver diseases, blood clotting or bleeding disorders, and even cancer.
Early in the development of this project, Clark and Emrich began collaborating with Li and Milenkovic: Li for his expertise in statistical modeling and mining of big data, and Milenkovic for her expertise in network science. The researchers found that clusters of slow codons tend to co-occur at similar positions in genes encoding proteins with similar structures, and that this co-occurrence is both widespread and statistically significant. Why certain genes have these co-occurring clusters has been much harder to determine, and that question forms the basis for the new R01 award.
Emrich and Milenkovic are developing new sequence and network analysis methods, respectively, made possible by Li's novel statistical framework for efficiently mining interesting patterns in large-scale genomic and proteomic sequence data and structural network data. To ensure the validity of the computational analysis and keep experimental validation cost-efficient, Li will also develop rigorous statistical tests that rule out possible false-positive discoveries and recommend the most promising patterns and hypotheses for further experimental validation. Clark’s lab is working with the other investigators on the computational analyses and is designing experiments to construct gene sequences with different patterns of synonymous codon usage and to test hypotheses generated by the computational and statistical approaches. Results from the experimental studies will feed back into the computational and statistical work, forming an iterative loop that refines hypotheses about the specific roles of rare codon clusters in efficient protein production.