June 11, 2021 – Scientists from the University of California, Irvine recently used National Science Foundation Extreme Science and Engineering Discovery Environment (XSEDE) allocations on Comet at the San Diego Supercomputer Center at UC San Diego and Bridges at the Pittsburgh Supercomputing Center to gain a better understanding of contributions from maternal and paternal lineages in genomic sequences.
“Although genome sequencing has become a fundamental goal and tool in science, the problem has been that many genomes have been difficult to fully resolve by sequencing because they contain different contributions from maternal and paternal lineages,” said Brandon Gaut, a professor of ecology and evolutionary biology at UC Irvine. “Our work used optimization methods from computer science to elucidate the accuracy of the separation of maternal and paternal DNA in genomes.”
Why it matters
This novel research, detailed in a January 2021 BMC Bioinformatics journal article, not only leads to improvements in genome completeness, but also helps scientists better understand the genetic relationships between individuals, populations and species. This, in turn, can lead to improvements in medicine and food production for different population groups.
“Comet and Bridges were powerful enough to perform our new method for separating and optimizing genome sequence haplotypes called HapSolo,” said NSF Graduate Research Fellow Edwin Solares, first author of the journal article and also funded by the UC President’s Pre-Professoriate Fellowship. “With the help of the XSEDE allocations on supercomputers, we were able to demonstrate the performance of HapSolo on genome data of three species: the Chardonnay grape, a mosquito and the thorny skate.”
How XSEDE helped
Genomes of the Chardonnay grape (Vitis vinifera; 490 Mb), the mosquito (Anopheles funestus; 200 Mb) and the thorny skate (Amblyraja radiata; 2650 Mb) were analyzed using several supercomputers. Image Credit: UC Irvine.
Solares explained that Comet and Bridges performed calculations on the genomes of the Chardonnay grape (Vitis vinifera; 490 Mb), the mosquito (Anopheles funestus; 200 Mb) and the thorny skate (Amblyraja radiata; 2650 Mb). “Using supercomputers for these analyses has cut our run times in half on several of our samples,” he said. “By using XSEDE resources, we were able to focus on science – rather than on computational problems that are certain to arise without the use of supercomputers like Comet and Bridges.”
Solares is supported by an NSF Graduate Research Fellowship Program grant (DGE-1321846), which supported his time formulating and conducting the study. Additional support came from the NSF (Grant No. 1741627), NIH (Grant Nos. R01OD010974 and R01GM115562), and XSEDE awards (ACI-1548562, ACI-1445606, and TG-MCB180035).
The San Diego Supercomputer Center (SDSC) is a leader and pioneer in high-performance and data-intensive computing, providing cyberinfrastructure resources, services, and expertise to the national research community, academia, and industry. Located on the UC San Diego campus, SDSC supports hundreds of multidisciplinary programs in a variety of fields, from astrophysics and earth sciences to disease and drug discovery. In December 2020, Expanse, the newest National Science Foundation-funded supercomputer at SDSC, went into production. With more than twice the performance of Comet, Expanse supports SDSC’s theme of “Computing without Boundaries” with a data-centric architecture, public cloud integration and state-of-the-art GPUs for incorporating experimental facilities and edge computing.
The Pittsburgh Supercomputing Center (PSC) is a joint computational research center of Carnegie Mellon University and the University of Pittsburgh. Founded in 1986 and supported by multiple federal agencies, the Commonwealth of Pennsylvania, and the private sector, PSC is a leading partner in XSEDE, the National Science Foundation’s cyberinfrastructure program. PSC provides university, government, and industrial researchers with access to several of the most powerful computing, communications, and data storage systems available to scientists and engineers nationwide for unclassified research. PSC advances the state of the art in high-performance computing, communications and data analytics and offers a flexible environment for solving the largest and most demanding problems in computational science.
Source: Kimberly Mann Bruch, SDSC