Faster, cheaper gene sequencing to make healthcare more precise

Genome sequencing could be as affordable as a routine medical test with highly efficient computing.

Arun Subramaniyan, CSE PhD student, received a U-M 2018 Precision Health Scholars Award for his project, “Hardware-accelerated systems for next-generation sequencing analysis.” Subramaniyan is working with his advisor Prof. Reetuparna Das and Profs. Satish Narayanasamy and David Blaauw to make genome sequencing as affordable as a routine medical test with highly efficient computing.

Subramaniyan’s work focuses on using hardware accelerators to enable more efficient string matching. These custom processors are designed to perform a specific function as efficiently as possible, at the expense of the flexibility that comes with standard CPUs. When U-M announced its first Precision Health Award in 2017, the researchers saw an opportunity to apply this technology to problems in bioinformatics that could benefit from powerful string matchers – chief among them the problem of processing genomes.

Representing the full sequence of DNA for an individual, a genome extends to around three billion characters in length. Only about a million differences exist in that enormous string when you compare one person’s genome to another, and it’s those differences medical researchers can use to gain valuable insights into genetic disorders and other health conditions. This process of comparing two genomes to analyze their differences is called read alignment, and represents a particularly daunting computational task.

“The way the sequencing instrument gives you this information isn’t just as one long string of your DNA,” Subramaniyan says. “It chops it into small pieces that you then have to put back together. So the computation problem is reassembling around a billion small pieces by comparing each piece against a reference genome. So it’s a string matching problem, but at a very large scale.”
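The reassembly problem Subramaniyan describes can be illustrated with a small sketch. This is a toy example and not the researchers' system: it assumes a very short reference string, indexes every substring of a fixed length k, and places each read by looking up its first k characters and counting mismatches at each candidate position. The function names `build_kmer_index` and `align_read`, the value of k, and the mismatch limit are all hypothetical choices made for illustration.

```python
from collections import defaultdict


def build_kmer_index(reference, k):
    """Map every length-k substring of the reference to its start positions."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index


def align_read(read, reference, index, k, max_mismatches=2):
    """Return (position, mismatches) for the best placement of the read, or None."""
    best = None
    seed = read[:k]  # use the first k characters of the read as a lookup key
    for pos in index.get(seed, []):
        window = reference[pos:pos + len(read)]
        if len(window) < len(read):
            continue  # the read would run past the end of the reference
        mismatches = sum(a != b for a, b in zip(read, window))
        if mismatches <= max_mismatches and (best is None or mismatches < best[1]):
            best = (pos, mismatches)
    return best


# Toy data: a 28-character "reference genome" and three short reads.
reference = "ACGTTGCATGCCGATAGCTTAGGCATCG"
index = build_kmer_index(reference, k=4)
reads = ["TGCATGCC", "GATAGCTAAG", "TTTTTTTT"]
for read in reads:
    print(read, align_read(read, reference, index, k=4))
```

A real genome has around three billion characters and a sequencing run produces around a billion reads, so even this simple lookup-and-compare loop becomes a very large amount of repeated string matching. That scale is what makes the task a candidate for specialized hardware.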

Processing strings as massive as a genome currently takes days. Getting that time down to hours, and the cost down to dollars, would require computing solutions 10-100x more efficient than those available today.

“Precision health is one domain that critically depends on our ability to efficiently process large volumes of data,” said Prof. Das in Precision Health’s press release.

Read alignment would serve as just one step in a longer pipeline of processes to extract valuable data from different genomes. The researchers’ goal is to develop an optimized system stack for genomics data analysis that uses hardware accelerators at key points where the computing load is the heaviest.

Developing this field of research could lead to major benefits in healthcare. Genomics marks the transition from population-based diagnosis and treatment of diseases to precision medicine, wherein strategies for disease prevention and drug selection are developed and customized to meet the needs of an individual.

With the Precision Health awards program, U-M Precision Health aspires to expand the field of precision health by cultivating a cohort of promising early-career researchers and to spark new collaborative research avenues by providing those investigators with tools and data to support their work.