Abstract: The field of bioinformatics and computational biology is experiencing a data revolution: experimental techniques for procuring data have increased in throughput, improved in accuracy, and fallen in cost. This has spurred an array of high-profile sequencing and data generation projects. While the resulting data repositories represent untapped reservoirs of rich information critical for scientific breakthroughs, the analytical software tools needed to analyze large volumes of such sequence data have lagged significantly behind in their capacity to scale. In this paper, we address homology detection, a fundamental problem in large-scale sequence analysis with numerous applications. We present a scalable framework for conducting large-scale optimal homology detection on massively parallel supercomputing platforms. Our approach employs distributed-memory work stealing to effectively parallelize optimal pairwise alignment computation tasks. Results on 120,000 cores of the Hopper Cray XE6 supercomputer demonstrate strong scaling and up to 2.42 × 10^7 optimal pairwise sequence alignments computed per second (PSAPS), the highest reported in the literature.
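To make the unit of work concrete, the following is a minimal sketch of one "optimal pairwise alignment" task and of the all-pairs task list that a scheduler would distribute. It assumes Smith-Waterman local alignment with a linear gap penalty; the abstract does not state which alignment algorithm or scoring scheme the paper uses, so the function names and the `match`, `mismatch`, and `gap` values here are hypothetical and purely illustrative. The sketch runs serially and does not attempt to show work stealing.

```python
from itertools import combinations


def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Optimal local alignment score of strings a and b.

    Assumed scoring (not taken from the paper): linear gap penalty,
    fixed match/mismatch values. Uses two rows of the DP table.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            # Score for aligning a[i-1] with b[j-1].
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: a cell never drops below zero.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def all_pairs_scores(seqs):
    """Score every unordered pair of sequences; each pair is one independent task."""
    return {
        (i, j): smith_waterman_score(seqs[i], seqs[j])
        for i, j in combinations(range(len(seqs)), 2)
    }
```

Because each pair is scored independently and the cost of a pair grows with the product of the two sequence lengths, task costs are uneven, which is the kind of imbalance that dynamic load balancing such as work stealing is meant to absorb.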
Revised: July 30, 2013
Published: December 26, 2012
Citation
Daily J.A., S. Krishnamoorthy, and A. Kalyanaraman. 2012. Towards Scalable Optimal Sequence Homology Detection. In 19th International Conference on High Performance Computing (HiPC), December 18-22, 2012, Pune, India. Piscataway, New Jersey: Institute of Electrical and Electronics Engineers. PNNL-SA-90521. doi:10.1109/HiPC.2012.6507523