Advanced Computing, Mathematics and Data
Research Highlights

March 2016

Parasail Navigates New Research Territory

Software library unites core bioinformatics algorithms for improved performance, efficiency

Results: High-performance computing continues to change the research landscape and is an essential part of bioinformatics, where software tools and technology must manage massive volumes of biological and genetic information. In “Parasail: SIMD C Library for Global, Semi-Global, and Local Pairwise Sequence Alignments,” author Jeff Daily, a scientist with PNNL’s Advanced Computing, Mathematics, and Data Division HPC group, introduces the Parasail software library. Parasail offers first-of-its-kind access to multiple intra-sequence local pairwise alignment algorithms and improves upon earlier implementations widely used in bioinformatics applications. The article is featured in the February 2016 issue of BMC Bioinformatics.

Jeff Daily, of ACMD Division’s HPC Group, authored the Pairwise Sequence Alignment Library, or Parasail, code now available on GitHub.

Why it Matters: In bioinformatics workflows, sequence alignment algorithms are commonly used to map characters between two DNA or protein sequences and identify highly similar regions between them. However, the three primary classes of sequence alignment algorithms (global, semi-global, and local) have varied availability and efficiency. Parasail, a single instruction, multiple data (SIMD) C library, contains implementations of global (Needleman-Wunsch), semi-global, and local (Smith-Waterman) pairwise sequence alignment algorithms that can be integrated into other software packages. Parasail represents the first time global, semi-global, and local alignment algorithms are available in one open-source high-performance software library.
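
To make the difference between the three alignment classes concrete, the following is a minimal scalar sketch in C, not Parasail's vectorized code or its API. It assumes a fixed match/mismatch score and a linear gap penalty, whereas production libraries typically use substitution matrices and affine gaps. The names align_mode and align_score are hypothetical and exist only for this illustration. The three classes share one recurrence and differ only in how the table borders are initialized and where the final score is read.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical names for this sketch; not part of the Parasail API. */
typedef enum { ALIGN_GLOBAL, ALIGN_SEMI_GLOBAL, ALIGN_LOCAL } align_mode;

static int max2(int x, int y) { return x > y ? x : y; }

/*
 * Optimal alignment score of a against b, using a fixed match/mismatch
 * score and a linear gap penalty (gap is expected to be negative).
 * Only two rows of the dynamic-programming table are kept in memory.
 * Returns INT_MIN if memory allocation fails.
 */
int align_score(const char *a, const char *b, align_mode mode,
                int match, int mismatch, int gap)
{
    assert(a != NULL && b != NULL);
    size_t n = strlen(a), m = strlen(b);
    int *prev = malloc((m + 1) * sizeof *prev);
    int *curr = malloc((m + 1) * sizeof *curr);
    /* 0 is a valid lower bound for local and semi-global scores:
       the empty alignment (or a pure overhang) always scores 0. */
    int best = 0;

    if (prev == NULL || curr == NULL) {
        free(prev);
        free(curr);
        return INT_MIN;
    }

    /* Row 0: global alignment pays for leading gaps; the others do not. */
    for (size_t j = 0; j <= m; j++)
        prev[j] = (mode == ALIGN_GLOBAL) ? (int)j * gap : 0;

    for (size_t i = 1; i <= n; i++) {
        /* Column 0 follows the same rule as row 0. */
        curr[0] = (mode == ALIGN_GLOBAL) ? (int)i * gap : 0;
        for (size_t j = 1; j <= m; j++) {
            int sub = (a[i - 1] == b[j - 1]) ? match : mismatch;
            int h = max2(prev[j - 1] + sub,
                         max2(prev[j] + gap, curr[j - 1] + gap));
            if (mode == ALIGN_LOCAL) {
                h = max2(h, 0);        /* local: restart instead of going negative */
                best = max2(best, h);  /* local: best cell anywhere in the table */
            }
            curr[j] = h;
        }
        /* Semi-global: trailing gaps are free, so track the last column. */
        if (mode == ALIGN_SEMI_GLOBAL)
            best = max2(best, curr[m]);
        int *tmp = prev;
        prev = curr;
        curr = tmp;
    }

    /* After the swap, prev holds the last row; semi-global also scans it. */
    if (mode == ALIGN_SEMI_GLOBAL)
        for (size_t j = 0; j <= m; j++)
            best = max2(best, prev[j]);

    int result = (mode == ALIGN_GLOBAL) ? prev[m] : best;
    free(prev);
    free(curr);
    return result;
}
```

For example, with match = 2, mismatch = -1, and gap = -2, aligning "ACGT" against "TTACGTTT" scores 0 globally, because the four unavoidable gaps cancel the four matches, but 8 in both semi-global and local mode, where the overhanging ends of the longer sequence are not penalized.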

Methods: Parasail implements most known algorithms for vectorized pairwise sequence alignment and is designed for 64-bit Linux, OS X, or Windows on processors with SSE2, SSE4.1, AVX2, or KNC (Xeon Phi) instruction sets. It uses CPU dispatching to select the correct implementation at runtime and avoid running code with instructions that the host CPU does not support. In addition, Parasail can easily be extended in the future to support upcoming instruction sets, such as AVX-512.
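
The general idea of CPU dispatching can be sketched as follows. This is an illustration of the technique under stated assumptions, not Parasail's actual dispatch code. It relies on the GCC/Clang builtins __builtin_cpu_init and __builtin_cpu_supports, which are available on x86 targets, and falls back to a scalar routine everywhere else. The names match_count, select_kernel, and the per-instruction-set functions are hypothetical, and the "SIMD" variants are plain stand-ins that call the scalar code so the example stays portable.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical kernel signature: count positions where a[i] == b[i]. */
typedef size_t (*match_count_fn)(const char *a, const char *b, size_t n);

typedef struct {
    const char *name;   /* instruction set the kernel was built for */
    match_count_fn fn;  /* implementation to call */
} kernel;

/* Portable fallback that runs on any CPU. */
static size_t match_count_scalar(const char *a, const char *b, size_t n)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++)
        count += (a[i] == b[i]);
    return count;
}

#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
#define HAVE_X86_CPU_DETECT 1

/* Stand-ins: a real library would compile each of these with the matching
   instruction set enabled and use intrinsics inside. */
static size_t match_count_sse2(const char *a, const char *b, size_t n)
{
    return match_count_scalar(a, b, n);
}

static size_t match_count_sse41(const char *a, const char *b, size_t n)
{
    return match_count_scalar(a, b, n);
}

static size_t match_count_avx2(const char *a, const char *b, size_t n)
{
    return match_count_scalar(a, b, n);
}
#endif

/* Query the host CPU once and pick the widest supported implementation. */
kernel select_kernel(void)
{
#ifdef HAVE_X86_CPU_DETECT
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx2")) {
        kernel k = { "avx2", match_count_avx2 };
        return k;
    }
    if (__builtin_cpu_supports("sse4.1")) {
        kernel k = { "sse4.1", match_count_sse41 };
        return k;
    }
    if (__builtin_cpu_supports("sse2")) {
        kernel k = { "sse2", match_count_sse2 };
        return k;
    }
#endif
    kernel k = { "scalar", match_count_scalar };
    return k;
}

/* Public entry point: resolves the kernel on first use and caches it.
   The lazy initialization here is not thread-safe; it is kept simple
   for the sake of the sketch. */
size_t match_count(const char *a, const char *b, size_t n)
{
    static match_count_fn cached = NULL;
    assert(a != NULL && b != NULL);
    if (cached == NULL)
        cached = select_kernel().fn;
    return cached(a, b, n);
}
```

Callers only ever use match_count, so supporting a new instruction set in this design means adding one more kernel and one more feature check, without changing any calling code.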

Benchmarking compared Parasail’s local alignment implementation, including a new striped method available in the library, against the BLAST, BLAST+, SWIPE, libssa, opal, and SSW software packages. Parasail’s local alignment performance was also compared against its global and semi-global performance for the striped and scan vectorized approaches.

“While another database search application proved faster in the evaluation than Parasail for sequences shorter than 500 amino acids, Parasail demonstrated it was fastest when applied to longer sequences,” Daily explained. “Notably, Parasail’s implementation for global sequence alignments generally was the fastest overall. Only opal performed better for single-threaded applications. Parasail also is the first to provide implementations of the currently best-performing sequence alignment algorithms on today’s most advanced CPUs.”

What’s Next? Launched in September 2015, the Parasail code is available for download from GitHub (under the Battelle BSD-style license). Future versions will include additional capabilities relevant to bioinformatics research, such as returning full-alignment tracebacks. Because the software is open source, requests, enhancements, and bug fixes will be shared among the growing Parasail community.

“With its multi-platform compatibility, modular implementation, and adaptive code generation process, Parasail has been designed to grow as more instruction sets and wider vector units become available. The goal is to provide a long-term, viable resource for improving applications that benefit the bioinformatics research community,” Daily added.

Acknowledgments: This work was supported in part by the U.S. Department of Energy, Office of Science, Advanced Scientific Computing Research, as well as by PNNL Institutional Overhead funds. Performance optimization for high-performance computing architectures was supported by the U.S. Department of Defense under the Autotuning for Power, Energy & Resilience (ATPER) project. PNNL Institutional Computing also provided resources benefitting this research.

Reference:
Daily J. 2016. “Parasail: SIMD C library for global, semi-global, and local pairwise sequence alignments.” BMC Bioinformatics 17(1):1-11. DOI: 10.1186/s12859-016-0930-z.

Code Available at: https://github.com/jeffdaily/parasail.

