AbstractMolecular similarity search has been widely used in drug discovery to rapidly identify structurally similar compounds from large molecular databases. With the increasing size of chemical libraries, there is growing interest in the efficient ac- celeration of large-scale similarity search. Existing works mainly focus on CPU and GPU to accelerate the computation of Tatimoto coefficient in measuring the pairwise similarity between different molecular fingerprints. In this paper, we propose and optimize an FPGA-based accelerator design on exhaustive and approximate search algorithms. On exhaustive search using BitBound & fold- ing, we analyze the similarity cutoff and folding level relationship with search speedup and accuracy, and propose a scalable on- the-fly query engine on FPGAs to reduce the resource utilization and pipeline interval. We achieve a 450 million compounds-per- second processing throughput for a single query engine. On approximate search using hierarchical navigable small world (HNSW), a popular algorithm with high recall and query speed, we propose an FPGA-based graph traversal engine to utilize high throughput register array based priority queue and fine- grained distance calculation engine to increase the processing capability. Experimental results show that the proposed FPGA- based HNSW implementation achieves a 35× speedup than existing works on CPU. To the best of our knowledge, our FPGA- based implementation is the first attempt to accelerate molecular similarity search on FPGA and has the highest performance among existing approaches.
Published: March 18, 2022