String matching is at the core of many real-world applications, such as security, bioinformatics, and data mining. All these applications require the ability to match ever-growing data sets against large dictionaries effectively, quickly, and possibly in real time. Unfortunately, string matching is a computationally intensive procedure that poses significant challenges for current software and hardware implementations. Graphics Processing Units (GPUs) have become an interesting target for such high-throughput applications, but the algorithms and the data structures need to be redesigned to be parallelized and adapted to the underlying hardware, coping with the limitations imposed by these architectures. In this paper we present an efficient implementation of the Aho-Corasick string matching algorithm on GPUs, showing how we progressively redesigned the algorithm and the data structures to fit the architecture. We then evaluate the implementation on single and multiple Tesla C2050 (T20 "Fermi" based) boards, comparing them to the previous Tesla C1060 (T10 based) solutions and to equivalent multicore implementations on x86 CPUs. We discuss the various tradeoffs of the different architectures.
Revised: June 29, 2011 | Published: February 25, 2011
Citation
Tumeo A., S. Secchi, and O. Villa. 2011. Experiences with string matching on the Fermi Architecture. In Architecture of Computing Systems - ARCS 2011: 24th International Conference, February 24-25, 2011, Como, Italy. Lecture Notes in Computer Science, edited by M Berekovic, et al, 6566, 26-37. Berlin: Springer-Verlag. PNNL-SA-75647. doi:10.1007/978-3-642-19137-4_3