A recent trend in hardware development is producing computing systems that are stretching the number of cores and size of shared-memory beyond where most fundamental serial algorithms perform well. The expectation is that this trend will continue. So it makes sense to rethink our fundamental algorithms such as sorting. There are many situations where data that needs to be sorted will actually fit into the shared memory so applications could benefit from an efficient parallel sorting algorithm. When sorting large data (at least hundreds of Gigabytes) in a single shared memory, there are two factors that affect the algorithm choice. First, does the algorithm sort in-place? And second, does the algorithm scale well beyond tens of threads? Surprisingly, existing algorithms posses either one of these factors, but not both. We present an approach that gracefully degrades in performance as the amount of available working memory decreases relative to the size of the input.
Revised: December 13, 2013 |
Published: May 20, 2013
Citation
Haglin D.J., R.D. Adolf, and G.E. Mackey. 2013.Scalable, Multithreaded, Partially-in-Place Sorting. In IEEE 27th International Parallel and Distributed Processing Symposium Workshops & PhD Forum (IPDPSW 2013), May 20-24, 2013, Cambridge, MA, 1656-1664. Piscataway, New Jersey:Institute of Electrical and Electronics Engineers.PNNL-SA-93139.doi:10.1109/IPDPSW.2013.74