The Cray XMT provides hardware support for parallel algorithms that would be communication- or memory-bound on other machines. Unfortunately, even if an algorithm meets these criteria, performance suffers if the algorithm is too numerically intensive. We present a lookup-based approach that achieves a significant performance advantage over explicit calculation. We also describe an approach to balancing memory bandwidth against on-chip floating-point capabilities, leading to further speedup. Finally, we provide table lookup algorithms for a number of common functions.
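As a rough illustration of the general idea of replacing explicit calculation with table lookup, the following minimal sketch precomputes a function on an evenly spaced grid and approximates it by linear interpolation between neighboring entries. It is not the paper's implementation: the names `build_table`, `lookup`, and `approx_exp`, the choice of `exp` on [0, 1], the table size of 1024, and the use of linear interpolation are all assumptions made for this example.

```python
import math

def build_table(f, lo, hi, n):
    """Precompute f at n + 1 evenly spaced points on [lo, hi]."""
    step = (hi - lo) / n
    return [f(lo + i * step) for i in range(n + 1)]

def lookup(table, lo, hi, x):
    """Approximate f(x) for lo <= x <= hi by linear interpolation
    between the two table entries that bracket x."""
    n = len(table) - 1
    t = (x - lo) / (hi - lo) * n   # position of x in table-index units
    i = min(int(t), n - 1)         # clamp so i + 1 stays in range when x == hi
    frac = t - i                   # fractional distance past entry i
    return table[i] + frac * (table[i + 1] - table[i])

# Hypothetical example: a 1024-interval table for exp on [0, 1].
EXP_TABLE = build_table(math.exp, 0.0, 1.0, 1024)

def approx_exp(x):
    """Table-based approximation of exp(x), assuming 0 <= x <= 1."""
    return lookup(EXP_TABLE, 0.0, 1.0, x)
```

In this sketch, a larger table costs more memory traffic but needs less arithmetic per lookup for a given accuracy, while a smaller table combined with higher-order interpolation shifts work back onto the floating-point units. That is one simple way to picture the trade-off between memory bandwidth and on-chip computation that the abstract mentions.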
Revised: March 19, 2010
Published: May 25, 2009
Citation
Scherrer C., T.R. Shippert, and A. Marquez. 2009. Accelerating Numerical Calculation on the Cray XMT. In Proceedings of the IEEE International Symposium on Parallel & Distributed Processing (IPDPS 2009), May 23-29, 2009, Rome, Italy. Los Alamitos, California: IEEE Computer Society. PNNL-SA-64423.