October 24, 2012
Conference Paper

Efficient Sorting on the Tilera Manycore Architecture

Abstract

We present an efficient implementation of the radix sort algorithm for the Tilera TILEPro64 processor. The TILEPro64 is one of the first successful commercial manycore processors. It is composed of 64 tiles interconnected through multiple fast Networks-on-chip and features a fully coherent, shared distributed cache. The architecture is highly flexible and allows various optimization strategies. We describe how we mapped the algorithm to this architecture. We present an in-depth analysis of the optimizations for each phase of the algorithm with respect to the processor's sustained performance. We discuss the overall throughput reached by our radix sort implementation (up to 132 million keys per second, MK/s) and show that it provides comparable or better performance-per-watt than state-of-the-art implementations on x86 processors and graphics processing units.
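For readers unfamiliar with the algorithm, the sketch below shows the usual phases of a least-significant-digit radix sort: a histogram of the current digit, an exclusive prefix sum to find bucket offsets, and a stable scatter. It is a minimal sequential sketch under stated assumptions, not the authors' TILEPro64 implementation. The function name `radix_sort_u32`, the 32-bit unsigned keys, and the 8-bit digit width are illustrative choices.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative parameters (assumptions): 8-bit digits, 32-bit keys. */
#define RADIX_BITS 8
#define RADIX_BUCKETS (1u << RADIX_BITS)
#define RADIX_PASSES (32 / RADIX_BITS)

/* Hypothetical helper: sorts n unsigned 32-bit keys in ascending order.
 * tmp must point to a scratch buffer of at least n elements.
 * Because RADIX_PASSES is even, the sorted data ends up back in keys. */
void radix_sort_u32(uint32_t *keys, uint32_t *tmp, size_t n)
{
    uint32_t *src = keys;
    uint32_t *dst = tmp;

    for (unsigned pass = 0; pass < RADIX_PASSES; pass++) {
        unsigned shift = pass * RADIX_BITS;
        size_t count[RADIX_BUCKETS] = {0};

        /* Phase 1: histogram of the current digit. */
        for (size_t i = 0; i < n; i++)
            count[(src[i] >> shift) & (RADIX_BUCKETS - 1)]++;

        /* Phase 2: exclusive prefix sum gives each bucket's start offset. */
        size_t offset = 0;
        for (unsigned b = 0; b < RADIX_BUCKETS; b++) {
            size_t c = count[b];
            count[b] = offset;
            offset += c;
        }

        /* Phase 3: stable scatter of each key into its bucket. */
        for (size_t i = 0; i < n; i++) {
            uint32_t d = (src[i] >> shift) & (RADIX_BUCKETS - 1);
            dst[count[d]++] = src[i];
        }

        /* Swap buffer roles for the next pass. */
        uint32_t *t = src;
        src = dst;
        dst = t;
    }
}
```

In a parallel setting, each of these phases is a natural candidate for separate optimization (for example, per-core local histograms combined by a global prefix sum), which is the kind of per-phase analysis the abstract refers to.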

Revised: March 21, 2014 | Published: October 24, 2012

Citation

Morari A., A. Tumeo, O. Villa, S. Secchi, and M. Valero. 2012. Efficient Sorting on the Tilera Manycore Architecture. In IEEE 24th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD 2012), October 24-26, 2012, New York, 171-178. Piscataway, New Jersey: Institute of Electrical and Electronics Engineers. PNNL-SA-90686. doi:10.1109/SBAC-PAD.2012.41