June 1, 2006
Journal Article

Cell Multiprocessor Communication Network: Built for Speed

Abstract

The existence of major obstacles to the traditional path to processor performance improvement has led chip manufacturers to consider multi-core designs. These architectural solutions promise a variety of power/performance and area/performance benefits. But additional care must be taken to ensure that these benefits are not lost due to inadequate design of the on-chip communication network. This paper presents the design challenges of the on-chip network of the Cell Broadband Engine (Cell BE) processor, and describes in detail its architectural design and its network, communication, and synchronization protocols. In the experimental evaluation, performed on an early prototype, we analyze the communication characteristics of the Cell BE processor, using a series of microbenchmarks involving various DMA traffic patterns and synchronization protocols. We find that the on-chip communication subsystem is well matched to the computational capacity of the processor. A Synergistic Processing Element (SPE) can issue an internal direct memory access (DMA) operation in less than 4 nanoseconds, and a DMA of a single cache line can be executed in less than 100 nanoseconds. SPEs can achieve the optimal bandwidth of 25.6 GB/second in point-to-point communication with surprisingly small messages (only a few KB), using batches of non-blocking DMAs. The aggregate network behavior under heavy load is also remarkably efficient, reaching almost 200 GB/second with collective patterns and optimal contention resolution under hot-spot traffic. Additionally, we demonstrate the consistency of these hardware results with identical experiments carried out using the Mambo simulator software for Cell BE.

Revised: July 27, 2006 | Published: June 1, 2006

Citation

Kistler M., M. Perrone, and F. Petrini. 2006. Cell Multiprocessor Communication Network: Built for Speed. IEEE Micro 26, no. 3:10-23. PNNL-SA-48120.