Power Aware Computing
The [computer] industry is going through the most profound shift in decades, moving to an era where performance and energy efficiency are critical in all market segments and all aspects of computing. The solution begins with the transistor and extends to the chip and platform levels.
Intel President and CEO Paul Otellini at the Intel Developer Forum, San Francisco, 9/26/2006
The continued exponential growth in the performance of Leadership Class computers (supercomputers) has been predicated on the ability to perform more computing in less space. Two key components have been 1) the reduction of component size, leading to more powerful chips, and 2) the ability to increase the number of processors, leading to more powerful systems.
There is strong pressure to keep the physical size of the system compact so that communication latency remains manageable. The upshot has been an increase in power density. Efficiency plays a key role in both aspects of managing this increased power density. One approach is to make the system more efficient in terms of the amount of computation performed for the energy expended; one possible metric is operations/Watt. The other approach is to remove the waste heat (computation converts electrical energy into heat) as quickly and efficiently as possible; one possible metric is the overall facility-level coefficient of performance (COP), essentially the ratio of the energy used in computation to the energy expended to remove the resulting waste heat.
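The two metrics above can be sketched as simple ratios. The following is an illustrative sketch only; the function names and the sample figures (50 GFLOPS at 250 W, 2500 kWh of compute against 1000 kWh of cooling) are hypothetical, not measurements from any actual system:

```python
def ops_per_watt(operations_per_sec: float, power_watts: float) -> float:
    """Computational efficiency: useful operations delivered per watt drawn."""
    return operations_per_sec / power_watts

def facility_cop(it_energy_kwh: float, cooling_energy_kwh: float) -> float:
    """Facility-level coefficient of performance: energy used for computation
    divided by the energy expended to remove the resulting waste heat."""
    return it_energy_kwh / cooling_energy_kwh

# Hypothetical example: a node sustaining 50 GFLOPS while drawing 250 W,
# and a facility spending 1000 kWh of cooling energy per 2500 kWh of compute.
print(ops_per_watt(50e9, 250.0))     # 2e8 FLOPS per watt
print(facility_cop(2500.0, 1000.0))  # COP = 2.5
```

Higher is better for both ratios: the first rewards doing more computation per joule, the second rewards removing each joule of waste heat more cheaply.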
Processor manufacturers are no longer able to increase clock frequency as a method of improving CPU performance. This is largely because more power is now consumed by passive losses, such as gate leakage and source-drain leakage, than by active (computational) switching. The current roadmap trend is toward 1) slowing, or even reversing, increases in switching frequency; 2) replicating computational units (multicore); and 3) increasing transistor utilization at all architectural levels. The possibility of exploiting the energy efficiency of multicore designs needs to be investigated.
Microprocessor manufacturers are making their chips increasingly sophisticated in the way they manage power, primarily to extend the battery life of laptops, but techniques designed for laptops generally fail miserably in high-performance computing environments. Efficient cooling, however, allows a processor to stay in a high-productivity state longer.
The Energy Smart Data Center (ESDC) project will expand on the software approach to exploiting power-management hardware: controlling the number of spurious processes the operating system executes (jitter), and coupling that with the ability to dynamically shift processor loads using goal-seeking algorithms, reducing hot spots and evening the cooling load across the entire data center.
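One family of goal-seeking heuristics of the kind described above can be illustrated with a toy model. The sketch below is an assumption-laden illustration, not the ESDC algorithm: node heat is modeled as simply proportional to load, and a greedy loop repeatedly shifts work from the hottest node to the coolest until the spread falls within a tolerance:

```python
def even_out_heat(loads, step=1.0, tolerance=2.0, max_iters=1000):
    """Toy goal-seeking balancer (hypothetical model): greedily shift up to
    `step` units of load from the hottest node to the coolest node until the
    max-min spread in heat load is within `tolerance`."""
    loads = list(loads)
    for _ in range(max_iters):
        hot = max(range(len(loads)), key=loads.__getitem__)
        cold = min(range(len(loads)), key=loads.__getitem__)
        if loads[hot] - loads[cold] <= tolerance:
            break  # heat load is sufficiently even; goal reached
        # Never move more than half the gap, so the move cannot overshoot.
        moved = min(step, (loads[hot] - loads[cold]) / 2)
        loads[hot] -= moved
        loads[cold] += moved
    return loads

# Four nodes with very uneven loads converge toward an even heat profile.
balanced = even_out_heat([90.0, 40.0, 60.0, 10.0])
print(balanced)
```

The total load is conserved by each move; only its distribution changes, which is the essence of presenting an even thermal profile to the cooling system. A real implementation would replace the linear load-to-heat model with measured temperatures and account for migration cost.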
The expected outcome is a unique ability not only to keep individual processors in a state of optimal efficiency, but also to balance the heat load from multiple nodes so that an even load is presented to the cooling system across the data center.