Hot Topic – the Problem of Cooling Supercomputers
The continued exponential growth in the performance of Leadership Class computers (supercomputers) has been predicated on the ability to perform more computing in less space. Two key components have been 1) the reduction of component size, leading to more powerful chips, and 2) the ability to increase the number of processors, leading to more powerful systems. There is strong pressure to keep the physical size of the system compact to keep communication latency manageable. The upshot has been an increase in power density. The ability to remove the waste heat (computation converts electrical energy into heat) as quickly and efficiently as possible is becoming a limiting factor in the capability of future machines.
Convection cooling with air is currently the preferred method of heat removal in most data centers. Air handlers force large volumes of cooled air under a raised floor (the deeper the floor, the lower the resistance to airflow) and up through perforated tiles in front of (or under) the computer racks. Fans within the racks' servers or blade cages then distribute the air across the heat-radiating electronics, perhaps with the help of heat sinks or heat pipes. This system easily accommodates racks drawing 4-7 kW. For comparison, in 2001 the average U.S. household drew 1.2 kW, so this is like cooling half a dozen homes crammed into about 8 square feet. A BlueGene/L rack uses 9 kW. The Energy Smart Data Center’s (ESDC’s) NW-ICE compute rack uses 12 kW. Petascale system racks may require 60 kW to satisfy the communication latency demands that limit a system's physical size. Additional ducting can be used to keep warm and cold air from mixing in the data center, but air cooling alone is reaching its limits.
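The household comparison above can be reproduced with a few lines of arithmetic. The sketch below uses only the power figures quoted in the text; applying the 8 square foot footprint to every rack type is an assumption made here for illustration, since the text gives that area only for the 4-7 kW case.

```python
# Figures quoted in the text above.
HOUSEHOLD_KW = 1.2  # average U.S. household draw in 2001
RACK_KW = {
    "air-cooled rack (upper end)": 7.0,
    "BlueGene/L": 9.0,
    "NW-ICE": 12.0,
    "petascale (projected)": 60.0,
}
# Assumption: the same ~8 sq ft footprint is used for every rack type.
FOOTPRINT_SQFT = 8.0


def household_equivalents(rack_kw: float) -> float:
    """Number of average 2001 U.S. households drawing the same power."""
    return rack_kw / HOUSEHOLD_KW


def power_density(rack_kw: float, footprint_sqft: float = FOOTPRINT_SQFT) -> float:
    """Power per unit floor area, in kW per square foot."""
    return rack_kw / footprint_sqft


for name, kw in RACK_KW.items():
    print(f"{name}: {household_equivalents(kw):.1f} households, "
          f"{power_density(kw):.2f} kW/sq ft")
```

Under these assumptions a 7 kW rack corresponds to just under six households, which matches the "half a dozen homes" comparison, and a 60 kW rack corresponds to fifty.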
Chilled water was used by previous generations of bipolar transistor-based mainframes, and in the 1980s the Cray-2 immersed the entire system in Fluorinert™. Water has a much higher heat capacity than air, and even than Fluorinert, but it is also an electrical conductor, so it cannot come into direct contact with the electronics. That makes transferring the heat to the water a challenge beyond the basic plumbing and leakage issues. Blowing hot air through a water-cooled heat exchanger mounted on or near the rack is one common way to improve rack cooling, but it is limited by the low heat capacity of air and requires energy to move enough air.
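To give a rough sense of what the heat capacity difference means in practice, the sketch below estimates the coolant flow needed to carry away a rack's heat using Q = m_dot * cp * delta_T. The 12 kW load is the NW-ICE figure from the text; the 10 K coolant temperature rise is a hypothetical choice, and the fluid properties are approximate room-temperature textbook values rather than numbers from the source.

```python
def volumetric_flow_m3_per_s(heat_w, density_kg_m3, cp_j_per_kg_k, delta_t_k):
    """Coolant flow needed to carry heat_w with a temperature rise of delta_t_k.

    Derived from Q = m_dot * cp * delta_T, with m_dot = density * volumetric flow.
    """
    return heat_w / (density_kg_m3 * cp_j_per_kg_k * delta_t_k)


# Approximate properties near room temperature (assumed textbook values).
AIR = {"density": 1.2, "cp": 1005.0}      # kg/m^3, J/(kg*K)
WATER = {"density": 998.0, "cp": 4182.0}  # kg/m^3, J/(kg*K)

heat_w = 12_000.0  # NW-ICE rack power from the text
delta_t_k = 10.0   # hypothetical allowable coolant temperature rise

air_flow = volumetric_flow_m3_per_s(heat_w, AIR["density"], AIR["cp"], delta_t_k)
water_flow = volumetric_flow_m3_per_s(heat_w, WATER["density"], WATER["cp"], delta_t_k)

print(f"air:   {air_flow:.2f} m^3/s")
print(f"water: {water_flow * 1000:.2f} L/s")
print(f"air needs about {air_flow / water_flow:.0f}x the volume flow of water")
```

With these assumed values, air needs on the order of a cubic meter per second while water needs a fraction of a liter per second, which is why moving enough air becomes the limiting cost.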
More efficient and effective cooling is only one part of developing a truly energy smart data center. Not generating heat in the first place is another component, which includes moving some heat sources (such as power supplies) away from the compute components or using more efficient power conversion mechanisms: the power taken off the grid is high-voltage alternating current (AC), while the components use low-voltage direct current (DC). Power-aware components that can reduce their power requirements or turn off entirely when not needed are another element.
Cooling ESDC's NW-ICE
Fluorinert not only has a high dielectric strength (in excess of 35,000 volts across a 0.1 inch gap) but also has other desirable properties. 3M™ Fluorinert Liquids are actually a family of clear, colorless, odorless perfluorinated fluids (think liquid Teflon) having a viscosity similar to water. These non-flammable liquids are thermally and chemically stable and compatible with most sensitive materials, including metals, plastics, and elastomers. Fluorinert liquids are completely fluorinated, containing no chlorine or hydrogen atoms. The strength of the carbon-fluorine bond contributes to their extreme stability and inertness. Fluorinert liquids are available with boiling points ranging from 30°C to 215°C.
NW-ICE is being cooled with a combination of air and two-phase liquid (Fluorinert) cooling, in this case SprayCool™. Closed SprayCool modules 1) replace the normal heat sinks on each of the processor chips, 2) cool them with a fine mist of Fluorinert that evaporates as it hits the hot thermal conduction layer on top of the chip package, and 3) return the heated Fluorinert to the heat exchanger in the bottom of the rack. The heat exchanger, also called a thermal server, transfers the heat to facility chilled water. The rest of the electronics in the rack, including memory, is now easily cooled with air. The high heat transfer rate of the two-phase cooling allows the use of much warmer water than conventional air-water heat exchangers, avoiding plumbing condensation problems and allowing direct connection to efficient external cooling towers.
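The condensation point can be illustrated with a dew point check: pipes sweat when the water inside them is colder than the dew point of the room air. The sketch below uses the Magnus approximation with one commonly used coefficient set. The room conditions and the two water supply temperatures are hypothetical values chosen for illustration, not figures from NW-ICE.

```python
import math


def dew_point_c(air_temp_c: float, relative_humidity_pct: float) -> float:
    """Approximate dew point in Celsius using the Magnus formula."""
    a, b = 17.62, 243.12  # one commonly used Magnus coefficient set (b in Celsius)
    gamma = math.log(relative_humidity_pct / 100.0) + a * air_temp_c / (b + air_temp_c)
    return b * gamma / (a - gamma)


def pipes_will_sweat(water_supply_c: float, air_temp_c: float,
                     relative_humidity_pct: float) -> bool:
    """True if the supply water is colder than the room air's dew point."""
    return water_supply_c < dew_point_c(air_temp_c, relative_humidity_pct)


# Hypothetical machine room conditions.
room_temp_c, room_rh_pct = 22.0, 50.0
print(f"dew point: {dew_point_c(room_temp_c, room_rh_pct):.1f} C")

# Hypothetical supply temperatures: cold chilled water vs. warmer water.
for supply_c in (7.0, 20.0):
    print(f"{supply_c:.0f} C supply sweats: "
          f"{pipes_will_sweat(supply_c, room_temp_c, room_rh_pct)}")
```

In this hypothetical room the dew point is roughly 11°C, so the colder supply would condense moisture on uninsulated plumbing while the warmer supply would not.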
Two-phase liquid cooling is thermodynamically more efficient than convection cooling with air, resulting in less energy being needed to remove waste heat while at the same time being able to handle a higher heat load. ESDC will use NW-ICE to measure these differences while evaluating the reliability and total cost of ownership of this approach.
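One way to see part of the two-phase advantage is to compare how much coolant must be delivered to absorb a given heat load by warming the liquid (sensible heat) versus by evaporating it (latent heat). The sketch below is illustrative only. The fluid properties are approximate datasheet-style values for one low-boiling member of the Fluorinert family, and the source does not say which fluid NW-ICE uses; the 100 W chip load and 10 K temperature rise are likewise hypothetical.

```python
def mass_flow_sensible(heat_w: float, cp_j_per_kg_k: float, delta_t_k: float) -> float:
    """Mass flow (kg/s) needed if the liquid only warms up by delta_t_k."""
    return heat_w / (cp_j_per_kg_k * delta_t_k)


def mass_flow_latent(heat_w: float, latent_heat_j_per_kg: float) -> float:
    """Mass flow (kg/s) needed if all delivered liquid evaporates."""
    return heat_w / latent_heat_j_per_kg


# Assumed, approximate fluid properties (not taken from the source).
CP_J_PER_KG_K = 1100.0         # specific heat of the liquid
LATENT_HEAT_J_PER_KG = 88_000  # latent heat of vaporization

chip_heat_w = 100.0  # hypothetical processor heat load
delta_t_k = 10.0     # hypothetical liquid temperature rise

sensible = mass_flow_sensible(chip_heat_w, CP_J_PER_KG_K, delta_t_k)
latent = mass_flow_latent(chip_heat_w, LATENT_HEAT_J_PER_KG)

print(f"single-phase: {sensible * 1000:.2f} g/s")
print(f"two-phase:    {latent * 1000:.2f} g/s")
print(f"ratio:        {sensible / latent:.1f}x")
```

This compares only the coolant mass flow under the stated assumptions. It does not capture the higher heat transfer coefficient of evaporation at the chip surface, which is the other part of the benefit described above.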
Alternative Cooling Approaches
SprayCooling is, of course, just one approach to solving data center cooling problems. A plethora of cooling technologies and products exist. Technologies of interest use air, liquid, and/or solid-state cooling principles:
- Evolutionary progress continues to be made with conventional air cooling techniques, which are known for their reliability. Current investigation focuses on novel heat sinks and fan technologies with the aim of improving contact surface, conductivity, and heat transfer parameters. Efficiency and noise generation are also of great concern with air cooling [1]. Improvements have been made in the design of piezoelectric infrasonic fans, which exhibit low power consumption and have a lightweight and inexpensive construction [2]. One of the most effective air cooling options is air jet impingement [3]. The design and manufacturing of nozzles and manifolds for jet impingement is relatively simple.
- Liquid impingement technologies share the benefits of air jet impingement. In addition, liquid cooling offers higher heat transfer coefficients, at the cost of higher design and operation complexity [4, 5].
- One of the most interesting liquid cooling technologies is the microchannel heat sink used in conjunction with micropumps, because the channels can be manufactured in the micrometer range with the same process technologies used for electronic devices. Microchannel heat sinks are effective at supporting large heat fluxes [6, 7].
- Liquid metal cooling, used in cooling reactors, is emerging as an interesting alternative for high-power-density micro devices [8]. Large heat transfer coefficients are achieved by circulating the liquid with hydroelectric or hydromagnetic pumps. The pumping circuit is reliable because no moving parts, except for the liquid itself, are involved in the cooling process. Heat transfer efficiency is also increased by the high thermal conductivity of the liquid metal. The low heat capacity of metals leads to less stringent requirements for heat exchangers.
- Heat extraction with liquids can be increased by several orders of magnitude by exploiting phase changes. Heat pipes and thermosyphons exploit the high latent heat of vaporization to remove large quantities of heat from the evaporator section. The circuit is closed by capillary action in the case of heat pipes, or by gravity in the case of thermosyphons. These devices are therefore very efficient but are limited in their temperature range and heat flux capabilities [8].
- Thermoelectric coolers (TECs) that use the Peltier-Seebeck effect are not the most efficient option, but they can provide localized spot cooling, an important capability in modern processor design. Research in this area focuses on improving materials and on distributing control of TEC arrays so that efficiency over the whole chip improves [9].
1. Seung, MY, and S Lee. 2004. “High-Power Microelectronics: Thermal management and packaging fundamentals.” Proceedings of Itherm.
2. Garimella, S V. 2005. “Advanced thermal management technologies for next generation microelectronics systems.” Proceedings of Itherm.
3. Rundström, D, and B Moshfegh. 2004. “Investigation of flow and heat transfer of an impinging jet in a cross-flow for cooling of a heated cube.” ITHERM '04 The Ninth Intersociety Conference on Thermal and Thermomechanical Phenomena in Electronic Systems, IEEE Xplore, Vol. 1, p 455-462, 1-4 June 2004.
4. Kang, S. 2004. “Liquid Cooling of COTS Computer Systems.” 3rd Workshop on Thermal Management of High Flux Military and Commercial Electronics.
5. Ohadi, M. 2004. “Liquid Cooling - Parameters affecting the Limits.” 3rd Workshop on Thermal Management of High Flux Military and Commercial Electronics.
6. Garimella, S V. 2004. “Microchannel heat sinks and micropumps.” Proceedings of Itherm.
7. Patterson, MK, X Wei, Y Joshi, and R Prasher. 2004. “Numerical study of conjugate heat transfer in stacked microchannels.” ITHERM '04 The Ninth Intersociety Conference on Thermal and Thermomechanical Phenomena in Electronic Systems, IEEE Xplore, Vol. 1, p 372-380, 1-4 June 2004.
8. Miner, A, and U Ghoshal. 2004. “Cooling of high-power-density microdevices using liquid metal coolants.” Applied Physics Letters, 85(3), 19 July 2004.
9. Walker, DG, KD Frampton, and RD Harvey. 2004. “Distributed Control of Thermoelectric Coolers.” ITHERM '04 The Ninth Intersociety Conference on Thermal and Thermomechanical Phenomena in Electronic Systems, IEEE Xplore, Vol. 1, p 361-366, 1-4 June 2004.