Thierry Fromont is an engineer in Bull’s hardware development division. He trained at the Ecole Supérieure de Physique et Chimie de Paris (ESPCI), and his job with Bull is to package servers and HPC clusters.
During the 1980s, high-end information processing systems were based on ECL bipolar semiconductor technology. This technology dissipated a great deal of power, and the systems employing it were generally cooled by circulating water close to the components themselves. The arrival of CMOS technologies reduced the power dissipated and increased the level of integration on a single integrated circuit (IC). It soon became possible to build a high-end processor from just a few ICs, each consuming less than 5W (in the 1980s), and then from a single IC consuming less than 10W (at the start of the 1990s). Air-cooling became much easier to use, and so it became the standard technology.
IT applications have always demanded ever-increasing amounts of processing power (for databases, business intelligence applications, intensive processing, Internet access and so on), and they still do today. Servers have advanced considerably as a result, incorporating more powerful processors and making increasingly standard use of SMP (symmetric multiprocessing) architecture. This route is the ‘Scale-in’ method, whereas using clustering techniques to combine the power of several basic servers is known as the ‘Scale-out’ method. The processing power of computer systems has therefore continued to grow thanks to:
Being able to include more transistors on just one silicon chip, to enable more powerful processor architectures (SMP architecture, for example)
Higher frequency system clocks (intrinsic core performance)
Creation of large clusters (integrating up to 10,000 basic processors). In particular, it is these large clusters that enable High-Performance Computing (HPC).
The first two avenues of progress result from the use of increasingly fine process technology to create transistors. At first, this happened without any significant increase in heat output; but the increase in transistor leakage current (from 90nm technology onwards) and the race for frequency led to an explosion in the power dissipated by a processor, which today can be as high as 130W. This sudden change has created a fundamental challenge: just how do you air-cool such components within a server?
The third avenue of progress requires the creation of huge computer rooms, where the aim is to install the maximum number of servers in three dimensions, stacked in tall racks that are themselves arranged in row upon row. The objective here is to install the maximum GFlops, or the maximum number of Internet connections, per square meter. The room will be filled until it reaches its limit in terms of available space, electrical power supply or heat evacuation, whichever comes first. This new development has created a second challenge: how, and up to what point, can air-cooling be maintained within the computing room?
Integrating components within servers
Cooling a processor that dissipates 130W demands high-performance heat sinks and fans, and good control of air-flow within the server. The industry has refined the heat sink technologies used in servers accordingly, increasing the number of fins and using materials with good thermal conductivity (copper). The reduced distance between fins and the high air-flow requirement are the two main contributors to increased pressure drop across heat sinks, and they call for more powerful fans. The market for fans has kept pace with these requirements. It is now possible to obtain more compact fans (from 160mm down to 60mm in diameter, or even less) that deliver almost the same air flow thanks to optimized blade shapes and higher rotation speeds (up to 8,000 rpm), and that can work against a significant pressure drop.
All that remains is to construct the system while guaranteeing good air circulation wherever it is needed, so as to ensure an optimum thermal environment around each component, taking into account the uneven distribution of local pressure drops. This is achieved using thermal and fluid dynamics modeling and numerical simulation. Figure 1 shows an example of the kind of numerical simulation used. The temperature fields in a sub-assembly are shown before and after optimization of the air outlet paths. The red zones (in the left-hand image) represent the hot spots; adding a deflector clearly improves the situation (in the right-hand image).
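The link between dissipated power and the air flow a fan must deliver can be illustrated with a simple energy balance, flow = power / (air density x specific heat x air temperature rise). The sketch below is only a first-order estimate and not the modeling method used by the author: the air properties and the 10 K temperature rise are assumed values, and the function name is hypothetical.

```python
# Illustrative properties of air near 30 degrees C at sea level (assumed values).
AIR_DENSITY_KG_M3 = 1.16
AIR_SPECIFIC_HEAT_J_KG_K = 1007.0


def required_airflow_m3_s(power_w, air_temp_rise_k):
    """Volumetric air flow (m^3/s) that carries away power_w watts
    while warming by air_temp_rise_k kelvin on its way through."""
    if air_temp_rise_k <= 0:
        raise ValueError("air temperature rise must be positive")
    heat_per_m3_per_k = AIR_DENSITY_KG_M3 * AIR_SPECIFIC_HEAT_J_KG_K
    return power_w / (heat_per_m3_per_k * air_temp_rise_k)


# Hypothetical case: one 130W processor, air allowed to warm by 10 K.
flow = required_airflow_m3_s(130.0, 10.0)
print(f"{flow * 3600:.1f} m^3/h")
```

Under these assumptions a single 130W processor needs roughly 40 m3/h of air. Halving the permitted temperature rise doubles the required flow, which is one reason the pressure drop across dense heat sinks matters so much.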
Figure 1: optimizing a drawer
At the end of the study, a thermal map of the machine is generated. It is used to verify that the server will function correctly as long as the temperature of the incoming air remains below 30°C (a typical value for inlet air).
Integrating servers in racks, and the racks in the data center
Once integrated within a rack, the servers no longer need to be modeled to such a high level of detail. All you need to know, to begin with, is the heat output per unit of height and the air-flow required. The heat output per U (where 1U = 44.45mm, the standard unit of height in the 19-inch rack system) has significantly increased now that more powerful components are available, a trend reinforced by higher-density integration. It has increased from 200W/U to 400W/U, and in the near future we are anticipating densities of between 500W/U and 1kW/U. Because the power dissipated by each server has increased, the power dissipated per rack may exceed 10kW, which calls for a preliminary modeling stage. The example in Figure 2 shows a simulation of the integration of 25 servers, each dissipating 400W/U, in a rack. The first option arranges the drawers in five groups of 5U separated by gaps of 3U. In the second option, panels close off the inter-server spaces.
In the first option, with the inter-server spaces left open, we can see that the air exiting from the servers is sucked back in at the front, and that the incoming air temperature increases by about 4°C. This air re-circulation can be further exacerbated by more solid (less perforated) doors.
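The rack-level figures in this example can be checked with a few lines of arithmetic. The sketch below assumes that each of the 25 servers occupies 1U (which is what five groups of 5U imply) and that the 3U gaps sit only between groups; the function names are hypothetical.

```python
U_HEIGHT_MM = 44.45  # height of one rack unit, as defined in the text


def rack_power_w(server_count, watts_per_u, units_per_server=1):
    """Total heat output of the rack in watts."""
    return server_count * units_per_server * watts_per_u


def occupied_units(groups, units_per_group, gap_units):
    """Rack units used by the groups plus the gaps between them."""
    return groups * units_per_group + (groups - 1) * gap_units


power = rack_power_w(server_count=25, watts_per_u=400)
height_u = occupied_units(groups=5, units_per_group=5, gap_units=3)
height_mm = height_u * U_HEIGHT_MM

print(power, "W")
print(height_u, "U =", round(height_mm, 2), "mm")
```

With these assumptions the rack dissipates 10kW and the layout uses 37U, about 1.64m of vertical space.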
Figure 2: temperature fields in a rack before and after obstruction of the inter-server spaces
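A rise in inlet temperature such as the one observed here can be related to the share of exhaust air that is drawn back in, using a simple perfect-mixing model. This is only a rough sketch and not the simulation method behind Figure 2: it assumes the inlet air is a mix of cool supply air and the servers' own exhaust, and the 12 K temperature rise across a server is a hypothetical value chosen for illustration.

```python
def recirculated_fraction(inlet_rise_k, server_temp_rise_k):
    """Share of inlet air that is recycled exhaust, given the observed
    inlet temperature rise and the air temperature rise across a server."""
    if inlet_rise_k < 0 or server_temp_rise_k <= 0:
        raise ValueError("temperature rises must be positive")
    return inlet_rise_k / (inlet_rise_k + server_temp_rise_k)


def inlet_rise_from_fraction(fraction, server_temp_rise_k):
    """Inverse relation: inlet temperature rise for a given recycled share."""
    if not 0 <= fraction < 1:
        raise ValueError("fraction must be in [0, 1)")
    return fraction * server_temp_rise_k / (1.0 - fraction)


# About 4 K inlet rise (from the text), 12 K across the server (assumed).
fraction = recirculated_fraction(4.0, 12.0)
print(f"recirculated share: {fraction:.0%}")
```

Under these assumptions about a quarter of the inlet air is recycled exhaust. The model also shows why closing the inter-server spaces helps: the inlet rise falls to zero as the recirculated share does.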
Similarly, integrating the racks in a datacenter requires thermal and fluid modeling before the cluster installation phase. This becomes all the more necessary once the power output per rack exceeds 4kW and as the number of heat-dissipating racks increases. Air conditioning systems typically extract around 4kW/m2. With a floor area of about 2m2 per rack, this gives a mean dissipation per rack of about 8kW, that is, of the order of 10kW. This explains why racks are currently not always filled to their maximum capacity: how far they are filled depends on the cooling capacity of the room in question. The first step in designing a datacenter is therefore to respect these limits, both in how the servers are installed within the racks and in how the racks are installed in the room, while also taking all the other relevant constraints into account. One of the most important of these is the maximum length of some inter-rack electrical connections; exceeding it can require more costly fiber optic links.
Using simulation tools, we can effectively predict the fluidic and thermal behavior of datacenters, and adjust the various parameters (for example, the way the bays are arranged, the positioning of the perforated floor tiles and the depth of the raised floor) to optimize air temperatures while ensuring that each server is operating under the best possible conditions.
As mentioned above, the re-circulation of air limits what air-cooling can achieve. Alternating hot and cold aisles limits this phenomenon (see Figure 3), so that re-circulation is then mainly noticeable at the edge of the cluster. Figure 4 clearly shows hot air being drawn back in by a rack.
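The per-rack budget implied by a room's cooling density can be worked out directly. The sketch below uses the 4kW/m2 and 2m2 per rack figures quoted above, whose product is 8kW, the same order of magnitude as the mean dissipation mentioned in the text. The 400W servers and the 25-server full rack are taken from the earlier rack example; the function names are hypothetical. Power is held in whole watts to avoid floating-point surprises in the integer division.

```python
def rack_budget_w(cooling_w_per_m2, floor_m2_per_rack):
    """Heat the room can extract for each rack, in watts."""
    return cooling_w_per_m2 * floor_m2_per_rack


def max_servers(budget_w, server_w):
    """Number of whole servers that fit within the cooling budget."""
    return int(budget_w // server_w)


budget = rack_budget_w(cooling_w_per_m2=4000, floor_m2_per_rack=2)
servers = max_servers(budget, server_w=400)
fill_ratio = servers / 25  # relative to a rack holding 25 such servers

print(budget, "W per rack")
print(servers, "servers, fill ratio", fill_ratio)
```

With these inputs the room supports 20 servers of 400W per rack, so a rack designed for 25 of them would be filled to 80%. This is one way to see why racks are often left partly empty.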
Figure 3: Typical data center implementation
Figure 4: Air recirculation at the edge of an aisle
Bull has extensive experience of installing these kinds of high heat output systems, and has even implemented a cluster of 250 racks with output of 8kW per rack.
Figure 5 shows an example of a datacenter simulation, with the temperature fields in two perpendicular planes. One of these is situated at the air inlet side of the racks, so that the incoming air temperature for each server can be checked against its specified limits. One row of racks has been removed from the diagram to make it easier to read.
Figure 5: temperature fields taken across two planes of the cluster
Improving processing power in computing rooms
In order to respond to the demand for increased processing power from each unit of surface area, Bull is working in two main areas:
The first of these is the optimization of existing systems by organizing the air-flows within the computing rooms more effectively. Selectively closing off particular areas, distributing air-conditioning units around the room and guiding air-flows with baffles are typical subjects of this research. The overall aim is to capitalize on the customer’s existing investments by drawing the greatest possible benefit from them. Implementing such a solution requires flow simulations, and the estimated limit of this approach is of the order of 12kW per rack. It is a highly cost-effective solution, as it achieves an average density of more than 60% of the maximum density obtained by completely filling the racks.
Meanwhile, the increased sophistication of fluid dynamics and thermal modeling software enables increasingly detailed simulation of these kinds of systems, with a view to scoping and optimizing them even more effectively.
The second major avenue of interest is the use of liquid refrigerants (which have up to 4,000 times the cooling capacity of air, volume for volume) closer to the servers, which limits the amount of air that needs to be processed by the air-conditioning units in the room. The technologies involved are mainly refrigerated doors, and closed racks with built-in air-water exchangers.
This approach should enable racks to be completely filled, at power levels in the 20-40kW range. It requires the addition of a built-in heat exchanger, as well as fans within the housing. Over and above the evaluation of the real gain in density, two areas need to be addressed:
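The order of magnitude of this advantage can be checked by comparing how much heat a given volume of each fluid absorbs per kelvin (density x specific heat). The sketch below uses water as the liquid; the fluid properties are typical room-temperature values, and the 20kW load and 10 K temperature rise are assumptions chosen for illustration.

```python
# Typical room-temperature properties (assumed values).
WATER = {"density": 998.0, "specific_heat": 4182.0}  # kg/m^3, J/(kg*K)
AIR = {"density": 1.2, "specific_heat": 1005.0}


def volumetric_heat_capacity(fluid):
    """Heat absorbed by one cubic meter of fluid per kelvin, in J/(m^3*K)."""
    return fluid["density"] * fluid["specific_heat"]


def flow_m3_s(power_w, temp_rise_k, fluid):
    """Volumetric flow needed to remove power_w with the given temperature rise."""
    return power_w / (volumetric_heat_capacity(fluid) * temp_rise_k)


ratio = volumetric_heat_capacity(WATER) / volumetric_heat_capacity(AIR)
water_flow = flow_m3_s(20_000.0, 10.0, WATER)  # hypothetical 20kW rack
air_flow = flow_m3_s(20_000.0, 10.0, AIR)

print(f"water/air ratio: {ratio:.0f}")
print(f"water: {water_flow * 1000:.2f} L/s, air: {air_flow:.2f} m^3/s")
```

With these property values the ratio comes out at roughly 3,500, consistent with the "up to 4,000 times" quoted above. For the same load and temperature rise, about half a liter of water per second does the work of more than 1.5 m3 of air per second.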
One technical, with two coexisting ventilation systems (for the heat exchanger and the servers)
One economic, concerning the additional costs involved for an existing computer room.
Taking these cooling constraints into account is an essential part of data center design. It requires a detailed characterization of the equipment used and a good understanding of modeling techniques. Bull has all the necessary tools and expertise to manage this evolution successfully. Today, Bull is working on defining these products so as to be able to offer a scalable set of solutions, differing in power, cost and density, that covers the range of 10-60kW per rack.