Previously General Manager of Bull’s HPC Business Unit – charged with developing this activity worldwide – Jean-François Lavignon (46) joined Bull in 1998 as Director responsible for research strategy in the field of software, and was subsequently the head of a business unit developing security solutions. Before joining Bull, Jean-François occupied several posts related to IT research. Most notably, he was head of a research laboratory which pioneered the use of massively parallel supercomputers for image and signal processing.
Bull’s latest innovation comes with the imminent installation of Europe’s first large-scale hybrid supercomputer at the CCRT (the Center for Research and Technology Computing) in France. This new High-Performance Computing (HPC) system will enable the researchers who use the computing center to benefit both from the processing power of the latest Intel® Xeon® processors and from new-generation NVidia graphics accelerators. The system architecture that Bull has designed delivers overall peak power of around 300 Teraflops, while at the same time being extremely compact and energy-efficient thanks to new server cooling technology developed by Bull.
A new supercomputer to support research in France
GENCI (Grands Equipements Nationaux de Calcul Intensif) – a newly-established private company responsible for developing high-performance computing – has joined forces with the French Atomic Energy Authority (the CEA) to order a new HPC system. The supercomputer will be installed at the Center for Research and Technology Computing, the CCRT. With this new machine, Bull is introducing a number of innovations that could benefit all its HPC customers. The system is based on a hybrid architecture, which combines the computing power of standard, general-purpose processors with that of graphics accelerators. These accelerators are themselves based on the same technology as the graphics cards widely used in PCs and games consoles. In addition, the new computer is cooled using new rack technology, featuring an air/water heat exchanger that is much more efficient than current solutions.
So the supercomputer’s processing power comes from two types of servers. The first type uses general-purpose processors, with eight processing cores per server. These processors will be the new-generation Intel® Xeon® models, which feature numerous innovations, both at the level of their architecture (with memory bandwidth more than double that of competing processors) and at the level of the processing core, with new instruction sets. The supercomputer configuration will consist of 1,068 servers of this type, representing 8,544 processor cores delivering 103 Teraflops. The second type of server comprises four graphics cards based on the latest NVidia architecture, which offers 240 processing cores per card. In total, the 48 Graphics Processing Unit (GPU) servers provide 46,080 cores, delivering 192 Teraflops of 32-bit processing. Of course, applications can make use of all these resources simultaneously to reduce execution times.
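The configuration figures quoted above can be cross-checked with a few lines of arithmetic. The following Python sketch uses only numbers stated in the text; the variable names are illustrative, and the total simply adds the two quoted peak figures, which is how the "around 300 Teraflops" headline appears to be obtained.

```python
# Configuration figures as quoted in the article; variable names are illustrative.
cpu_servers = 1068
cores_per_cpu_server = 8
cpu_teraflops = 103           # quoted peak for the general-purpose servers

gpu_servers = 48
cards_per_gpu_server = 4
cores_per_card = 240
gpu_teraflops = 192           # quoted 32-bit peak for the GPU servers

# Derived quantities
cpu_cores = cpu_servers * cores_per_cpu_server
gpu_cards = gpu_servers * cards_per_gpu_server
gpu_cores = gpu_cards * cores_per_card
teraflops_per_card = gpu_teraflops / gpu_cards
total_teraflops = cpu_teraflops + gpu_teraflops

print("CPU cores:", cpu_cores)
print("GPU cores:", gpu_cores)
print("Teraflops per GPU card:", teraflops_per_card)
print("Combined peak (Teraflops):", total_teraflops)
```

The derived values match the quoted ones: 8,544 CPU cores, 46,080 GPU cores, one Teraflops per card, and a combined peak of 295 Teraflops, which the article rounds to 300.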
The supercomputer is a cluster, based on an InfiniBand DDR (Double Data Rate) interconnection network. This network allows all the standard servers to exchange data, and it allows the servers linked to accelerators to carry communications between the GPUs. The network also provides access to the storage facilities, which are shared with the CCRT’s earlier supercomputer, delivered by Bull in 2007. Overall, around a Petabyte of storage is available to users, and the new hybrid system will offer bandwidth of over 20 GB/s to these storage facilities.
The software environment is based around Open Source components, which Bull is integrating and optimizing within its NovaScale Master for HPC environment. Within this environment, Bull provides all the tools needed to operate the GPU accelerators.
Figure 1: Diagrammatic representation of the hybrid system architecture
Why create a hybrid supercomputer?
Users are always looking for maximum processing power for their simulations. One way of providing this is to use standard, general-purpose processors, which respond to this demand with multi-core technology, thus increasing the number of processing units. The all-purpose architecture of the processing cores means they can execute a wide range of instructions, and they offer a great deal of flexibility when it comes to controlling the execution of those instructions.
A second possibility is to use specialized components featuring simpler processing cores and a more limited control scheme. Simplifying the processors means more cores are available for the same cost in terms of transistors or electrical power. This is true for GPUs, initially designed for graphics processing, which get their processing power by using hundreds of very simple computing units capable of carrying out the same kinds of operations simultaneously on different data. This kind of control is known as ‘data parallelism’. Thanks to the progress made in micro-electronics and the demand for increasingly complex computer games, GPUs are now at the stage where the power of their processing cores and their much higher degree of ‘programmability’ mean they can be used for scientific computing.
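To make the idea of data parallelism concrete, here is a minimal Python sketch. It is not GPU code and it is not taken from Bull's software environment: the function names are hypothetical, and the loop in launch only stands in for hardware that would run all the calls at the same time. The point is that every "core" performs the same small operation, each on its own data element.

```python
def saxpy_kernel(i, a, x, y, out):
    """Work done by one simple core: one element, same operation for every i."""
    out[i] = a * x[i] + y[i]


def launch(kernel, n, *args):
    """Stand-in for the hardware: on a GPU these n calls would run concurrently.

    Here they run one after another, which gives the same result because
    no call depends on the result of another.
    """
    for i in range(n):
        kernel(i, *args)


x = [1.0, 2.0, 3.0, 4.0]
y = [10.0, 20.0, 30.0, 40.0]
out = [0.0] * len(x)

launch(saxpy_kernel, len(x), 2.0, x, y, out)
print(out)  # [12.0, 24.0, 36.0, 48.0]
```

Because the elements are independent, the amount of work that can proceed in parallel grows with the size of the data set, which is what allows hundreds of simple cores to be kept busy.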
Nevertheless, the capacity of the GPU cores and the architecture involved in controlling them mean it is more difficult for an application to take full advantage of their efficiency. Because control has been designed so that a set of cores executes the same instruction on several different data elements at a given instant, GPUs lend themselves particularly well to ‘data parallel’ applications, where it is necessary to carry out the same processing actions on a large collection of data. These types of applications include:
- Processing seismic data (oil companies were among the first to be interested in this technology)
- Processing medical data (CT scans…)
- Comparing genome sequences
- Signal processing
- Digital simulation in chemistry (molecular dynamics in situations where calculating the forces involved does not require 64-bit processing).
With the hybrid architecture that Bull has designed, an application can use the computing resources of both the Intel® Xeon® processors and the GPUs at the same time. So an application that includes one part which implements data parallelism and another requiring less uniform processing can fully capitalize on the rich potential offered by this system. For applications that only really follow one of these models, it would be sensible to only use the relevant part of the machine: something that is taken care of by the task management environment offered as part of NovaScale Master.
Figure 2: GPU* server, consisting of four one-Teraflops cards
Another advantage of the hybrid architecture is that the GPUs are also very useful when it comes to optimizing the energy consumption of a large-scale computing system. A GPU server delivering 4 Teraflops of 32-bit processing power only consumes 700W. By way of comparison, a standard server delivering peak power of 0.1 Teraflops consumes 350W. This 20-fold reduction in power per Teraflops (175W against 3,500W) is due to the sheer simplicity of the computing core and to the architecture which, most notably, uses much less memory with much more limited capacity for accessing it. For applications that are capable of fully utilizing the power of this kind of architecture, GPUs deliver a high level of energy efficiency.
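The 20-fold figure follows directly from the four numbers quoted above. The short Python sketch below reproduces the calculation; the variable names are illustrative, and the comparison takes the quoted peak figures at face value (the GPU figure is stated as 32-bit, while the precision of the standard server's figure is not given in the text).

```python
# Figures quoted in the article, expressed in Gigaflops and watts.
gpu_server_gigaflops = 4000   # 4 Teraflops, 32-bit peak
gpu_server_watts = 700
cpu_server_gigaflops = 100    # 0.1 Teraflops peak
cpu_server_watts = 350

# Peak performance delivered per watt of electrical power.
gpu_gflops_per_watt = gpu_server_gigaflops / gpu_server_watts
cpu_gflops_per_watt = cpu_server_gigaflops / cpu_server_watts

# How many times more peak performance per watt the GPU server offers.
efficiency_ratio = gpu_gflops_per_watt / cpu_gflops_per_watt

print(f"GPU server: {gpu_gflops_per_watt:.2f} Gigaflops per watt")
print(f"Standard server: {cpu_gflops_per_watt:.2f} Gigaflops per watt")
print(f"Ratio: {efficiency_ratio:.0f}x")
```

This is a ratio of peak figures; as the text notes, only applications able to fully use the GPU architecture will see a comparable gain in practice.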
Intelligent energy management: the advantages of water cooling
The second major innovation of this system is its method of cooling, with cabinets featuring an air/water heat exchanger. This technology responds to a need that has arisen with the evolution of computing clusters. On the one hand, in order to improve their performance, servers make use of increasingly rapid signals that can only be routed on electronic cards over very small distances. So they have become more and more compact, and as a result their thermal density is growing. On the other hand, the communication speeds of interconnection networks are getting faster thanks to higher communication frequencies on the links. So in the interest of controlling network costs and ensuring the quality of communications, it is better to have the shortest possible distances between servers. This has led to clusters consisting of cabinets filled with the maximum number of servers, ideally located in the smallest possible floor space. As a result, a great deal of heat is generated in a very small space, and this heat needs to be dissipated. Traditional cooling techniques are of limited use, because the speed and amount of air circulating in the computer room cannot be increased indefinitely.
Figure 3: Cool cabinet door
Faced with this problem, Bull has designed a solution that is not only highly effective, but also very flexible. It consists of a cool cabinet door with an air/water heat exchanger, capable of dissipating up to 40 kW, fitted on a standard cabinet. With this door, there is no need to modify the way servers are cooled: air flows within the computer room still dissipate the heat produced by the processors and memory chips. But instead of releasing this warm air into the computer room, the cool cabinet door releases air that has been cooled to room temperature. By regulating the air and water flows in the heat exchanger, the door adjusts its operation according to the heat dissipated by the servers present in the rack. So a large-scale air-conditioning unit is no longer needed to regulate the temperature in the computer room.
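To give a feel for what regulating the water flow involves, the Python sketch below applies the standard heat-balance relation (heat removed = mass flow x specific heat x temperature rise) to the door's 40 kW capacity. This is an illustration only: the article gives no operating temperatures, so the 10 K water temperature rise is a hypothetical value, the function name is invented for this example, and the specific heat of water is an approximate textbook figure.

```python
WATER_SPECIFIC_HEAT = 4186.0  # J/(kg*K), approximate value for liquid water


def water_flow_kg_per_s(heat_watts, water_temp_rise_kelvin):
    """Mass flow of water needed to carry away heat_watts.

    Uses Q = m_dot * c_p * delta_T, solved for m_dot.
    """
    if water_temp_rise_kelvin <= 0:
        raise ValueError("water temperature rise must be positive")
    return heat_watts / (WATER_SPECIFIC_HEAT * water_temp_rise_kelvin)


door_capacity_watts = 40_000   # maximum dissipation quoted for the door
assumed_temp_rise = 10.0       # hypothetical value, not from the article

flow = water_flow_kg_per_s(door_capacity_watts, assumed_temp_rise)
print(f"{flow:.2f} kg/s of water at full load")
```

Under these assumptions, a rack dissipating half as much heat would need half the water flow for the same temperature rise, which is the kind of adjustment the door makes as the load in the rack changes.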
This air/water heat exchanger mechanism integrated into the cool cabinet door also helps to improve the energy efficiency of the Data Center infrastructure. Because the warm air is cooled as closely as possible to where it is produced, there is no recirculation of that warm air, which is costly in terms of the energy required. Overall, a solution based on the new Bull cabinet saves energy when it comes to cooling a cluster. This benefit is in addition to the savings made on air-conditioning infrastructure for the computer room. So the total cost of ownership for the cluster is improved both from the moment when it is bought (cool cabinet doors being less costly than air-conditioning units) and throughout its operational life (with lower electricity bills).
Finally, this new technology perfectly fulfils its objective of enabling the construction of more densely-packed clusters. The processing part of the CCRT’s hybrid machine, for example, only occupies 55 m² while delivering some 300 Teraflops of power.
A powerful combination of expertise in Petaflops-scale systems and environmental responsibility
The new hybrid cluster that will be installed at the CCRT is a unique asset for the French research community. It will deliver levels of computing power unrivalled in Europe and will enable users to test the combination of the processing power offered by GPUs with the computing capacities of a traditional cluster. Thanks to this new supercomputer, we can look forward to some interesting breakthroughs in many areas including energy, bio-sciences and molecular dynamics, in industry and research alike.
With the installation of this hybrid supercomputer, Bull is taking a very important step towards fully mastering Petaflops-scale configurations, particularly with the introduction of its new cooling technology which will be extremely significant in this area. This new machine is also a great demonstration of Bull’s ability to deliver highly energy-efficient HPC solutions with an ever-improving total cost of ownership.
* GPU: Graphics Processing Unit