HPC: exemplary cooperation between Bull and Heinrich-Heine University of Düsseldorf
Towards energy efficiency for supercomputers
The rate at which supercomputers consume energy is a subject of concern to both the IT industry and the world of research (academia and R&D centers), which makes massive use of computing power. At the Heinrich-Heine University of Düsseldorf, energy-efficient High-Performance Computing (HPC) systems are already a reality. Bull Germany and the University’s Institute for Information Technologies and Media (Zentrum für Informations- und Medientechnologie, or ZIM) signed a cooperation agreement in 2007. Their aim was to go beyond transferring technological expertise between research and industry and to work together on dedicated energy-saving projects.
As ZIM’s Professor Stephan Olbrich explains: “As the leading European player in the IT industry, Bull has the technology, logistics and expertise needed to design, implement and deliver support for HPC solutions in scientific environments.”
First fruits of sustained teamwork to produce energy-efficient HPC
The very first step was the installation and operational launch of a hybrid cluster, known to scientists as GAUSS, which was commissioned following a Europe-wide invitation to tender. The system installed at ZIM consists of servers based on the Intel® Itanium® 2 architecture and servers based on Intel® Xeon® processors. The GAUSS cluster was the first hybrid cluster of this type in Germany. GAUSS is currently used by researchers in fundamental chemistry, the physics of condensed matter, botany and bio-computing, as well as many other disciplines with very diverse applications: for example, to simulate complex chemical or physical phenomena, to use these simulations for 3D visualization, or to compare protein sequences for different species.
Today, Bull experts and members of Professor Olbrich’s team are concentrating on increasing the energy efficiency of the cluster. “Most universities and research centers need to boost their computing power by acquiring more processing nodes, which also cost more. The result: higher power consumption, as well as higher costs when it comes to administering these systems,” explains Auke Kuiper, HPC Sales Director for Bull Germany. In the current climate, when budgets are tight, it is essential to free staff from some of their administrative workload so that they can devote more time to supporting HPC users and optimizing their applications. Researchers, in turn, can then focus on their business: research. “Thanks to Bull, the energy efficiency of supercomputers, systems administration and the use of parallel computing are all optimized,” Auke Kuiper sums up.
To achieve this, it is important to think in terms of ‘throughput computing’: how maximum workload can be achieved in a minimum timescale, with maximum performance for users. With this in mind, researchers at Düsseldorf have developed <myJAM/>, an intelligent workload management and monitoring system that provides a particularly relevant set of performance indicators. “We have implemented a Web 2.0 application which allows users and systems administrators to supervise the way jobs are progressing,” explains Dr. Stephan Raub, one of the developers working in the IT department. The data needed to carry out this monitoring is collected from throughout the cluster by a ‘daemon’ that we have developed. This enables us to identify and iron out any bottlenecks at an early stage, and means we can more easily advise users how to define their resourcing requirements. In addition, if the cluster is not used to full capacity – for example, during the vacation period – the workload management program used by the researchers at Düsseldorf can simply switch unused sections of the cluster into economy mode, until they are needed again.
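The economy-mode behaviour described above can be pictured as a simple scheduling policy. What follows is a minimal sketch in Python and not the <myJAM/> implementation: the `Node` record, the `plan_power_states` function and the 30-minute idle threshold are hypothetical names and values chosen purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical snapshot of one cluster node, as a monitoring daemon might report it."""
    name: str
    running_jobs: int       # jobs currently executing on the node
    idle_minutes: float     # time elapsed since the node last ran a job
    economy: bool = False   # True while the node is in economy mode


def plan_power_states(nodes, queued_jobs, idle_threshold_minutes=30.0):
    """Decide which nodes to switch into economy mode and which to wake up.

    Assumed policy (illustrative only):
    - if more jobs are waiting than there are idle active nodes, wake
      enough sleeping nodes to cover the shortfall;
    - if nothing is waiting, put active nodes that have been idle for at
      least the threshold into economy mode.
    Returns a tuple (to_economy, to_wake) of node-name lists.
    """
    idle_active = [n for n in nodes if not n.economy and n.running_jobs == 0]
    sleeping = [n for n in nodes if n.economy]

    # Wake sleeping nodes only when idle active nodes cannot absorb the queue.
    shortfall = max(0, queued_jobs - len(idle_active))
    to_wake = [n.name for n in sleeping[:shortfall]]

    # Only save power when no work is waiting for the idle nodes.
    to_economy = []
    if queued_jobs == 0:
        to_economy = [
            n.name for n in idle_active
            if n.idle_minutes >= idle_threshold_minutes
        ]
    return to_economy, to_wake


if __name__ == "__main__":
    cluster = [
        Node("n1", running_jobs=2, idle_minutes=0),
        Node("n2", running_jobs=0, idle_minutes=45),
        Node("n3", running_jobs=0, idle_minutes=5),
        Node("n4", running_jobs=0, idle_minutes=120, economy=True),
    ]
    print(plan_power_states(cluster, queued_jobs=0))
    print(plan_power_states(cluster, queued_jobs=3))
```

In this sketch, a quiet period with an empty queue sends long-idle nodes into economy mode, while a backlog of queued jobs brings sleeping nodes back before any further node is powered down. A real workload manager would also have to account for wake-up latency and for jobs that span several nodes.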
Another priority for cooperation is to increase computing power by combining different kinds of processors in a ‘hybrid’ system. In servers based on GPUs*, hundreds of cores that were originally optimized for graphics processing work in parallel. The inclusion of a GPU-based accelerator (NVIDIA Tesla) as part of the Bull cluster has enabled the Heinrich-Heine University of Düsseldorf to achieve some spectacular results. The accelerator, which has since been included in Bull’s product catalogue, is up to 20 times more efficient in terms of energy use than an equivalent configuration of traditional processors. However, such a hybrid cluster must be manageable as a single system. This demands new, more intelligent tools, which have been developed at Düsseldorf. These software tools provide users with a single environment, no matter what architecture is being used to run the application, which is of great interest both to the market in general and to Bull itself.
“The study and development of environmentally-friendly technologies and software components for HPC, and translating them into a practical reality, are a major concern for Bull,” confirms Michael Gerhards, General Manager, Bull Germany. “ZIM is a top-notch partner for us, with a wealth of expertise to offer in this area.”
*GPU: Graphics Processing Unit