Hybrid cluster solutions and services that fully leverage the performance of accelerators
Multi-core CPUs have allowed HPC users to keep pace with the steadily growing demand for computing power, but energy consumption, floor space, and cooling have become major obstacles to expanding computing systems. This explains the success of accelerators such as GPUs (graphics processing units), which offer breakthrough performance together with outstanding space and energy efficiency. GPUs are highly parallel processors derived from technologies initially developed for game machines and designed to boost Extreme Computing applications. They turbocharge system performance without requiring changes to the power or cooling infrastructure, and without a massive increase in the energy bill.
Many tools are available on the market to make GPU computing easier, but implementing GPUs is still not trivial. To help you innovate with GPUs and get the maximum pay-off, Bull offers complete GPU-based solutions with:
qualified Bull and third-party hardware,
integrated cluster environment,
expertise and services.
Bull porting and optimization expertise to maximize performance acceleration
Depending on the type of application and its degree of optimization, GPUs can accelerate processing by a factor of anywhere from 1 to 100. Expertise is therefore essential to get the most out of GPUs. Bull has long experience with clusters that combine GPUs and CPUs, and places this distinctive expertise at the service of Extreme Computing customers. Bull engineers have gained hands-on experience of porting and optimizing applications for GPU computing through many cooperative projects with customers, such as a cluster with 46,080 GPU cores designed by Bull for GENCI. Bull experts deliver end-to-end services, from system design through application optimization to advanced user training.
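The wide range of acceleration factors is consistent with Amdahl's law, which is not named in the text above but is a standard way to reason about offloading: the overall gain depends on the fraction of run time that is actually moved to the GPU. The sketch below is illustrative only. The function name, the offloaded fractions, and the 100x kernel speedup are assumptions chosen for the example, not Bull measurements.

```python
def overall_speedup(offloaded_fraction: float, kernel_speedup: float) -> float:
    """Amdahl's law: whole-application speedup when only part of it is accelerated."""
    if not 0.0 <= offloaded_fraction <= 1.0:
        raise ValueError("offloaded_fraction must be between 0 and 1")
    if kernel_speedup <= 0.0:
        raise ValueError("kernel_speedup must be positive")
    # Time left on the CPU plus the (shortened) time spent on the GPU,
    # both expressed relative to the original run time.
    cpu_part = 1.0 - offloaded_fraction
    gpu_part = offloaded_fraction / kernel_speedup
    return 1.0 / (cpu_part + gpu_part)


if __name__ == "__main__":
    # Hypothetical fractions of run time ported to the GPU.
    for fraction in (0.0, 0.5, 0.9, 0.99, 1.0):
        gain = overall_speedup(fraction, 100.0)
        print(f"{fraction:.0%} offloaded: {gain:.1f}x overall")
```

Under these assumed inputs, an application with half of its run time offloaded gains just under 2x, while an overall gain of about 50x requires roughly 99% of the run time to be offloaded. This is why porting and optimization effort has such a large effect on the result.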
Seamless integration in the Extreme Computing environment
The bullx cluster suite integrates all the tools needed to operate a hybrid system comprising Intel® Xeon®-based servers and NVIDIA® Tesla™ GPUs. The task management environment supports the allocation of applications to the relevant computing resource. An application can naturally use both computing resources so as to fully capitalize on the rich potential offered by this type of cluster.
Developed by Bull, the bullx B505 accelerator blades pack two Intel® Xeon® processors and two NVIDIA Tesla GPUs in a double-width, ultra-dense blade. They can be mixed with standard B500 compute blades within the same bullx chassis.
The integration of the GPUs within the blade guarantees optimal performance. The bullx B505 accelerator blades are the only blades on the market designed for full bandwidth between each GPU and host CPU, and for double interconnect bandwidth between blades:
1 dedicated PCIe x16 connection for each GPU,
2 InfiniBand QDR network connections, so that each GPU has full access to QDR bandwidth.
The perfect match: teaming 4 CPUs with 4 GPUs in 2 U
Bull’s most popular GPU computing solution connects a 1U NVIDIA® Tesla™ S1070 computing system to a 1U bullx R422 E2 host incorporating the latest Intel® Xeon® 55xx processors (Nehalem). The host runs the OS and part of the applications, while the compute-intensive parts run on the Tesla GPUs. The R422 E2 houses 2 servers, i.e. 4 CPUs, in a 1U chassis and provides two PCIe x16 Gen2 slots, which makes it an ideal match for the Tesla S1070 and its 2 PCIe connections. The Tesla S1070 connects to the twin servers through 2 interface cards installed in these slots and cabled to the Tesla, thus teaming 4 CPUs with 4 GPUs.
The powerful bullx R425 E2 server (in tower or 4U rack form factor) is ideally suited to host an NVIDIA® Tesla™ C1060 computing processor, to create a high-performance computing node or workstation.
The NVIDIA® Tesla™ C1060 computing processor combines 240 processor cores with a standard C compiler that simplifies application development, and scales to solve the most important computing challenges more quickly and accurately. With the massively parallel architecture of the GPU, scientists and engineers can achieve a quantum leap in performance.
The CUDA C programming environment simplifies many-core programming and enhances performance by offloading computationally intensive activities from the CPU to the GPU. It enables developers to use NVIDIA GPUs to solve the most complex computation-intensive challenges in fields such as drug research, oil and gas exploration, and computational finance.
Specifications: NVIDIA Tesla S1070 Computing System
Rack mount 1U drawer accommodating 4 Tesla C1060 processors
Massively parallel, many-core architecture able to execute thousands of concurrent threads per GPU
960 scalar processor cores (240 per GPU)
Ultra-fast memory access with 408 GB/sec total bandwidth (102 GB/sec peak bandwidth per GPU)
4x 512-bit GDDR3 memory interface (512-bit interface per GPU)
Single Precision floating point performance (peak): 3.73 to 4.14 TFlops
Double Precision floating point performance (peak): 311 to 345 GFlops
Two PCIe connections
Typical Power Consumption: 800 W
Dimensions: 44 x 444 x 723 mm (HxWxD)
Software Development Tools:
C language compiler, debugger, profiler, and emulation mode for debugging
Standard numerical libraries for FFT (Fast Fourier Transform), BLAS (Basic Linear Algebra Subprograms), and CUDPP (CUDA Data Parallel Primitives)
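The drawer-level figures in the Tesla S1070 specification can be cross-checked by dividing them evenly across the four GPUs. The sketch below uses only numbers quoted in this document. The constant and function names are illustrative, and the even split across GPUs is an assumption.

```python
# Figures quoted in the Tesla S1070 specification (per 1U drawer).
GPUS_PER_DRAWER = 4
TOTAL_CORES = 960
TOTAL_BANDWIDTH_GB_S = 408
PEAK_SP_TFLOPS = (3.73, 4.14)   # low and high end of the quoted range
PEAK_DP_GFLOPS = (311, 345)     # low and high end of the quoted range


def per_gpu_share() -> dict:
    """Divide each drawer-level figure evenly across the GPUs in the drawer."""
    n = GPUS_PER_DRAWER
    return {
        "cores": TOTAL_CORES // n,
        "bandwidth_gb_s": TOTAL_BANDWIDTH_GB_S / n,
        # Convert TFlops to GFlops before splitting.
        "sp_gflops": tuple(round(t * 1000 / n, 1) for t in PEAK_SP_TFLOPS),
        "dp_gflops": tuple(round(g / n, 2) for g in PEAK_DP_GFLOPS),
    }


if __name__ == "__main__":
    for name, value in per_gpu_share().items():
        print(f"{name}: {value}")
```

The per-GPU results are 240 cores and 102 GB/sec, matching the figures in parentheses in the specification. The low end of the performance range works out to about 932.5 GFlops single precision and 77.75 GFlops double precision per GPU, in line with the 933 and 78 GFlops quoted for a single Tesla C1060.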
Specifications: NVIDIA Tesla C1060 Computing Processor
Form factor: 10.5" x 4.376", dual slot
Number of Tesla GPUs: 1
Number of streaming processor cores: 240
Frequency of processor cores: 1.3 GHz
Single Precision floating point performance: 933 GFlops
Double Precision floating point performance: 78 GFlops
Floating point precision: IEEE 754 single and double precision
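The two performance figures above can be reproduced as theoretical peaks from the core count and the clock. The sketch below rests on assumptions that are not stated in this document: an exact core clock of 1.296 GHz (shown rounded as 1.3 GHz), 3 single-precision operations per core per clock, and 30 double-precision units per GPU, each completing one fused multiply-add (2 operations) per clock. All names are illustrative.

```python
CORES = 240                      # from the specification
ASSUMED_CLOCK_GHZ = 1.296        # assumption: exact clock behind the rounded 1.3 GHz
ASSUMED_SP_OPS_PER_CLOCK = 3     # assumption: multiply-add plus multiply per core
ASSUMED_DP_UNITS = 30            # assumption: one double-precision unit per 8 cores
ASSUMED_DP_OPS_PER_CLOCK = 2     # assumption: one fused multiply-add per unit


def peak_gflops(units: int, ops_per_clock: int, clock_ghz: float) -> float:
    """Theoretical peak: execution units x operations per clock x clock in GHz."""
    return units * ops_per_clock * clock_ghz


def c1060_peaks() -> tuple:
    """Return (single precision, double precision) peak GFlops under the assumptions."""
    sp = peak_gflops(CORES, ASSUMED_SP_OPS_PER_CLOCK, ASSUMED_CLOCK_GHZ)
    dp = peak_gflops(ASSUMED_DP_UNITS, ASSUMED_DP_OPS_PER_CLOCK, ASSUMED_CLOCK_GHZ)
    return sp, dp


if __name__ == "__main__":
    sp, dp = c1060_peaks()
    print(f"single precision peak: {sp:.2f} GFlops")
    print(f"double precision peak: {dp:.2f} GFlops")
```

Under these assumptions the calculation gives about 933.12 and 77.76 GFlops, which round to the 933 and 78 GFlops listed. These are upper bounds: sustained application performance depends on how well the code keeps the cores and memory system busy.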