September 2006
Guest contributors
Europe is back in supercomputing
Interview with Jean Gonnord,
Head of the Numerical Simulation Project and Computing at CEA/DAM

To make up lost ground, Europe should have a more proactive policy in supercomputing, centred on a synergy between defence, industry and research.

La Recherche. A glance at the Top 500 is evidence enough that France and Europe are lagging far behind the United States and Japan in supercomputers. How do you explain this?
Jean Gonnord. A lag like this is very alarming, and it is a direct consequence of the setbacks suffered by the large ‘computational projects’ at the beginning of the 1990s. The European intensive computing industry collapsed and only a few businesses survived. This was the case of Meiko in Great Britain, for example, which after its financial collapse was bought by the Italian firm Finmeccanica. Renamed Quadrics, this company today produces the ‘Rolls-Royce’ of networks. In France, after a long period in the wilderness, Bull is coming back to the forefront with the TERA-10 machine.
With an almost non-existent industrial base and no real strategy, European countries have adopted a ‘cost-based’ policy in intensive computing [1]. High Performance Computing (HPC) is treated as a mere tool used in a few disciplines. Laboratories invest in HPC out of their own research funds, naturally aiming for the cheapest machines. This has some perverse effects: users practise self-censorship and depend on the American and Japanese manufacturers to define what tomorrow’s computing will look like. And this makes Europe fall even farther behind.

By contrast, the computing policies of the United States and Japan, which can be described as ‘strategic opportunity’ policies [1], imply massive support for the sector’s industrial base...
The United States aspires to one thing: world supremacy in a field it regards as strategic. And it has achieved it, naturally by investing very large budgets, but also by getting the most out of the synergies between defence, industry and research. In concrete terms, HPC policy is decided at the level of the President himself, who relies on the conclusions of the annual report of the President’s Information Technology Advisory Committee (PITAC).
This policy is then implemented by the Department of Energy (DoE*), the Department of Defense (DoD) and the major research agencies: the National Science Foundation (NSF) and the Defense Advanced Research Projects Agency (DARPA). These agencies fund civilian and military laboratories, universities and the main computing centres so that they can equip themselves with very large machines. But, and this is an important point, the calls for project proposals are only open to American industry!
The Japanese have an almost identical policy, but the main applications are in civilian security.

Can you give us an idea of the American budgets?
They are considerable. For the Accelerated Strategic Computing Initiative (ASCI) programme alone, the DoE has been investing since 1995 some 100 million dollars per year in its three military laboratories (Lawrence Livermore, Los Alamos and Sandia) just for the raw power of one machine, plus 120 million dollars every three years to develop another machine at the limits of the technology! And that’s not all. ASCI also finances a research and development (R&D) programme (Path Forward) aimed at American manufacturers, to keep them focused on high-performance computing (50 million dollars per year), and another, Alliance, to support upstream university research (8 million dollars per year). And this example is just the tip of the iceberg. Historically, the major provider of R&D funds to the American computing industry has always been the National Security Agency (NSA), and of course this has not changed, especially since September 11th…

Two years ago, China surprised the world by announcing a machine that took 13th place in the Top 500 list of supercomputers…
The emergence of this country in the supercomputing field is really striking. The policy chosen is similar to that of the United States, but the stated objectives of the Chinese government are more modest, at least for the moment: to become independent, and therefore to control the entire technological chain, from the manufacture of processors to the final integration of systems. With this in mind, the Ministry of Science and Technology launched an ambitious R&D programme, planned in five-year periods from 1986 onwards, with both civilian and military objectives. Nine large computing centres were created. For two years now, the installed computing capacity has exceeded that of France! And the rate of progress is impressive. Even if the first large Chinese supercomputers were bought from the United States, the second generation has been developed and assembled in China using American processors. The next generation will in all likelihood be 100% Chinese. Two projects have been launched to manufacture microprocessors: Godson for scientific computing and ArkII for the general public. Recently, the Chinese announced that they are going to compete in the petaflop* race… As in the United States, the model for development is based on a defence-industry-research synergy. Europe and France would do well to take inspiration from this. Only a ‘strategic opportunity’ policy and the launch of a major European R&D programme will allow us to make up the lost ground.

That’s precisely what you’ve done with the TERA project. When and how did this project start?
In 1996, after the President of France signed the treaty banning all nuclear testing, the CEA set up the Simulation programme within its military applications division (DAM). The aim was to guarantee the safety and reliability of the weapons of the French deterrent. The programme has two parts: one based on experimentation (with the AIRIX flash radiography machine and the Megajoule Laser being built near Bordeaux) and the other on numerical simulation. Computing is used to reconstruct the different stages in the functioning of a weapon. Around 100 computing engineers and mathematicians have been working on this simulator for almost ten years. They write the software, millions of lines of code, developed from ‘models’ established by an equal number of physicists and validated in detail against past experiments. This colossal task is still ongoing, and increasingly sophisticated models are being included in the simulator. To ‘run’ the simulator in a reasonable time (a few weeks at most) we needed a much more powerful computer than was available at the time. The capacity required in 2010, when the simulator will be complete, was estimated at 100 teraflops of sustained speed*, that is, one hundred thousand billion useful operations per second! Our Cray T90 provided only 20 gigaflops* at the time (twenty billion operations per second)! Incidentally, the prefix ‘tera’ (for 10^12), which comes from the Greek word for ‘monster’, gave the project its name.

Did this pose a particular problem for the vendors?
In 1996, a sustained power of 100 teraflops by 2010 was well beyond what they could offer according to Moore’s law. Roughly, this law predicts that the power of computers doubles every eighteen months at a fixed cost. Extrapolating from the very powerful Cray computers we had, that would give us at most 2 to 5 sustained teraflops in 2010. Needless to say, such a gain in power, which implies a fundamental change in machine architecture, demands considerable scientific and technological leaps. Only the parallelisation of a large number of processors could solve this problem. But for reasons of cost, these processors had to be as cheap as possible, that is, those available on the mass market. We realised very early on that we would need to push the vendors beyond their limits, and that to influence their choices we would have to be able to discuss things with them on an equal footing. In 1997 we brought together a team of top experts on the CEA/DAM-Île-de-France site in Bruyères-le-Châtel. Around fifty engineers were able to interact with the vendors and help define an architecture that would fulfil our requirements. A timescale was established: 1 teraflop of sustained speed in 2001 (operation TERA-1), 10 sustained teraflops in 2005 (TERA-10), and 100 sustained teraflops in 2009, all within a strict budget. We now plan to bring this capacity up to 10 sustained petaflops in 2017.
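As a rough illustration, and not a calculation from the interview itself, the sketch below replays that extrapolation in Python. It assumes the roughly 20 gigaflops of the Cray T90 quoted earlier as the 1996 baseline and applies the 20% to 25% sustained yield of parallel machines given in the glossary note at the end of this article; both choices are this article's assumptions, but with them one indeed lands at a few sustained teraflops by 2010, a long way short of the 100 required.

```python
# Back-of-envelope Moore's-law extrapolation (a sketch, not CEA's own figures).
# Assumptions: a 1996 baseline of ~20 GFlops (the Cray T90 figure quoted above)
# and a 20-25% sustained yield on a parallel machine (see the glossary note).

baseline_gflops = 20                 # Cray T90 capacity quoted for 1996
years = 2010 - 1996                  # horizon of the 100-teraflop requirement
doublings = years * 12 / 18          # one doubling every eighteen months

raw_tflops = baseline_gflops * 2 ** doublings / 1000    # ~13 teraflops of raw growth
sustained = [raw_tflops * y for y in (0.20, 0.25)]      # ~2.6 to 3.2 sustained teraflops

print(f"raw extrapolation : {raw_tflops:.1f} teraflops")
print(f"sustained (20-25%): {sustained[0]:.1f} to {sustained[1]:.1f} teraflops")
print(f"gap to the target : about {100 / sustained[1]:.0f}x short of 100 teraflops")
```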

In concrete terms, you launched a call for proposals in 1999 for a machine with a sustained speed of 1 teraflop. The specifications were extremely complex, with more than 250 criteria and their related penalties! What was the response from the vendors?
Most of them didn’t consider it feasible. Two answered with the best they could offer: IBM and Compaq (in fact Digital, which had just been bought by Compaq). The latter won the bid. But given the very fast pace of technological progress, the machine delivered to us at the end of 2001 wasn’t exactly the one we had ordered! It did, however, allow us to meet our goals and achieve 1.37 sustained teraflops. A really great success…

So what conclusions did you draw from this first experience?
First of all, that it was possible to beat Moore’s law, which vendors normally swear by, to the benefit of all the partners. The scientific community also benefited: this machine would never have existed without us or, at least, not so soon. On our side, we got the computing resources we needed for nearly five years and tested the simulator during its development, but we also learned several lessons for the next machine: TERA-10.

What, for example?
When we commissioned TERA-1 our main obsession was power. But once that goal was reached we realised that data management was just as important. I’ll give just a few figures: every day TERA-1 produces more than 3 terabytes of data, which is in the range of 1 petabyte per year. And no machine is safe from breakdowns. Since we cannot allow the results of a calculation that may run for several weeks on thousands of processors to be lost, we need to save the data very regularly.
Unfortunately these operations are very greedy in terms of computing time. We estimate that in any given hour the machine should not spend more than five minutes saving data and emptying its memory, and this determines the size of the Input/Output (I/O) system. But that turned out to be much more complex than expected: we had underestimated the I/O capacity the machine needed. Moreover, because of the architecture, the data must be written in parallel while preserving the possibility of reloading it, not necessarily onto the same processors. This poses synchronisation problems when the machine is running at full capacity. Our teams and the vendor spent several months getting around this type of problem.
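To make the five-minute rule concrete, here is a minimal sizing sketch. The 3 terabytes per day and the five-minutes-per-hour budget come from the interview; the 2-terabyte memory image used below is a purely hypothetical placeholder, since the machine's actual memory size is not given here.

```python
# Sizing sketch for the checkpoint I/O budget described above.

daily_output_tb = 3                       # figure quoted in the interview
print(f"yearly volume   : ~{daily_output_tb * 365 / 1000:.1f} petabytes")

memory_image_tb = 2.0                     # hypothetical memory image to save (assumption)
io_budget_s = 5 * 60                      # at most 5 minutes of I/O per hour of run time
min_bandwidth_gb_s = memory_image_tb * 1000 / io_budget_s
print(f"checkpoint rate : >= {min_bandwidth_gb_s:.1f} GB/s "
      f"to dump {memory_image_tb} TB within the 5-minute budget")
```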

Wouldn’t things have been simpler if you hadn’t ordered a machine that existed only on paper?
Obviously, yes. In computing, two years is an eternity. In 1999 the vendors answered our call for proposals with technologies that existed only on paper, so it took them some time to develop and implement them. The lesson is clear: the time between ordering and delivery should be as short as possible. Above all, before signing contracts we should insist on technological demonstrations proving that the essential elements of the machine work.

From the beginning of the TERA-1 operation you’ve offered computing time and your expertise to researchers and industry. What were your reasons?
To give them access to resources they did not have and, by doing so, to make our own project more credible. Numerical simulation is generally validated by one or more experiments. But with the end of nuclear testing we found ourselves in a new situation. Comprehensive experiments were no longer possible, so how could we assure the outside world that our project was credible without divulging the details of our methods, for obvious security reasons? To demonstrate that we fully master the technology, that we have the best teams and the most powerful computing resources, we began to look further afield. The idea was simple: any major challenge, whatever the subject, solved with our help would consolidate the credibility of our teams and our methods. So we not only offered our computing power, but also got our experts to join in projects such as genome sequencing or the modelling of prion movement [2].

This policy of openness translated into the creation of the CEA Scientific Computing Complex. With its 60-teraflop machine, it is Europe’s largest computing centre. How does it work?
By creating this complex, the CEA wanted to get the most out of the synergy between its defence, industry and research programmes and out of the results of the numerical simulation programme. Nearly one hundred and fifty CEA/DAM engineers and researchers now work there. The complex is made up of the Defence Computation Centre with the TERA machine, the CCRT (Centre de Calcul Recherche et Technologie), which is open to all, and finally a centre for experimentation where our experts work with people from universities and industry. The complex is managed by the Ministry of Defence for TERA and, for the CCRT, by a committee on which each partner is represented in proportion to its investment. Today the CEA holds a little over half of the shares of the CCRT; the remainder belongs to large corporations (EDF, Snecma, etc.) or laboratories such as ONERA. With the arrival of TERA-10 at the end of 2005, the overall capacity of the complex reached 70 teraflops (60 for defence, 8 for the CCRT and 2 for experimentation). It will pass the 100-teraflop mark at the beginning of 2007, when the new 40-teraflop CCRT machine is delivered.

Almost two years ago, a technopole, Ter@tec, was also inaugurated on the DAM-Île-de-France site in Bruyères-le-Châtel…
The CEA scientific computing complex is in fact the hub of a much wider operation: Ter@tec. The technopole’s aim is to bring together, around the scientific computing complex, all the parties interested in numerical simulation: researchers, industrial companies, and technology users and suppliers. It also aims to share the spin-offs of the Defence programme with the scientific community and industry, and in this way to bring Europe back up to the top level in high-performance computing.

Has this collaboration already borne fruit?
Two associated laboratories have already been created with the University of Versailles and the École Centrale de Paris, and large industrial groups (Bull, Dassault, EDF, HP, Snecma) are collaborating with us to promote simulation or to define the next generation of machines. FAME is one of the first projects to come out of this synergy. Uniting Bull, the CEA and the University of Versailles, this project, supported by the Ministry of Industry, led to the development of a high-quality server dedicated to scientific computing, marketed by Bull under the name NovaScale since 2003. Building on this success, a second project (TeraNova) was undertaken in 2003-2004, this time without state aid, with the University of Versailles and the companies Bull, Dassault and Quadrics. The goal was to create a teraflop machine. The industrial outcomes of these operations are clear. Thanks to them, Bull was able to develop a very general commercial product that can be used in both the commercial and scientific markets, and acquired the expertise that places it at the level of the largest manufacturers. This is what enabled it to answer the TERA-10 call for proposals.

Moving on to the jewel in the crown, the TERA-10 machine. What were its constraints?
There again, our main goal, a sustained speed of 10 teraflops, was far beyond the predictions of Moore’s law. As with TERA-1, the general architecture of the machine had to be a cluster of SMPs (shared-memory multiprocessors). But we had further demands. First, we wanted a very high sustained speed for a minimal overall cost, including the dissipated power and the floor space; this meant using the first dual-core processors on the market, placing us once again at the limit of the technology. Then, we wanted large SMP servers for technical reasons (existing codes with a low degree of parallelism and the development of new multiscale models). A tough challenge for the vendors! Finally, we wanted I/O capacities fifteen to thirty times higher, with, of course, software capable of processing such volumes with maximum reliability. Based on this, our architects drew up a very complete set of specifications with 278 criteria, 53 of which corresponded to benchmarks defined by our experts. The call for bids was launched in January 2004 and eight vendors expressed interest. The call for proposals followed in March.
TERA-10 is the most powerful European machine.

What’s more, for the first time in the history of high-performance computing, it was made in Europe. Is that the reason you chose Bull?
Of course not! Let me remind you that this machine is one of the crucial elements of a programme that must guarantee the weapons of the French deterrent. Is it imaginable that the CEA/DAM, which bears this responsibility, could make a choice that might compromise the programme for economic reasons or for prestige? Five major manufacturers answered the call for proposals: Bull, Dell, IBM, HP and Linux Networx. Bull made the best proposal. It was able to offer us a homogeneous machine with nodes of 16 dual-core processors and a sustained performance on our Tera benchmark of at least 12.5 teraflops. The Bull machine also had by far the best I/O system and the most reasonable electricity consumption. Furthermore, Bull proposed an essentially open-source solution for the system software, safeguarding the CEA’s freedom of choice in the future. We are obviously very proud that a French company won this challenge. It underlines the quality of our openness initiative via Ter@tec and the benefits that the French economy can gain from a defence-industry-research synergy. Finally, Bull’s victory marks Europe’s return to the field of high-performance computing, which is certainly gratifying.

Does this success story show the way for France to get back in the race?
The conclusions of the report by Emmanuel Sartorius and Michel Héon submitted to the Ministry of Research [3] are very clear. The implementation of a real policy in high-performance computing is essential, and our methods, namely the pooling of resources and the defence-industry-research synergy, seem to them to be the most appropriate. Times change, and mentalities too! Since the beginning of 2005 we have seen several changes. For example, the National Research Agency (ANR) has included an ‘intensive computing’ theme in its programme and launched a call for projects last July. Nearly fifty projects were submitted last September and have since been evaluated. Another sign is that the System@tic competitiveness cluster, of which Ter@tec is one of the key elements, has just launched a project to develop the next generation of computers, with the Ministry of Industry’s support. Of course, these efforts do not compare with those undertaken in the United States. But it’s a good start.

Will we see a similar initiative at the European level?
Yes. After a year of effort and persuasion, supercomputing is going to reappear in the budget of the 7th European RTD Framework Programme* (2007-2013) [4], which should include an industrial component. The flagship project of this initiative will be, if it is accepted, to set up three or four large computing centres in Europe whose mission would be not to provide computing for a given scientific theme, but to remain permanently in the top three of the Top 500. This would undoubtedly allow major numerical challenges to be tackled in most scientific disciplines, leading to major technological leaps. The CEA/DAM-Île-de-France scientific computing complex is a natural candidate to host and organise such a structure. But one thing is certain: all of these projects will only make sense if they are based, as in the United States, Japan and now China, on a solid local industrial network and on a proactive policy from the member states and the European Union.

Interview by Fabienne Lemarchand

*DoE (Department of Energy): leads the United States’ nuclear deterrence programme.
*A petaflop is a million billion operations per second (10^15 operations/s).
*The real power of a computer is expressed in sustained teraflops. It is the product of the theoretical power and the yield, that is, the fraction of the theoretical operations that a computing code is actually able to use. On a parallel machine these yields are of the order of 20% to 25%.
*A gigaflop is a billion operations per second (10^9 operations/s).
*RTD: Research, Technological Development and Demonstration Activities in the European Union.

[1] Investigation into the frontiers of numerical simulation, Académie des technologies report by the Simulation working group, May 2005. www.irisa.fr/orap/Publications/AcaTec-rapport_Simulation.pdf
[2] V. Croixmarie et al., J. of Structural Biology, 150, 284, 2005.
[3] E. Sartorius and M. Héon, La Politique française dans le domaine du calcul scientifique, March 2005. www.recherche.gouv.fr/rapport/calcul/2005-017.pdf
[4] http://europa.eu.int/comm/research/future/index_en.cfm

Source: special issue released in June 2006, adapted from "Le calcul haute performance", distributed in January 2006 with La Recherche n° 393.

 